GROMACS (http://www.gromacs.org) is one of the major software packages for the simulation of biological macromolecules. It is designed to simulate large, biologically relevant systems, with a focus on being both efficient and flexible enough to study a wide range of systems (see the examples provided further down the page). Research groups all around the globe use the program, and several hundred publications based directly or indirectly on it have appeared during the last few years (see the Figure below for the results of a Scopus search for the program's use).
GROMACS is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
GROMACS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details. The full license text can be found here.
Involvement in Bioexcel
GROMACS is part of the Bioexcel effort to provide a framework for simulating any biomolecular system, by developing tools that are efficient at simulating biologically relevant systems and flexible enough to be applied to different use cases, because their functionality can be extended rapidly. GROMACS's ability to simulate arbitrarily large systems makes it possible to combine the knowledge provided by the other programs involved in the project to better understand the systems being studied.
Further developments as part of Bioexcel
The aim is to extend the existing capabilities of GROMACS by providing an API framework that other programs can use to interface with the main simulation engine. This will make GROMACS applicable to a wider range of biologically and chemically relevant systems, since other programs will be able to call the efficient core simulation routines directly.
In addition, the KTH and MPG partners will contribute to improving the performance, scalability, quality and usability of both GROMACS and other simulation codes:
- QA, unit testing and a general library for biomolecular modelling.
GROMACS will be turned into a state-of-the-art, module-based C++ library with full unit testing and up-to-date user and developer documentation for all modules. The project is moving to a professional QA setup by introducing strict code review (including by the main developers) and automatic continuous integration, in which all patches are compiled and unit-tested on a wide range of hardware and compilers so that every single change is QA-approved and any installation site can guarantee the quality of its compiled installation.
- Heterogeneous parallelization.
We will develop a new heterogeneous parallelization implementation where all available CPU, accelerator and communication resources are used in parallel on each node through explicit multithreading and multi-level load balancing, as well as new support for OpenCL and Xeon Phi accelerators in addition to CUDA.
- Efficient ensemble techniques.
Some of the most powerful approaches today are based on using hundreds or thousands of simulations for ensemble sampling techniques such as Markov state models or free energy calculations. We will make these approaches accessible to users in general by fully integrating our Copernicus framework for ensemble simulation with GROMACS (Pronk et al. 2011). This will make it possible to formulate high-level sampling and free energy calculation problems as black-box computation problems that can employ hundreds of thousands of processors internally. This is particularly important for high-throughput free energy screening applications. Notably, the framework is not limited to GROMACS, but it can be used with any code.
- To facilitate exchange of data with other applications and to enable fully automated high-throughput simulation, we are developing public data formats that describe molecules in XML, highly compressed trajectory formats that support digital hashes and signatures to guarantee data integrity, and new tools that automatically create interaction descriptions (topologies) for arbitrary small molecules, e.g. drug compounds, for a number of different force fields such as CHARMM, GAFF, or OPLS-AA (Lundborg & Lindahl 2014).
- Some of the most promising potential applications of free energy calculations include predicting the outcome of amino acid scanning experiments and determining how small molecules should be altered to improve binding. Currently, this is hampered by the need either to calculate absolute free energies for large changes (which leads to large statistical errors) or to design topologies manually so that residues or drugs are morphed directly into related molecules. As part of BioExcel, we will make free energy calculations applicable in these high-throughput settings by developing and integrating new modules that automatically morph any amino acid into another and automatically turn drug compounds into related derivatives while keeping the perturbation as small as possible. Combined with automatic topology generation and ensemble simulation, this will turn molecular simulation into a tool that can screen molecular and binding stability in 24-48 h, with large implications for drug design in the pharmaceutical industry.
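The ensemble-sampling item above mentions Markov state models. As a rough illustration of what such a model is built from, here is a minimal sketch that assumes the trajectories from an ensemble of runs have already been discretized into integer state labels (the clustering step is omitted). It pools transition counts at a fixed lag time across all runs and row-normalizes them. This is a generic textbook estimator chosen for illustration, not code from Copernicus or GROMACS; the function name `estimate_transition_matrix` and the toy trajectories are hypothetical.

```python
from collections import Counter


def estimate_transition_matrix(dtrajs, n_states, lag=1):
    """Row-normalized transition-count matrix (simple MLE, no reversibility constraint).

    dtrajs: list of sequences of integer state labels, one per independent
    simulation in the ensemble (hypothetical input format).
    """
    counts = Counter()
    for traj in dtrajs:
        # Pool transitions i -> j separated by `lag` frames from every trajectory.
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[(i, j)] += 1
    matrix = []
    for i in range(n_states):
        row = [counts[(i, j)] for j in range(n_states)]
        total = sum(row)
        # A state that was never left gets an all-zero row instead of a division by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Two short, made-up discretized trajectories standing in for an ensemble of runs.
ensemble = [[0, 0, 1, 1, 0], [1, 1, 1, 0, 0]]
T = estimate_transition_matrix(ensemble, n_states=2)
```

Each row of `T` estimates the probability of moving from one state to each other state within one lag time; because many short, independent runs contribute counts to the same matrix, this kind of analysis parallelizes naturally over an ensemble.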
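The data-format item above mentions trajectory formats that carry digital hashes to guarantee integrity. The basic mechanism can be sketched with SHA-256 from Python's standard library. The frame layout here (each atom's x, y, z packed as little-endian float32) is a hypothetical stand-in, not the actual GROMACS trajectory format, and a real signature scheme would additionally sign the digest with a private key.

```python
import hashlib
import struct


def frame_digest(coords):
    """SHA-256 hex digest of one frame of (x, y, z) coordinates.

    Hypothetical layout: each coordinate packed as a little-endian float32.
    """
    payload = b"".join(struct.pack("<3f", x, y, z) for x, y, z in coords)
    return hashlib.sha256(payload).hexdigest()


def verify_frame(coords, expected_digest):
    """Return True if the frame still hashes to the digest stored alongside it."""
    return frame_digest(coords) == expected_digest


frame = [(0.0, 0.1, 0.2), (1.5, 1.6, 1.7)]
stored = frame_digest(frame)  # would be written next to the frame in the file
```

Any change to the stored coordinates, even in a single component, produces a different digest, so a reader can detect corruption or tampering without re-running the simulation.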
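Why keeping the perturbation small matters for free energy calculations can be illustrated with the exponential-averaging (Zwanzig) free energy perturbation estimator, ΔF = -kT ln⟨exp(-ΔU/kT)⟩, where the average runs over samples of the energy difference ΔU collected in the initial state. This is a generic textbook estimator chosen for illustration; the text does not say which estimator the BioExcel modules use. The sketch uses synthetic Gaussian ΔU samples rather than simulation output, and treats the width of the ΔU distribution as a stand-in for the size of the perturbation.

```python
import math
import random
import statistics

KT = 2.494  # k_B * T in kJ/mol at roughly 300 K


def zwanzig_delta_f(delta_u, kt=KT):
    """Free energy difference from energy differences sampled in the initial state."""
    x = [-du / kt for du in delta_u]
    m = max(x)  # shift by the maximum exponent for numerical stability (log-sum-exp)
    mean_exp = sum(math.exp(xi - m) for xi in x) / len(x)
    return -kt * (m + math.log(mean_exp))


def estimate_spread(sigma, n_samples=200, n_repeats=50, seed=1):
    """Standard deviation of repeated Zwanzig estimates from Gaussian delta-U samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        samples = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
        estimates.append(zwanzig_delta_f(samples))
    return statistics.stdev(estimates)


small = estimate_spread(sigma=0.5)   # narrow delta-U distribution: small perturbation
large = estimate_spread(sigma=10.0)  # broad delta-U distribution: large perturbation
```

With the same number of samples, the broad distribution gives much more scattered estimates, because the exponential average is dominated by a few rare low-energy samples. Chaining several small perturbations is a common way to keep each step in the well-behaved regime.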
To ensure the correctness of the program code, and thus of the simulation results, and to drive development through both internal and external contributions, we employ best practices from modern software development:
- Source code version and revision control is handled with git. The GROMACS repository can be checked out with
git clone git://git.gromacs.org/gromacs.git
or browsed at https://github.com/gromacs/gromacs. git provides efficient version control while also enabling contributions from developers all around the globe.
- The GROMACS code is manually and automatically reviewed using Gerrit, ensuring that new contributions and changes to the existing code base pass the inspection of several core developers before being included in the main distribution. The GROMACS Gerrit can be reached at https://gerrit.gromacs.org/.
- Continuous integration testing is performed with Jenkins on each change, both before and after it is merged into the main branch. This checks not only that each change passes a number of portability tests before it is included, but also, through extensive testing of the existing functionality, that it does not introduce errors. The GROMACS Jenkins server is found at http://jenkins.gromacs.org.
- New and existing parts of the program are extensively tested using our unit and regression testing infrastructure integrated into Gerrit and Jenkins.
- Documentation standards are enforced through the use of Doxygen (doxygen.org) as the standard tool for documenting functions and including them in the program reference manual.
- We provide our own issue tracking system at redmine.gromacs.org to allow us to identify program errors and work together with users and developers on fixing them.
Training and support activities
We provide several ways for both users and potential developers of GROMACS to contact us with questions about applications and modifications.
- General questions about using GROMACS to simulate systems can, and should, be asked on the GROMACS user mailing list.
- Questions about GROMACS development and the implementation of new functionality should be asked on the developers mailing list. This is also the place to get more information about new and upcoming features in future versions.
- Other questions concerning the involvement of GROMACS in Bioexcel can also be asked at our user forum at ask.bioexcel.eu.
Simulation of biological macromolecules has evolved from a niche statistical-mechanics method into one of the most widely applied biophysical research tools, and is used far outside theoretical chemistry. Supercomputers are now as important as centrifuges or test tubes in chemistry. As showcased by the 2013 Nobel Prize in Chemistry, molecular dynamics based on statistical mechanics makes it possible to simulate the motions of atoms in realistic environments at room temperature, for systems ranging from material chemistry to proteins, DNA, RNA and membranes containing millions of atoms. The fundamental algorithm of molecular dynamics evaluates the forces on all atoms in a system and updates the velocities and positions of the atoms according to Newton’s equations of motion. This numerical integration scheme is iterated for billions of steps and generates a series of samples that describe the thermodynamic ensemble of the system. This is the true strength of the technique, since it makes it possible to predict experiments: it can accurately describe how molecules such as proteins move, and it also enables the calculation of free energies that describe chemical reactions, for instance the binding free energy of a candidate drug compound in a protein active site, or how a ligand will stabilize a particular conformation to open or close an ion channel. Since force calculations are required by a large number of algorithms, several other packages use molecular simulation toolkits as libraries to evaluate energies, for instance in docking or when refining structures with experimental restraints such as X-ray, NMR, or Cryo-EM data.
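The force-evaluation and update loop described above can be sketched with the leap-frog scheme, the algorithm behind GROMACS's default `md` integrator, applied to a single particle in a one-dimensional harmonic potential. This is a toy illustration in arbitrary units, not GROMACS code; the function names are hypothetical and a real engine evaluates forces for millions of atoms in three dimensions.

```python
import math


def harmonic_force(x, k=1.0):
    """Force from the potential U(x) = k x^2 / 2."""
    return -k * x


def leapfrog(x, v_half, mass, dt, n_steps, force=harmonic_force):
    """Leap-frog integration of Newton's equations in one dimension.

    Positions live at whole steps (t, t + dt, ...) and velocities at half
    steps; on entry, v_half is the velocity at t - dt/2.
    Returns the list of positions and the final half-step velocity.
    """
    positions = [x]
    for _ in range(n_steps):
        f = force(x)                     # 1. evaluate the force at the current position
        v_half = v_half + dt * f / mass  # 2. advance the velocity from t - dt/2 to t + dt/2
        x = x + dt * v_half              # 3. advance the position from t to t + dt
        positions.append(x)
    return positions, v_half


dt = 0.01
# For k = m = 1 and x(0) = 1 the exact solution is x(t) = cos(t), so v(-dt/2) = sin(dt/2).
positions, _ = leapfrog(x=1.0, v_half=math.sin(dt / 2), mass=1.0, dt=dt, n_steps=629)
```

Over roughly one period (629 steps of 0.01), the computed positions stay close to cos(t). Leap-frog is time-reversible and symplectic, which is why its energy error tends to stay bounded rather than drift over the very long runs mentioned above.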
This development would not have been possible without significant research efforts in simulation algorithms, optimization, and parallelization. The emergence of standardized packages for molecular modelling such as GROMACS, NAMD, AMBER, and CHARMM has been critical since they have helped commoditize simulation research, making the techniques available to life science application researchers who are not specialists in simulation development. All these packages have complementary strengths and profiles – the field has moved from historical competition to extensive sharing of ideas. GROMACS is one of the most widely used scientific software packages in the world with about 20,000 citations in total (Hess et al. 2008, Pronk et al. 2013); it is the largest free software and open source application in biomolecular research, and the only one of the major molecular dynamics simulation packages where development is led in Europe.
The GROMACS project started in 1995 as one of the first-ever parallel simulation codes. The international development team is led by the KTH partner, and the project is strongly focused on simulation efficiency and generality. It is the only package to support all common force fields, and it offers a very wide range of simulation algorithms. This, combined with the very liberal (and business-friendly) licensing, is likely a major reason why it is used as a simulation, minimization and energy evaluation library by several other applications, e.g. in bioinformatics or in distributed computing projects such as Folding@Home. The code is portable to a very wide range of platforms (including embedded ones). It includes manually tuned assembly kernels for a dozen different architecture instruction sets, as well as accelerator support for Nvidia GPUs (CUDA) and AMD GPUs (OpenCL), plus native support for Xeon Phi processors. The package uses state-of-the-art neutral territory domain decomposition and multi-level parallelization to enable both scaling to tens of thousands of nodes on supercomputers and efficient high-throughput computing with accelerators (Pall et al. 2014).
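The basic idea behind spatial domain decomposition can be sketched by assigning each atom to a rank according to which cell of a regular grid over the periodic box it falls in. This is a deliberately simplified sketch: it assumes a rectangular box and omits the neutral-territory scheme, halo communication and dynamic load balancing used in GROMACS. The function `assign_domains` and its grid layout are hypothetical.

```python
def assign_domains(positions, box, grid):
    """Map each atom to the index of the domain (rank) that owns it.

    positions: list of (x, y, z) coordinates in nm.
    box: (Lx, Ly, Lz) of a rectangular periodic box.
    grid: (nx, ny, nz) number of domains along each dimension.
    """
    owners = []
    for pos in positions:
        cell = []
        for coord, length, n in zip(pos, box, grid):
            wrapped = coord % length  # put the atom back into the primary box
            # min() guards against floating-point wrap-around landing exactly on `length`.
            cell.append(min(int(wrapped / length * n), n - 1))
        ix, iy, iz = cell
        owners.append((ix * grid[1] + iy) * grid[2] + iz)  # flatten the 3D cell index to a rank
    return owners


atoms = [(0.1, 0.1, 0.1), (2.9, 0.1, 0.1), (3.1, 5.9, 0.1), (-0.2, 0.1, 0.1)]
owners = assign_domains(atoms, box=(6.0, 6.0, 6.0), grid=(2, 2, 1))
```

Each rank then only needs to compute forces for its own atoms plus a thin layer of neighbouring atoms within the interaction cut-off, which is what lets the work spread across many nodes. The last atom lies slightly outside the box and is wrapped into the opposite domain by the periodic boundary condition.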
GROMACS can already use thousands of cores and hundreds of accelerators efficiently in parallel, even for a single, fairly small system. Adding ensemble-level parallelization with Copernicus extends the total problem scaling by another two orders of magnitude.
Molecular dynamics simulation in general, and GROMACS in particular, has made it possible to study large and complex biomolecular systems such as membranes and membrane proteins, and to probe atomic detail that is not accessible to any experimental method. Molecular simulations provided some of the first high-resolution models of the resting states of ion channels based on X-ray structures of open channels (Vargas et al. 2012), and they were critical for modelling transient intermediate conformations during structural transitions of membrane proteins (Henrion et al. 2012). GROMACS was also used to predict the first specific molecular recognition of lipids by membrane proteins (Contreras et al. 2012) and in the simulations that identified separate potentiating and inhibitory binding sites in the ligand-gated ion channels of our nervous system (Murail et al. 2012); these results are now used by several groups in attempts to design better drugs.
Recent improvements to the core codebase have improved scaling by incorporating new algorithmic concepts that use computational resources more efficiently.