Software development dashboard

GROMACS development priorities based on user feedback

Constant pH simulations

Choosing the protonation state for (de)protonatable groups on proteins is problematic. Furthermore, most biomolecular systems show important changes in response to pH, and the local pH for a molecule can change when it moves through the system. Changes in protonation states can have rather large effects due to a local change in electrostatic potential. Dynamic protonation will enable simulating many interesting biological processes and also makes the initial choice of protonation states less critical or superfluous. We have recently developed a very efficient algorithm for simulating dynamic protonation (doi:10.1021/acs.jctc.2c00516), which is available in a branch of GROMACS. It needs to be refactored and fully integrated before it can be part of an official GROMACS release.

The development of the feature is described in the GROMACS issue tracker on GitLab (https://gitlab.com/gromacs/gromacs/-/issues/4273).

New force field potentials - machine learning potentials

With the breakthrough of AI and machine learning potentials, these techniques have quickly gained popularity. Such potentials can be based on (expensive) ab-initio calculations, as well as on large ensembles of structures. These potentials are currently not supported by GROMACS’ standard pair potential setup. We will add an interface to (Py)Torch. This provides full flexibility and support for most potentials, although at a high computational cost. We will also add support for mixed machine-learning/molecular-mechanics models, where only a part of the system, e.g. a ligand, is treated with the more accurate potential. This can provide a significant gain in accuracy with a smaller performance penalty.

The development of the feature is described in the GROMACS issue tracker on GitLab (https://gitlab.com/gromacs/gromacs/-/issues/5039).

More force fields in the default distributions

Force fields are continuously evolving and are regularly updated to account for the variety of complex biomolecular systems. The most recent versions of the force fields are usually recommended by the force field developers. With newer versions of force fields included in the default GROMACS distribution, it would be easier for users to benefit from recent developments. However, this comes with the burden of converting the force fields to a format suitable for GROMACS and subsequently testing the conversions to ensure correctness. In addition, it would be important to retain old versions of force fields for backward compatibility. A balance between usability and sustainability will have to be found, in dialogue with the force field developers and/or contributors involved in the conversion process.

The development of the feature is described in the GROMACS issue tracker on GitLab (https://gitlab.com/gromacs/gromacs/-/issues/4998).

Molecular Dynamics for large systems

Simulations of large systems pose particular challenges for simulation software. Large systems occur often in flow applications but are also becoming more common in the biomolecular field with simulations of virus capsids and small cells. Large systems span a larger range of coordinate values, which means less absolute precision is available in floating point. The main effect that deteriorates the quality of the results is larger noise, due to rounding errors in the integration. One remedy is compiling GROMACS in double precision (for small systems single precision has proved sufficient), but that affects performance significantly and excludes the use of general-purpose GPUs (even GPUs that support double precision do so with a significant performance hit). A more elegant solution is storing the coordinates in fixed precision using 32-bit integers. This lowers rounding errors by a factor of 100 at little extra cost. Due to the fixed precision, the rounding errors are independent of the value of the coordinates.
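The difference between floating-point and fixed-precision storage can be sketched in a few lines of Python with NumPy. This is an illustration only and not the GROMACS implementation: the helper names, the choice to map one box edge onto the full 32-bit integer range, and the box size in the usage note are assumptions made for this example.

```python
import numpy as np

BITS = 32  # assumed width of the fixed-precision integer representation


def float32_spacing(x_nm):
    """Gap between x_nm and the next larger representable float32 value."""
    return float(np.spacing(np.float32(x_nm)))


def fixed_point_resolution(box_nm):
    """Uniform grid spacing when a box edge is mapped onto 2**BITS steps."""
    return box_nm / 2.0**BITS


def to_fixed(x_nm, box_nm):
    """Encode coordinates as unsigned 32-bit integers (periodic in the box)."""
    scaled = np.rint(np.asarray(x_nm, dtype=np.float64) / box_nm * 2.0**BITS)
    # The modulo wraps coordinates outside [0, box) back into the box.
    return (scaled.astype(np.int64) % 2**BITS).astype(np.uint32)


def from_fixed(q, box_nm):
    """Decode unsigned 32-bit integers back to coordinates in nm."""
    return np.asarray(q).astype(np.float64) * (box_nm / 2.0**BITS)
```

For a hypothetical 1000 nm box, `float32_spacing(1000.0)` is much coarser than `float32_spacing(1.0)`, whereas `fixed_point_resolution(1000.0)` is the same everywhere in the box. That position independence is the property the paragraph above refers to.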

HADDOCK development priorities based on user feedback

HADDOCK3: Coarse-graining (with more flexibility in the definition of particles)

In HADDOCK2.X we have introduced coarse-graining based on the Martini v2.2 force field. This feature is currently not available in HADDOCK3 and is being reimplemented. We also plan to provide support for the latest Martini force field.

Follow the developments in our GitHub repository (CG-HADDOCK branch)

HADDOCK3: Improved support for carbohydrates

In the 2024 BioExcel survey, HADDOCK users indicated a need for improved support for protein-glycan modelling. We will extend the support for various carbohydrates and develop protocols for efficient protein-glycan docking.

Extended support for glycans was added in the following pull request.

A protocol for protein-glycan docking was recently published: DOI: 10.1021/acs.jcim.4c01372

HADDOCK3: Improved support for anti/nanobody modelling, including AI tools

In the 2024 BioExcel survey, HADDOCK users indicated a need for improved protocols for modelling anti/nanobody complexes, including the use of AI tools.

A protocol for antibody-antigen modelling that uses AI-generated models as input for the docking was recently published: DOI: 10.1093/bioinformatics/btae583. Tutorials on antibody-antigen modelling are available from the HADDOCK3 tutorial page. Work on nanobodies is ongoing.

PMX development priorities based on user feedback

Mutation libraries: New libraries (Force Fields)

PMX offers a solution for creating hybrid structures and topologies for amino acid and nucleic acid mutations. This is made possible through pre-existing atom mapping schemes between the residues. These mappings form mutation libraries that depend on the force field being used. As new force fields are constantly being developed, it is crucial to keep the mutation libraries up to date. One of the main objectives of PMX’s development is to automate the process of generating mutation libraries and to update them promptly whenever new force field versions are released.

PMX already supports some of the most widely used force fields from the AMBER and CHARMM families, such as AMBER14SB and CHARMM36m. The users’ requests for support for additional force fields suggest an increasing interest in the newer versions of the force fields or additional families, such as OPLS.

Features: Automatic detection and solution of convergence issues

The estimation of free energy differences suffers from convergence problems (poor overlap of forward and reverse work distributions in the framework of the Crooks fluctuation theorem) as the perturbation size (change in the number of heavy atoms) increases. The issue has been observed across various systems such as protein mutations, ligand mutations, absolute ligand binding free energy calculations, etc. In some cases, even a minimal perturbation (say, morphing an H to Cl) can lead to problems with convergence. The accuracy of the estimated free energy differences deteriorates directly when the work distributions overlap poorly. Hence, it is of utmost importance to detect and solve convergence issues for a more accurate estimation of free energy.

In the future, as a part of PMX’s long-term development, we plan to automate the detection of the poor phase space overlap that contributes to the convergence issue and to provide measures for addressing it.
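As a rough illustration of what detecting poor overlap could look like, the sketch below computes a simple histogram overlap between the forward work distribution and the negated reverse work distribution, which are the two distributions compared in the Crooks fluctuation theorem. This is a generic heuristic and not the PMX method: the function name, the number of bins, and the sign convention (reverse work recorded for the reverse process and negated here) are assumptions made for this example.

```python
import numpy as np


def work_overlap(w_forward, w_reverse, bins=50):
    """Histogram overlap in [0, 1] between P_f(W) and P_r(-W).

    Values near 1 mean the two work distributions nearly coincide;
    values near 0 mean they barely share any work values.
    """
    wf = np.asarray(w_forward, dtype=float)
    wr = -np.asarray(w_reverse, dtype=float)  # negate reverse work
    lo = min(wf.min(), wr.min())
    hi = max(wf.max(), wr.max())
    # Shared bin edges so both histograms are directly comparable.
    edges = np.linspace(lo, hi, bins + 1)
    pf, _ = np.histogram(wf, bins=edges, density=True)
    pr, _ = np.histogram(wr, bins=edges, density=True)
    width = edges[1] - edges[0]
    # Integrate the pointwise minimum of the two normalized densities.
    return float(np.sum(np.minimum(pf, pr)) * width)
```

A workflow could flag a calculation for closer inspection when this value falls below some threshold. Any such threshold would have to be calibrated against real data and is not suggested here.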

Features: AI integration & chemical space navigation

One of the major interests of users is to integrate AI features into PMX and develop protocols for guided chemical space navigation. The developers are starting to work on integrating various AI tools within PMX and planning protocols for navigating chemical space guided by AI and physics-based approaches.

Documentation

We plan to focus on extensive documentation of the core background, algorithms and implementation details involved in the non-equilibrium alchemical calculations employing PMX. This includes explanations of basic theories like Jarzynski’s equality and the Crooks fluctuation theorem, as well as atom mapping, hybrid structure generation, and the double-system/single-box (DSSB) setup for charge-changing mutations on the technical side. The updated documentation will be made available across various platforms, such as the PMX webpage, GitHub, and the BioExcel webpage, to attract and guide a larger user base.
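For orientation, Jarzynski’s equality relates a free energy difference to non-equilibrium work values as ΔG = -kT ln <exp(-W/kT)>, where the average runs over repeated transitions. A minimal NumPy sketch of this estimator follows. It is not the PMX API: the function name is hypothetical, and work values and kT are assumed to be in the same energy units.

```python
import numpy as np


def jarzynski_free_energy(work, kT):
    """Estimate dG = -kT * ln(mean(exp(-W / kT))) from work values."""
    w = np.asarray(work, dtype=float)
    x = -w / kT
    # Subtract the maximum exponent before exponentiating to avoid overflow.
    x_max = x.max()
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return float(-kT * log_mean)
```

Because the exponential average is dominated by rare low-work transitions, this one-sided estimator converges slowly when dissipation is large, which is one reason estimators that combine forward and reverse work are commonly preferred.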

Tutorials

Currently, there are tutorials available on protein mutations and relative binding free energy calculations on the PMX webpage and on the GitHub page. However, some of these tutorials are outdated and use PMX as a command-line tool. We are aiming to create a systematic set of tutorials, all using the latest Jupyter notebook API, with every tutorial provided as a single notebook file for better usability. The extensive set of tutorials will include the calculation of solvation free energy of small molecules in water and other organic solvents, estimation of partition coefficients using the non-equilibrium alchemical approach, protein/ligand/nucleic acid mutations, absolute protein-ligand binding free energy, and pKa estimation, covering all of the PMX base functionality. Overall, the documentation and tutorials will cover the entire range of PMX capabilities in a simple and easy-to-use manner and will be updated regularly.

BioBB development priorities based on user feedback

Application Building Blocks: Implement new and/or update already existing building blocks for GROMACS, HADDOCK and PMX

GROMACS: In biobb_gromacs we are working on including new features related to enhanced sampling simulations, taking advantage of the new integration of PLUMED in GROMACS v2025.
Follow the developments in the corresponding GitHub repository (biobb_gromacs)
HADDOCK: In biobb_haddock we are expanding the set of building blocks to cover a larger share of the HADDOCK v3 features, now that the basic ones are implemented (see Implemented features section).
Follow the developments in the corresponding GitHub repository (biobb_haddock)
PMX: In biobb_pmx we are maintaining and updating the module to keep pace with new PMX implementations such as covalent or post-translational modifications.
Follow the developments in the corresponding GitHub repository (biobb_pmx)

Application Building Blocks: Implement new building blocks from users’ feedback

User-driven features and functionalities have been collected from users’ feedback and the 2024 BioExcel survey, and a set of new potential additions to the library has been defined, including new MD tools (OpenMM), enhanced sampling techniques (PLUMED, GROMACS AWH), structural conformation prediction and modeling (Rosetta, AlphaFold2), and QM methods (CP2K, Gaussian).
Follow the developments in our GitHub repositories (biobb)

Application Building Blocks: Implement new building blocks from collaboration projects

New building blocks are being developed in collaboration with the community: DNA-specific building blocks to implement a set of best practices in MD preparation, run, analysis and storage of nucleic acid simulations with the Ascona B-DNA consortium (hexABC project). Follow this development in the corresponding GitHub repository (biobb_dna)

New functionalities for the Virtual Screening module, including popular tools in the field such as the AutoDock Vina fork Smina, the deep-learning docking tool Gnina, and the Gypsum-DL small molecule 3D predictor, in collaboration with the School of Pharmacy of the University of Eastern Finland. Follow this development in the corresponding GitHub repository (biobb_vs)

Demonstration Workflows: Implement new demonstration workflows as Jupyter Notebooks

New demonstration workflows to be deployed as Jupyter Notebooks or Google Colabs are under development, including a HADDOCK3 protein-protein docking workflow, AI-based feature extraction and pattern recognition from MD simulations using autoencoders, and a workflow to analyse and extract dynamic and flexibility properties from protein-membrane MD trajectories.
Follow these developments in our GitHub workflows’ repositories (BioBB demonstration workflows)

Pre-exascale workflows: Deploy HPC workflow(s) prototypes in EuroHPC supercomputers

BioBB pre-exascale workflows to run massive calculations are being designed for a set of showcases in the BioExcel Centre of Excellence. Projects including coevolution-driven metadynamics simulations from ML-derived transition coordinates and AI-based insights on the functional annotation of sequence variants from MD simulations are being developed and will be tested on the CSC LUMI and BSC MareNostrum 5 supercomputers using the BSC PyCOMPSs workflow manager.
Follow these developments in our GitHub HPC workflows’ repositories (BioBB HPC workflows)