Choosing the protonation state for (de)protonatable groups on proteins is problematic. Furthermore, most biomolecular systems change significantly in response to pH, and the local pH experienced by a molecule can change as it moves through the system. Changes in protonation state can have large effects because they alter the local electrostatic potential. Dynamic protonation will enable the simulation of many interesting biological processes and also makes the initial choice of protonation states less critical or even superfluous. We have recently developed a very efficient algorithm for simulating dynamic protonation (doi:10.1021/acs.jctc.2c00516), which is available in a branch of GROMACS. It needs to be refactored and fully integrated before it can be part of an official GROMACS release.

The development of this feature is tracked in the GROMACS issue tracker on GitLab (https://gitlab.com/gromacs/gromacs/-/issues/4273).

Ensemble algorithms provide another level of parallelism, which is becoming critical for the efficient use of large HPC resources even for small systems such as proteins. While this is already supported in many ways in the main code base, the data analysis can often only be done using external tools, as in the case of constructing Markov state models (MSMs). Here, users would benefit from being able to use the GROMACS tools directly to perform some of the tasks needed to construct those MSMs, such as efficient clustering and feature extraction, as well as from automatically setting up simulations through the Python API. We plan to invest development effort in streamlining those processes and integrating them into a new GROMACS tool that will take advantage of the improvements to the API and the analysis tool suite.
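
One of the core steps of MSM construction, estimating a transition matrix from clustered (discretised) trajectories, can be sketched as follows. This is a minimal illustration in plain Python, not a GROMACS tool or API: the function name estimate_transition_matrix and its parameters are hypothetical, and it uses simple row normalisation of the transition counts rather than the reversible estimators found in dedicated MSM packages.

```python
def estimate_transition_matrix(dtrajs, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from discrete trajectories.

    dtrajs   : list of trajectories, each a list of cluster indices (hypothetical input)
    n_states : number of clusters (microstates)
    lag      : lag time in frames, must be >= 1
    """
    if lag < 1:
        raise ValueError("lag must be at least 1 frame")

    # Count transitions i -> j observed at the chosen lag time.
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in dtrajs:
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i][j] += 1

    # Normalise each row so that it sums to 1 (rows with no counts stay zero).
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Example: one short trajectory visiting two states.
T = estimate_transition_matrix([[0, 0, 1, 1, 0, 0]], n_states=2, lag=1)
```

In practice the discrete trajectories would come from the clustering and feature extraction steps mentioned above, and the lag time would be chosen by checking that the implied timescales have converged.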

With the breakthrough of AI and machine learning potentials, the use of these techniques has quickly gained popularity. Such potentials can be based on (expensive) ab initio calculations, as well as on large ensembles of structures. These potentials are currently not supported by the standard pair potential setup in GROMACS. There are two ways of combining the capabilities of GROMACS with machine learning potentials. One is to add an interface to (Py)Torch. This provides full flexibility and support for most potentials, but (Py)Torch is orders of magnitude slower than GROMACS, which will limit the time scales that can be covered. A second approach is native GPU support for shallow neural networks. This is less flexible, but will likely be orders of magnitude faster. We plan to pursue both approaches.

Force fields evolve continuously and are updated to account for the variety of complex biomolecular systems. Force field developers usually recommend the most recent versions. Including newer versions of force fields in the default GROMACS distribution would make it easier for users to benefit from recent developments. However, this comes with the burden of converting the force fields to a format suitable for GROMACS and subsequently testing the conversions to ensure correctness. In addition, it is important to retain old versions of force fields for backward compatibility. A balance between usability and sustainability will have to be found, in dialogue with the force field developers and/or the contributors involved in the conversion process.

An option to automatically pre-process input structures could be useful, e.g. to correct ion naming, overlapping residue numbering and residue insertions (e.g. in antibodies), or to filter out disordered regions in AlphaFold models.

On the current HADDOCK2 web server, users can download all pre-processed parameters and input data prior to submitting a run. An option to launch a local version of HADDOCK directly from those processed parameters would be useful.

In HADDOCK2.X we introduced coarse-graining based on the Martini v2.2 force field. This feature is currently not available in HADDOCK3 and will need to be reimplemented. Furthermore, it would be interesting to investigate whether a more flexible coarse-graining scheme is possible (e.g. a coarser representation of molecules than the 4-to-1 mapping of Martini).
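
To illustrate what a configurable mapping could look like, the sketch below places each coarse-grained bead at the centre of mass of an arbitrary group of atoms, so that a 4-to-1 mapping and a coarser one differ only in the mapping table. It is an illustration under stated assumptions, not the HADDOCK or Martini implementation: the function map_to_beads and its input layout are hypothetical.

```python
def map_to_beads(coords, masses, mapping):
    """Map atoms onto coarse-grained beads by centre of mass.

    coords  : list of (x, y, z) atom coordinates
    masses  : list of atomic masses, same order as coords
    mapping : list of atom-index groups, one group per bead (hypothetical format)
    """
    beads = []
    for atom_indices in mapping:
        total_mass = sum(masses[i] for i in atom_indices)
        # Mass-weighted average position of the atoms assigned to this bead.
        bead = tuple(
            sum(masses[i] * coords[i][k] for i in atom_indices) / total_mass
            for k in range(3)
        )
        beads.append(bead)
    return beads


# Eight atoms along the x axis, all with unit mass.
coords = [(float(i), 0.0, 0.0) for i in range(8)]
masses = [1.0] * 8

four_to_one = map_to_beads(coords, masses, [[0, 1, 2, 3], [4, 5, 6, 7]])
eight_to_one = map_to_beads(coords, masses, [list(range(8))])
```

Changing the resolution only requires a different mapping argument. The bead types and interaction parameters that a real coarse-grained force field needs are outside the scope of this sketch.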

At the end of a docking workflow, it would be useful to have statistics on the intermolecular contacts made between the various interfaces, reporting their nature (e.g. charged-charged, hydrogen bonds, hydrophobic) and visualising them as a matrix or circular plot. This could be reported per cluster.
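
A minimal sketch of such contact statistics is given below. It assumes that the intermolecular contacts of a cluster are already available as pairs of residue names, and it classifies them with a deliberately simplified residue-class table. The table RESIDUE_CLASS and the function contact_type_counts are hypothetical and not part of HADDOCK3. Detecting hydrogen bonds would additionally require geometric criteria, which are not covered here.

```python
from collections import Counter

# Simplified residue classification (an assumption for illustration only).
RESIDUE_CLASS = {
    "ARG": "charged", "LYS": "charged", "ASP": "charged", "GLU": "charged",
    "SER": "polar", "THR": "polar", "ASN": "polar", "GLN": "polar",
    "HIS": "polar", "TYR": "polar", "CYS": "polar",
    "ALA": "apolar", "VAL": "apolar", "LEU": "apolar", "ILE": "apolar",
    "MET": "apolar", "PHE": "apolar", "TRP": "apolar", "PRO": "apolar",
    "GLY": "apolar",
}


def contact_type_counts(contacts):
    """Count contacts per pair of residue classes.

    contacts : iterable of (residue_name_a, residue_name_b) pairs
    Returns a Counter keyed by an alphabetically sorted pair of class names,
    so that ("polar", "charged") and ("charged", "polar") are counted together.
    """
    counts = Counter()
    for res_a, res_b in contacts:
        class_a = RESIDUE_CLASS.get(res_a, "other")
        class_b = RESIDUE_CLASS.get(res_b, "other")
        counts[tuple(sorted((class_a, class_b)))] += 1
    return counts


# Contacts of one hypothetical cluster.
cluster_contacts = [("ARG", "ASP"), ("LEU", "VAL"), ("GLU", "SER"), ("LYS", "GLU")]
stats = contact_type_counts(cluster_contacts)
```

The resulting counts form a symmetric class-by-class matrix that can be reported per cluster and passed to a plotting library for the matrix or circular view.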

HADDOCK3 should provide easy-to-read analysis reports with all cluster statistics and visualisations of the results, as is done on the HADDOCK2 web server.

PMX offers a solution for creating hybrid structures and topologies for amino acid and nucleic acid mutations. This is made possible through pre-existing atom mapping schemes between the residues. These mappings form mutation libraries that depend on the force field being used. As new force fields are constantly being developed, it is crucial to keep the mutation libraries up to date. One of the main objectives of PMX development is to automate the generation of mutation libraries and to update them promptly whenever new force field versions are released. In addition, the current PMX development version supports the latest protein force fields, namely Amber14SB and CHARMM36m.

The estimation of free energy differences suffers from convergence problems (poor overlap of the forward and reverse work distributions in the framework of the Crooks fluctuation theorem) as the perturbation size (the change in the number of heavy atoms) increases. The issue has been observed across various systems, such as protein mutations, ligand mutations and absolute ligand binding free energy calculations. In some cases, even a minimal perturbation (say, morphing an H into a Cl) can lead to convergence problems. Poor overlap of the work distributions directly reduces the accuracy of the estimated free energy differences. Hence, it is of utmost importance to detect and solve convergence issues in order to estimate free energies more accurately.

In the future, as part of PMX's long-term development, we plan to automate the detection of poor phase space overlap, which contributes to the convergence issue, and to provide measures for tackling it.
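
One simple way to flag poor overlap, given here only as a sketch and not as the PMX implementation, is to histogram the forward work values and the negated reverse work values on a common grid and to sum the bin-wise minima. The function name histogram_overlap, the number of bins and the sign convention (reverse work is negated before comparison) are assumptions made for this illustration.

```python
def histogram_overlap(forward_work, reverse_work, n_bins=20):
    """Overlap coefficient in [0, 1] between P_f(W) and P_r(-W).

    forward_work : work values from forward transitions (e.g. in kJ/mol)
    reverse_work : work values from reverse transitions, negated internally
    A value near 0 indicates almost no overlap; 1 means identical histograms.
    """
    neg_reverse = [-w for w in reverse_work]
    lo = min(min(forward_work), min(neg_reverse))
    hi = max(max(forward_work), max(neg_reverse))
    if hi == lo:
        return 1.0  # all values identical, so the distributions coincide
    width = (hi - lo) / n_bins

    def normalised_hist(values):
        hist = [0.0] * n_bins
        for v in values:
            # Clamp the upper edge into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            hist[idx] += 1.0 / len(values)
        return hist

    hist_f = normalised_hist(forward_work)
    hist_r = normalised_hist(neg_reverse)
    return sum(min(a, b) for a, b in zip(hist_f, hist_r))


# Well-separated distributions give zero overlap.
overlap = histogram_overlap([0.0, 1.0], [-10.0, -11.0])
```

An automated check could warn the user when this value falls below a chosen threshold. The threshold itself, and whether a histogram-based measure is preferable to other overlap estimates, would have to be established empirically.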

We plan to focus on extensive documentation of the core background, algorithms and implementation details involved in non-equilibrium alchemical calculations with PMX. This includes explanations of the underlying theory, such as Jarzynski's equality and the Crooks fluctuation theorem, as well as, on the technical side, atom mapping, hybrid structure generation and the double-system/single-box (DSSB) setup for charge-changing mutations. The updated documentation will be made available across various platforms, such as the PMX webpage, GitHub and the BioExcel webpage, to attract and guide a larger user base.
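
As an example of the kind of worked illustration such documentation could contain, Jarzynski's equality, ΔG = -kT ln <exp(-W/kT)>, can be evaluated from a set of non-equilibrium work values as sketched below. This is a minimal sketch, not the PMX estimator: the function name jarzynski_free_energy and the default temperature are assumptions, and the work values are assumed to be in kJ/mol.

```python
import math

# Molar gas constant in kJ/(mol K), i.e. Boltzmann's constant per mole.
KB_KJ_PER_MOL_K = 0.008314462618


def jarzynski_free_energy(work_values, temperature=298.15):
    """Estimate dG = -kT ln <exp(-W/kT)> from non-equilibrium work values.

    work_values : work per transition in kJ/mol
    temperature : temperature in K
    """
    beta = 1.0 / (KB_KJ_PER_MOL_K * temperature)
    exponents = [-beta * w for w in work_values]

    # Log-sum-exp trick: subtract the largest exponent to avoid overflow.
    shift = max(exponents)
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -(shift + math.log(mean_exp)) / beta


# The estimate never exceeds the mean work (second law / Jensen's inequality).
dg = jarzynski_free_energy([1.0, 3.0])
```

The exponential average is dominated by the lowest work values, which is one reason why one-directional estimates converge slowly and why estimators that use both forward and reverse work distributions are preferred when both are available.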

Currently, tutorials on protein mutations and relative binding free energy calculations are available on the PMX webpage and on the GitHub page. However, some of these tutorials are outdated and use PMX through its command-line interface. We aim to create a systematic set of tutorials that all use the latest Jupyter notebook API, with every tutorial provided as a single notebook file for better usability. This extensive set of tutorials will cover all of the PMX base functionality: the calculation of solvation free energies of small molecules in water and in organic solvents, the estimation of partition coefficients using a non-equilibrium alchemical approach, protein, ligand and nucleic acid mutations, absolute protein-ligand binding free energies and pKa estimation. Overall, the documentation and tutorials will cover the entire range of PMX capabilities in a simple and easy-to-use manner and will be updated regularly.

  • Implement new and/or update already existing building blocks for GROMACS, HADDOCK and PMX
  • Implement new building blocks based on user feedback (e.g. REMD, AWH, enhanced sampling)
  • Implement new building blocks for collaboration projects (e.g. Oxipro, EU-CanShare, ELIXIR 3D-BioInfo)
  • Implement new demonstration workflows as Jupyter Notebooks (e.g. CP2K, HADDOCK, ML)
  • Deploy HPC workflow prototypes on EuroHPC supercomputers (e.g. drug discovery applications)