PMX future development

Future development

Development priorities based on user feedback

Mutation libraries: New libraries

PMX offers a solution for creating hybrid structures and topologies for amino acid and nucleic acid mutations. This is made possible by pre-existing atom mapping schemes between the residues. These mappings form mutation libraries that depend on the force field being used. As new force fields are constantly being developed, it is crucial to keep the mutation libraries up to date. One of the main objectives of PMX’s development is to automate the generation of mutation libraries and to update them promptly whenever new force field versions are released. In addition, the latest PMX development version supports recent protein force fields, namely Amber14sb and Charmm36m.

Features: Automatic detection and solution of convergence issues

The estimation of free energy differences suffers from convergence problems (poor overlap of the forward and reverse work distributions in the framework of the Crooks fluctuation theorem) as the perturbation size (the change in the number of heavy atoms) increases. The issue has been observed across various systems, including protein mutations, ligand mutations, and absolute ligand binding free energy calculations. In some cases, even a minimal perturbation (say, morphing an H to a Cl) can cause convergence problems. Poor work distribution overlap directly degrades the accuracy of the estimated free energy differences, so detecting and resolving convergence problems is essential for accurate free energy estimates.

As part of PMX’s long-term development, we plan to automate the detection of poor phase space overlap, which contributes to the convergence issue, and to provide measures for addressing it.
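As an illustration of what such a detection step could look like, the sketch below computes a simple histogram overlap coefficient between the forward work values and the sign-flipped reverse work values, which are the two distributions compared by the Crooks fluctuation theorem. This is not the PMX implementation: the function names, the number of bins, and the 0.1 threshold are assumptions made for this example only.

```python
def work_overlap(w_forward, w_reverse, bins=50):
    """Overlap coefficient (0 to 1) between P_f(W) and P_r(-W).

    Hypothetical helper: a value near 1 means the two work distributions
    coincide, a value near 0 means they barely share any work values.
    """
    wf = [float(w) for w in w_forward]
    # The Crooks theorem compares forward work with sign-flipped reverse work.
    wr = [-float(w) for w in w_reverse]
    lo = min(min(wf), min(wr))
    hi = max(max(wf), max(wr))
    if hi == lo:
        return 1.0  # all values identical: trivially full overlap
    width = (hi - lo) / bins

    def histogram(values):
        # Normalized histogram on the shared grid [lo, hi].
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    pf = histogram(wf)
    pr = histogram(wr)
    # Sum of bin-wise minima of the two normalized histograms.
    return sum(min(a, b) for a, b in zip(pf, pr))


def has_poor_overlap(w_forward, w_reverse, threshold=0.1):
    """Flag a transformation whose overlap falls below an assumed threshold."""
    return work_overlap(w_forward, w_reverse) < threshold
```

A flagged transformation could then be rerun with more or slower transitions, or split into smaller perturbations. A production check would more likely rely on a statistically grounded overlap measure than on a fixed histogram threshold.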

Documentation

We plan to focus on extensive documentation of the core background, algorithms and implementation details involved in the non-equilibrium alchemical calculations employing PMX. This includes explanations of the underlying theory, such as Jarzynski’s equality and the Crooks fluctuation theorem, as well as technical topics such as atom mapping, hybrid structure generation, and the double-system/single-box (DSSB) setup for charge-changing mutations. The updated documentation will be made available across various platforms, such as the PMX webpage, GitHub, and the BioExcel webpage, to attract and guide a larger user base.
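For reference, the two relations named above can be stated compactly. The notation is chosen for this summary and is not taken from the PMX documentation: W is the work performed in a non-equilibrium transition, ΔG is the free energy difference between the end states, β = 1/(k_B T), and P_f and P_r are the forward and reverse work distributions.

```latex
% Jarzynski's equality: exponential average over forward transitions
\left\langle e^{-\beta W} \right\rangle_{f} = e^{-\beta \Delta G}

% Crooks fluctuation theorem: ratio of forward and reverse work distributions
\frac{P_{f}(W)}{P_{r}(-W)} = e^{\beta \left( W - \Delta G \right)}
```

The Crooks relation implies that the two distributions cross at W = ΔG, which is why their overlap matters for convergence.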

Tutorials

Tutorials on protein mutations and relative binding free energy calculations are currently available on the PMX webpage and on the GitHub page. However, some of these tutorials are outdated and use PMX through its command-line interface. We aim to create a systematic set of tutorials that all use the latest Jupyter notebook API, with each tutorial provided as a single notebook file for better usability. The set will cover all of the PMX base functionality: solvation free energies of small molecules in water and organic solvents, partition coefficient estimation using the non-equilibrium alchemical approach, protein, ligand, and nucleic acid mutations, absolute protein-ligand binding free energies, and pKa estimation. Overall, the documentation and tutorials will cover the entire range of PMX capabilities in a simple and easy-to-use manner and will be updated regularly.

Development roadmap features

Core code and usability: Conda

To facilitate a seamless installation and usage of PMX across different platforms, we intend to make PMX accessible through the Conda packaging system, which is already being utilized by BioExcel Building Blocks. Our ultimate objective is to distribute PMX via the BioConda channel. At present, we have prepared installation instructions for a swift and hassle-free setup of PMX using Python’s pip install.

Core code and usability: Further modularization

The PMX library is structured in a modular way, with classes for managing molecular structures and topologies. However, the alchemical hybrid structure/topology setup is provided as a collection of specialized scripts that rely on the PMX infrastructure. In certain scenarios, such as performing large scale mutational scans or custom free energy workflows, it is preferable to avoid using external scripts. One way to achieve this is by incorporating the functionality of the existing setup into independent classes within PMX. This would enable users to easily utilize separate modules of these classes, resulting in the creation of more personalized and specialized workflows.

We recently completed a modular system for managing protein, nucleic acid, and ligand modifications. The analysis calculations have also been modularized, making them easy to adapt to customized workflows. We have introduced new classes for setting up overall free energy calculations and submitting simulations to a cluster. Enhancing these classes will be a key focus of future PMX development, as they enable the creation of comprehensive free energy workflows.

Currently, the updated and modularized version of PMX is available on the “develop” branch, and we plan to move it to the “master” branch with the next release for better usability.

Mutation libraries: Automated tests for mutation libraries

Currently, each recently produced mutation library undergoes testing within a generated topology to ensure its compatibility with a set of scripts that exist separately from the main PMX library. The objective is to integrate the testing procedures into PMX and facilitate a user-friendly automated execution of these tests for each newly developed mutation library.
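To make the idea of an automated library test concrete, the sketch below runs two basic consistency checks on a single hybrid residue entry: atom names must be unique, and the net charge must be an integer in both end states. The data layout (a list of dictionaries with `name`, `charge_a` and `charge_b` keys) and the function name are hypothetical and do not reflect the actual PMX library format or its existing test scripts.

```python
def check_hybrid_residue(atoms, tol=1e-4):
    """Return a list of problems found in a hypothetical hybrid residue entry.

    Each atom is a dict with an atom name and its partial charge in
    state A and state B (dummy atoms carry zero charge in one state).
    """
    problems = []

    # Atom names must be unique within one residue.
    names = [a["name"] for a in atoms]
    if len(names) != len(set(names)):
        problems.append("duplicate atom names")

    # The residue should carry an integer net charge in both end states.
    for state in ("charge_a", "charge_b"):
        total = sum(a[state] for a in atoms)
        if abs(total - round(total)) > tol:
            problems.append(f"non-integer net charge in {state}: {total:.4f}")

    return problems
```

Checks of this kind could be run for every residue pair in a newly generated library, with an empty result list meaning that the entry passed.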

Mutation libraries: Unified amino acid and DNA mutation libraries

The mutation libraries are currently divided by type of biomolecule: amino acids and nucleic acids have separate libraries, and therefore separate force fields. This separation is not essential. A protein force field with amino acid mutations can be combined with a DNA force field with nucleic acid mutations. Merging the mutation libraries, while seemingly straightforward, requires some internal reorganization of PMX. Now that the core PMX functionality has been ported to Python 3, the next step in completing the mutation libraries is to create combinations of compatible protein and DNA force fields.

Features: Amino acid protonation

A frequently requested PMX feature is the estimation of free energy changes caused by amino acid protonation/deprotonation (pKa differences). From the standpoint of alchemical calculations, protonation can be seen as a specific mutation in which a hydrogen attached to an amino acid appears or disappears. In fact, the protonation changes of several amino acids can already be probed using the PMX mutation libraries. However, only a few instances of this feature have been tested against experimental data. In addition, annihilating a proton unavoidably changes the overall charge of the system.

While there are ways to keep the simulation box’s charge constant during the simulation, it is crucial to automate this setup. Recently, we demonstrated that large-scale pKa calculations can be performed easily using PMX/GROMACS. With this first proof-of-principle result in hand, we aim to develop an automated and convenient setup that allows a systematic assessment of pKa changes in proteins and ligands.
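The link between a computed free energy difference and a pKa shift follows from the standard relation ΔpKa = ΔΔG / (RT ln 10). The sketch below applies it, assuming that ΔΔG is the deprotonation free energy in the protein minus that of a reference compound in solution, given in kJ/mol. The function names and this sign convention are choices made for the example and are not part of the PMX API.

```python
import math

R_KJ_PER_MOL_K = 0.008314462618  # gas constant in kJ/(mol K)


def pka_shift(ddg_kj_per_mol, temperature_k=298.15):
    """Convert a deprotonation free energy difference into a pKa shift.

    ddg_kj_per_mol: dG_deprotonation(protein) - dG_deprotonation(reference).
    A positive value means deprotonation is harder in the protein,
    so the pKa goes up.
    """
    return ddg_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k * math.log(10))


def pka_in_protein(pka_reference, ddg_kj_per_mol, temperature_k=298.15):
    """Shift the reference pKa by the computed free energy difference."""
    return pka_reference + pka_shift(ddg_kj_per_mol, temperature_k)
```

At 298.15 K, RT ln 10 is about 5.71 kJ/mol, so a free energy difference of that size corresponds to a shift of one pKa unit.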

Features: Post-translational modifications

Another commonly requested PMX feature is amino acid changes that represent post-translational modifications. Implementing this requires a flexible way to generate arbitrary mutations in a library.

This task is similar to ligand modification, which involves alchemical transformations between different molecules. In addition, supporting post-translational modifications requires extending the core PMX code responsible for finding and managing entries in mutation libraries. These changes are a long-term objective for PMX development.

Features: Covalent modifications

In conventional free energy calculation setups, bond breaking is discouraged. There are, however, some cases, such as proline-related mutations, where breaking a bond cannot be avoided. For example, the PMX “develop” branch already permits proline mutations that entail ring opening/closure and therefore covalent bond breaking/formation. In such cases, a carefully constructed topology can make it possible to remove a bond. Further research is needed to see how these techniques might be applied in more general situations, such as the development of covalent drugs. Supporting at least some typical alchemical covalent changes in PMX is a long-term objective.

Features: Absolute binding free energy calculations

PMX readily supports calculations of relative protein-ligand binding affinities. These calculations make it possible to compare the binding free energies of ligand libraries made up of chemically related molecules. Conceptually, the next big challenge is to enable binding affinity estimates for an arbitrary compound without the need to build and study a library of related compounds. This requires calculating the absolute protein-ligand binding free energy.
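The relative calculations rest on a standard thermodynamic cycle, summarized here in notation chosen for this note: ligand A is alchemically transformed into ligand B once in the protein-ligand complex and once free in solvent, and the difference of the two transformation free energies equals the difference in binding free energies.

```latex
\Delta\Delta G_{\mathrm{bind}}(A \to B)
  = \Delta G_{\mathrm{bind}}(B) - \Delta G_{\mathrm{bind}}(A)
  = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B)
```

Because only differences between related ligands enter this cycle, it cannot give the binding free energy of a single compound on its own, which is what the absolute calculations address.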

We have now shown that PMX/GROMACS can be used to perform these calculations and have obtained accurate estimates for a wide range of systems. The next step in this direction is to develop absolute protein-ligand binding affinity calculation procedures that are practical to set up in high-throughput drug discovery pipelines.

Features: Workflows

PMX is already capable of generating input for various types of alchemical modifications, including all the necessary preparation and pre-processing. At the beginning of BioExcel-2, carrying out the actual free energy calculation was left entirely to the user. One of the primary objectives of PMX development is to offer a set of workflows that enable easy simulation setup using readily available hybrid structure/topology input. These workflows will cover both equilibrium and non-equilibrium free energy calculation protocols, using PMX as the input generator and GROMACS as the simulation engine.

As of 2021, PMX has incorporated classes for building a free energy workflow into its “develop” branch. An example case for calculating the relative protein-ligand binding affinity has been created and is provided to users as a tutorial that can be easily tailored to suit specific project requirements. With the main classes for constructing the workflow already in place, the next steps involve developing workflows for protein mutations, DNA mutations, and absolute protein-ligand binding free energy calculations.