The human genome encodes more than 20,000 expressed proteins, collectively known as the proteome. These proteins are the workhorses of human life and perform all manner of critical cellular functions. They do not work in isolation, however, but through interactions with each other and with other biomolecules, forming a complex network known as the interactome. This intricate network comprises hundreds of thousands of protein-protein and other biomolecular complexes. Irregularities in the functioning of the interactome, e.g. through environment-induced miscommunication or mutation-induced alterations to the proteome, are responsible for many diseases, which is why it is important to understand how the interactome works at the atomic level.
Why is this project particularly interesting for BioExcel?
There are two main challenges associated with understanding the interactome:
- Adding the structural dimension to possibly hundreds of thousands of interactions between protein-protein and other biomolecular complexes
- Predicting what the interactome (the network) looks like given the knowledge of the proteome (the building blocks of the network), which requires predicting the affinity of the complexes involved
What are we doing in BioExcel?
The first challenge will require running very large numbers of independent docking runs in parallel, each generating large numbers of files. The second challenge will require all-vs-all docking of the network components and prediction of their binding affinity, e.g. using parallel molecular dynamics simulations. Achieving the project goals will require efficient use of high-throughput computing (HTC) and high-performance computing (HPC), including Exascale resources, as well as efficient handling and analysis of large amounts of data.
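To give a sense of scale for all-vs-all docking, the short Python sketch below enumerates every unordered pair from a list of components and groups the pairs into batches that could each be submitted as an independent job. It is only an illustration under stated assumptions: the protein identifiers, the batch size, and the helper names `all_vs_all_pairs` and `batch` are hypothetical and not part of HADDOCK, and self-pairs (homodimers) are left out for simplicity.

```python
from itertools import combinations
from math import comb


def all_vs_all_pairs(components):
    """Return an iterator over each unordered pair of distinct components."""
    return combinations(components, 2)


def batch(items, size):
    """Group items into lists of at most `size` elements."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, batch
        yield chunk


# Hypothetical identifiers; a real run would read these from a proteome list.
proteins = ["P1", "P2", "P3", "P4", "P5"]

# Each batch is a list of pairs that one independent job could dock.
jobs = list(batch(all_vs_all_pairs(proteins), 4))

# The number of pairs grows quadratically: n * (n - 1) / 2.
n_pairs = comb(len(proteins), 2)
proteome_scale_pairs = comb(20_000, 2)

print(f"{len(proteins)} proteins -> {n_pairs} pairs in {len(jobs)} jobs")
print(f"20,000 proteins -> {proteome_scale_pairs:,} pairs")
```

Because the pair count grows quadratically with the number of components, a naive all-vs-all run over a full proteome is out of reach for a single machine, which is why the runs need to be distributed as many independent jobs.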
These challenges cannot be met using only the HTC resources available through the HADDOCK portal. We intend to modularize HADDOCK, decoupling the pre- and post-processing steps from the bulk of the computing. This will allow the development of custom, efficient workflows that can be executed on HPC resources for Exascale modelling of interactomes.
Both challenges will also benefit from current advances in AI and deep learning, both for identifying near-native models in a large pool (e.g. from docking or other modelling methods) and for predicting the binding affinity of those complexes.
Developments related to this project are also expected to benefit the antibody-antigen modelling project. Containerization of workflow modules is expected to facilitate deployment on HTC and HPC resources and on the in-house computational infrastructure of pharma companies.