Simulating a whole cell, including its internal components, has long been one of the grand challenges of structural and computational biology. In recent years, improved computational performance, more capable coarse-grained models and experimental methods such as cryo-electron tomography have enabled researchers to start addressing it.
To do so, the computational limitations of the GROMACS simulation software had to be addressed. GROMACS was originally designed in the 1990s and couldn’t handle the massive scale of a minimal cell model containing around 700 million atoms. Its use of 32-bit integers for particle indices capped the system size at about 179 million atoms. Input/output (I/O) was slow, with reading a single coordinate frame taking up to 20 minutes. In addition, single-precision floating-point numbers caused a significant loss of precision in interatomic distance calculations for such large systems. Challenges remain with the stability of the simulation over longer timescales, which could be caused by imperfections in the initial models or by small drifts in the calculations.
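The precision problem can be illustrated with a minimal sketch. It assumes coordinates stored in nanometres as IEEE 754 single-precision values; the coordinate values (1 nm and 400 nm) and the helper names `to_float32` and `float32_spacing` are hypothetical and chosen only for illustration, not taken from GROMACS.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between adjacent single-precision values near x (normal, nonzero x)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 24)     # single precision carries 24 significant bits


# Hypothetical x coordinates in nm: one atom pair near the origin, one far from it.
for origin in (1.0, 400.0):
    a = to_float32(origin)
    b = to_float32(origin + 1e-5)  # second atom 0.00001 nm further along x
    print(
        f"near {origin:6.1f} nm: spacing = {float32_spacing(origin):.2e} nm, "
        f"stored separation = {b - a:.2e} nm"
    )
```

Under these assumptions, the two atoms near 400 nm are stored as the same single-precision value, while the same pair near the origin stays distinguishable. The absolute resolution of a coordinate degrades as its magnitude grows, which is why distances computed from large absolute coordinates lose precision.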
Why is this project particularly interesting for BioExcel?
This project is fascinating because it aims to simulate an entire living cell at the molecular level, something that was previously impossible. It is a first-of-its-kind effort that combines recent advances in cryo-electron tomography with coarse-grained modeling and computational enhancements. It builds on the creation of the JCVI-syn3A minimal cell, the smallest self-replicating organism, and uses experimental data from it to create a detailed, lattice-level model. This work is a critical step towards a complete computational model of a living organism, with the potential to reveal how all its parts work together.
What are we doing in BioExcel?
Our team is redesigning and improving the GROMACS software to handle these massive system sizes. We’ve addressed the 32-bit integer limitation by switching the atom count to unsigned integers, which allows for up to 1.4 billion atoms. We also created a new file header for the highly compressed XTC trajectory format that supports 64-bit indices for systems that require them, ensuring future compatibility. We’ve improved I/O performance by a factor of ten by using XDR vector routines instead of handling each byte individually. For precision, we’ve temporarily switched to double-precision floating-point numbers for extreme-scale simulations, and we plan to implement local/relative coordinates in the future. These changes have already allowed us to simulate a scaled-down version of the minimal cell for 100 nanoseconds.
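The text does not say how the 179 million and 1.4 billion atom limits were derived. The sketch below shows one reading that reproduces both figures. It assumes the old limit comes from a byte count (3 coordinates of 4 bytes each per atom) held in a signed 32-bit integer, and the new limit from a count of 3 coordinates per atom held in an unsigned 32-bit integer. Both assumptions, and the helper name `max_atoms`, are illustrative and not confirmed details of the GROMACS code.

```python
INT32_MAX = 2**31 - 1    # largest signed 32-bit integer
UINT32_MAX = 2**32 - 1   # largest unsigned 32-bit integer
COORDS_PER_ATOM = 3      # x, y, z
BYTES_PER_FLOAT32 = 4    # one single-precision coordinate


def max_atoms(counter_max: int, units_per_atom: int) -> int:
    """Largest atom count whose total (atoms * units_per_atom) still fits the counter."""
    return counter_max // units_per_atom


# Assumption: the counter holds bytes of coordinate data in a signed 32-bit int.
signed_byte_limit = max_atoms(INT32_MAX, COORDS_PER_ATOM * BYTES_PER_FLOAT32)

# Assumption: the counter holds the number of coordinates in an unsigned 32-bit int.
unsigned_coord_limit = max_atoms(UINT32_MAX, COORDS_PER_ATOM)

target_atoms = 700_000_000  # approximate size of the minimal cell model

print(f"signed byte count limit:  {signed_byte_limit:,} atoms")
print(f"unsigned coordinate limit: {unsigned_coord_limit:,} atoms")
print(f"700M atoms fits old limit: {target_atoms <= signed_byte_limit}")
print(f"700M atoms fits new limit: {target_atoms <= unsigned_coord_limit}")
```

Whatever the exact origin of the limits, the arithmetic shows why a 700-million-atom model exceeds the first figure and fits comfortably under the second.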