At the beginning of the decade, scientific communities started facing an interesting problem. Hardware vendors were designing ever more efficient computer chips and faster accelerators, and building larger and larger supercomputers. Software, however, was struggling to keep up. Most codes weren’t tuned for extreme scale. To make efficient use of those machines, a researcher would need a quite in-depth understanding of hardware architectures, HPC software stacks and parallelization algorithms, and would have to assess their effect on the implemented methods as well as on the final results of the calculations. Not an easy task. We had tremendous compute resources at our disposal, yet very few were able to make the best use of them. Something had to be done.
Early 2014, Amsterdam. A conference room at Schiphol airport. European research groups working on leading and popular software applications in the Life Sciences met to answer a simple question: “Tens of thousands of scientists worldwide are using our codes, crucial for their work, yet everyone struggles to be productive. What can we do about it?” The researchers around the table had many good ideas but needed funding to bring them to life. And then came the opportunity. The European Commission, after investing heavily in hardware infrastructure over many years, was aware of the need for advanced software development to exploit that infrastructure, as well as the need for better community support. The working group submitted a proposal to the Commission with clear objectives:
- Improve the performance, efficiency, and scalability of selected software packages of high importance for biomolecular Life Science research on next-generation HPC systems
- Improve the usability of ICT technologies for biomolecular researchers
- Promote best practices and train end users to make best use of both software and computational infrastructure
- Work towards a long-term operational model
The proposal was accepted. And on 3rd November 2015, BioExcel was born.
A lot has happened since then. Over those five years, our great team improved the scaling, high-end performance and throughput (in some cases more than doubling it) of all core applications: GROMACS (molecular dynamics), HADDOCK (integrative modelling), CPMD and CP2K (hybrid QM/MM). PMX (Free Energy) grew from a small tool into a major application for drug screening. Co-design projects with all major HPC CPU and GPU vendors ensured that our codes run on any supercomputer in the world. Some codes are even ready for next-generation hardware that is still under development. We adopted open roadmaps for development and have added a plethora of features requested by the user community.
But supercomputers require superusers. And using powerful applications is never an easy task. We are working on that. Our workflow solutions, based on popular frameworks such as CWL, Galaxy, PyCOMPSs and Jupyter Notebooks, together with containerization (Docker/Singularity) and packaging (Conda), are now production-ready to screen thousands of potential drugs in a matter of days. Our framework BioBB (BioExcel Building Blocks) now boasts over 100 modules to mix and match, and a template to build a block with your favourite code!
But above all, BioExcel has been working with, and for, the community. Which is growing fast. Our Twitter account now has over 2000 followers, the unique views of our project are consistently around 2500 per month, and the BioExcel mailing list has over 1800 subscribers. We have been running a very popular webinar series, producing nearly 50 editions with thousands of views. Our YouTube channel just hit 1000 subscribers. We have made 23 tutorials freely available and provided direct in-depth support to over 1000 users. The AskBioExcel.eu forums have more than 450 user queries resolved and had almost 140,000 page views during the last year, of which 50,000 came from registered and logged-in users. We have developed a successful competency-based training programme with integrated remote training capabilities.
And then COVID-19 struck. Within the early days of the pandemic, BioExcel restructured its efforts to address the crisis. We started COVID-19 specific research. We focused on facilitating collaborations, extending community support, and providing access to HPC resources at partner centres. We partnered with MolSSI to establish the Covid-19 Molecular Structure and Therapeutics Hub and with the Exscalate4Cov Consortium to screen for active compounds. We launched a dedicated web-server interface. Together with EOSC-Life, we rapidly set up the COVID-19 Workflow Hub (listed on the COVID-19 Data Portal). We doubled the number of concurrent jobs on the HADDOCK server to meet demand, and signed a community letter in support of initiatives to share biomolecular modelling and simulation data. We shared experience and results.
Five years have passed and we are not done yet. Exascale HPC systems are around the corner, and tremendous opportunities lie ahead. And we are prepared. We continue our mission to provide Life Sciences researchers with high-quality, user-friendly software. To increase their expertise and skills. To strengthen the community. To put extreme-scale computing at the heart of Life Science research. And to help society. Together.