Developing complex biomolecular workflows is not always straightforward. It requires tedious development work to make the different biomolecular simulation and analysis tools interoperate. Moreover, the need to execute the pipelines on distributed systems adds further complexity.
To address these issues, we propose a methodology to simplify the implementation of these workflows on HPC infrastructures. It combines a library, the BioExcel Building Blocks (BioBBs), which allows scientists to implement biomolecular pipelines as Python scripts, with the PyCOMPSs programming framework, which makes it easy to convert Python scripts into task-based parallel workflows that run on distributed computing systems such as HPC clusters, clouds, and containerized platforms.
Using this methodology, we have implemented a set of computational molecular workflows and performed several experiments to validate its portability, scalability, reliability and malleability.
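As a rough illustration of the task-based idea described above, the following minimal sketch shows how a plain Python pipeline might be annotated with the PyCOMPSs `@task` decorator. It is not taken from the paper: `prepare_structure`, `run_analysis`, `workflow`, and the mutation labels are hypothetical placeholders standing in for BioBB steps, and the fallback definitions are an assumption added only so the script also runs sequentially where PyCOMPSs is not installed.

```python
# Sketch only: the step functions below are hypothetical placeholders,
# not functions from the BioBB library.
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    # Assumed fallback: run everything sequentially without PyCOMPSs.
    def task(*args, **kwargs):
        def decorator(func):
            return func
        return decorator

    def compss_wait_on(obj):
        return obj


@task(returns=1)
def prepare_structure(mutation):
    # Placeholder for a structure-preparation step.
    return f"{mutation}:prepared"


@task(returns=1)
def run_analysis(prepared):
    # Placeholder for an analysis step; returns a dummy numeric result.
    return len(prepared)


def workflow(mutations):
    # Each loop iteration is independent, so a task-based runtime
    # can schedule the iterations in parallel.
    results = []
    for mutation in mutations:
        prepared = prepare_structure(mutation)
        results.append(run_analysis(prepared))
    # Synchronize: wait for all task results before using them.
    return compss_wait_on(results)


scores = workflow(["A123V", "G45D"])
```

The pipeline body reads like an ordinary sequential script; in this sketch the only parallelism-related additions are the decorators and the final synchronization call.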
Preprint (arXiv): https://doi.org/10.48550/arXiv.2208.14130
Jorge Ejarque, Pau Andrio, Adam Hospital, Javier Conejero, Daniele Lezzi, Josep Ll. Gelpi, Rosa M. Badia (2022):
The BioExcel methodology for developing dynamic, scalable, reliable and portable computational biomolecular workflows.
IEEE 18th International Conference on eScience (eScience 2022) (accepted)
10–14 Oct 2022, Salt Lake City, UT, USA