Standardization of BioExcel Building Blocks for easier deployment in different computational environments


The BioExcel project has defined a computational architecture for simulation-related building blocks that ensures interoperability and reusability across different workflow environments. The building blocks library has been demonstrated in a number of computational environments (desktop, VM, small clusters, and HPC) and workflow managers (PyCOMPSs, Toil, Galaxy). However, adapting the demonstration workflows to each environment still requires some manual tuning to handle environment-specific features, which complicates distribution to end users. At the same time, several software packaging and distribution channels are available: Python PyPI (pip), Conda and BioConda, Galaxy ToolShed, HPC environment modules, Jupyter notebooks, etc.

We are already working on integrating biobb into HPC environment modules, Python pip, and Jupyter notebooks. In addition, there is a growing trend toward using software containers as a standard way of distributing software. For instance, within ELIXIR, BioExcel is contributing to establishing a general infrastructure for the use of containers in bioinformatics based on BioContainers, which in turn largely builds on BioConda packages.

In this context, a new project has started in which BioExcel will work on two main tasks:

  1. Develop automatic adaptors of the BioExcel Building Blocks (biobb) library for several deployment options:
    1. CWL specification (automatic adaptation to the recently released CWL 1.1), for use with CWL-compliant workflow managers.
    2. Conda packaging and integration into the BioConda registry, which provides system-agnostic executable environments.
    3. Galaxy ToolShed, to broaden the user base and bridge to the more genomics-oriented workflows usually found in that environment.
    4. GA4GH TES and WES. No production software follows these specifications yet; however, there is general agreement on the need to consider them.
    5. Additionally, we will build recipes to include PyCOMPSs in the same deployment environments, especially BioConda.
  2. Build software containers to deploy components of the library, and implement automatic orchestration of the containers using PyCOMPSs and CWL, as well as newer popular tools such as Kubernetes. This approach links directly to the guidelines of the ELIXIR Tools Platform. Containers will be included in ELIXIR
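The adaptor idea in task 1 can be illustrated with a minimal sketch, assuming each building block is described once by a small hypothetical `BlockSpec` record (executable name, container image, typed inputs, and output options). This record is not part of the biobb API; the block name, image, and option names below are placeholders. From that single description the sketch derives a CWL v1.1 `CommandLineTool` document and a GA4GH TES task message, both as plain Python dicts.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BlockSpec:
    """Hypothetical minimal description of one building block (not the biobb API)."""
    name: str                                              # executable name
    image: str                                             # container image providing it
    inputs: Dict[str, str] = field(default_factory=dict)   # option -> CWL type
    outputs: List[str] = field(default_factory=list)       # options naming output files


def to_cwl(spec: BlockSpec) -> dict:
    """Return a CWL v1.1 CommandLineTool document as a plain dict."""
    inputs = {
        opt: {"type": cwl_type, "inputBinding": {"prefix": f"--{opt}"}}
        for opt, cwl_type in spec.inputs.items()
    }
    # Output file names are chosen by the caller, so each one is also a
    # string input; the produced file is collected back with a glob.
    for opt in spec.outputs:
        inputs[opt] = {"type": "string", "inputBinding": {"prefix": f"--{opt}"}}
    # Output ids get a "_file" suffix so they do not clash with input ids.
    outputs = {
        f"{opt}_file": {"type": "File", "outputBinding": {"glob": f"$(inputs.{opt})"}}
        for opt in spec.outputs
    }
    return {
        "cwlVersion": "v1.1",
        "class": "CommandLineTool",
        "baseCommand": spec.name,
        "hints": {"DockerRequirement": {"dockerPull": spec.image}},
        "inputs": inputs,
        "outputs": outputs,
    }


def to_tes(spec: BlockSpec, args: Dict[str, str]) -> dict:
    """Return a GA4GH TES task message for one concrete invocation."""
    command = [spec.name]
    for opt, value in args.items():
        command += [f"--{opt}", value]
    return {
        "name": spec.name,
        "executors": [{"image": spec.image, "command": command}],
    }


if __name__ == "__main__":
    block = BlockSpec(
        name="example_block",                    # hypothetical executable
        image="example.org/biobb/example:1.0",   # hypothetical image
        inputs={"input_pdb_path": "File"},
        outputs=["output_pdb_path"],
    )
    print(json.dumps(to_cwl(block), indent=2))
    print(json.dumps(to_tes(block, {"input_pdb_path": "/data/in.pdb",
                                    "output_pdb_path": "/data/out.pdb"}), indent=2))
```

Because CWL documents may also be written as JSON, the serialized output of `to_cwl` can be saved directly as a tool description, and the `to_tes` message can be submitted as JSON to a TES server. File staging through the TES `inputs`/`outputs` fields and validation of option names are deliberately left out. Generating every target format from one source description is what would let the same block reach CWL, TES, or other channels without per-environment manual tuning.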

References:

https://biocontainers.pro/

https://kubernetes.io/

https://galaxyproject.org/toolshed/

https://www.commonwl.org/

https://github.com/ga4gh/task-execution-schemas

https://github.com/ga4gh/workflow-execution-service-schemas

http://journals.sagepub.com/doi/10.1177/1094342015594678