Advances in Automation and Efficiency for the Exascale Era – Experiences from the Biomolecular Sciences

BioExcel is pleased to announce the first joint mini-symposium with MolSSI following the establishment of our strategic partnership. The series of presentations will finish with a round-table discussion with representatives of both organizations.

Session: MS28 – Advances in Automation and Efficiency for the Exascale Era – Experiences from the Biomolecular Sciences (Minisymposia Session IV)
When: Tuesday 2018-07-03 16:00 — 18:00 CEST
Where: Samarkand Room, PASC 2018 conference, Congress Center Basel, Basel, Switzerland

PASC2018 dates: 2018-07-02 — 2018-07-04 (schedule)
PASC2018 registration: https://pasc18.pasc-conference.org/registration/registration
Early bird registration deadline: 25 May 2018

Description

The life sciences have become crucially dependent on software in many steps of the research process, such as analysis of experimental data, systems modelling and simulation, and data integration across various repositories and databases. The dramatic increase in available tools has enabled scientists to perform ever more complex studies while taking advantage of modern high-end (HPC and HTC) compute facilities. Experimental facilities are producing staggering amounts of data, which has led to the rapid development of novel data analytics and machine learning techniques.

We are approaching the “industrialization” era of scientific computing in which manufacturing processes of “workers with wrenches” are no longer sufficient. We need to build software “production lines” that are to a large extent automated with as little need for supervision as possible. These lines will be designed not simply to accomplish specific tasks, but to solve concrete, complex, scientific problems.

Thus, there is a need for additional focus on areas such as:

  • improving the interoperability of software applications
  • enabling better coupling of tools with data sources
  • developing efficient libraries for the upcoming Exascale era in HPC
  • devising user-friendly and extensible workflows/pipelines

These advances will considerably improve the productivity of researchers and allow them to address novel scientific challenges.

At this mini-symposium we invite leading experts from two major institutions in the field – BioExcel, the European Center of Excellence for Computational Biomolecular Research, and MolSSI, the Molecular Sciences Software Institute from the USA – to discuss the advances in this important field.

Presenters

Shantenu Jha, Rutgers University, USA
“Building Blocks for Adaptive Workflows”
Next-generation exascale systems will fundamentally expand the reach of biomolecular simulations and the resulting scientific insight, enabling the simulation of larger biological systems (weak scaling), longer timescales (strong scaling), more complex molecular interactions, and robust uncertainty quantification (more accurate sampling). Solving biological problems that require longer timescales, involve more complex interactions and robust uncertainty quantification will require significant algorithmic improvements that incorporate high-level parallelism and leverage the statistical nature of molecular processes. Interestingly, many such simulation algorithms require adaptive workflows. We argue the need for workflow-systems using a building blocks approach to support adaptive workflows on extreme-scale heterogeneous and dynamic resources. We discuss RADICAL-Cybertools as an implementation of the building block concept, and discuss how RADICAL-Cybertools are being used to support a wide range of adaptive workflows in biomolecular simulations.
Stian Soiland-Reyes, The University of Manchester, UK
“Facing Compute Platform Portability Challenges with Scientific Workflows – Experiences from Common Workflow Language”
Scientific workflow systems are well established for computational analysis in all science domains, following the rapid development of workflow technology and community practices over the past two decades, the eScience era. Workflow systems have gained traction in the era of Big Data Science due to their “ASAP properties”: Automation over repetitive pipelines and simulation sweep campaigns; Scaling over computational infrastructure and handling large data; Abstraction to shield users and programs from complexity and incompatibilities; and Provenance to auto-document execution logs and data lineage for future analysis. A major hindrance to wider adoption and reuse of workflows, even when open source, is that they are written for specific workflow systems or infrastructures. Common Workflow Language (CWL) has emerged as a community initiative with support across a range of existing workflow engines, using a language specification that focuses on the common denominator of command line tools exchanging files. Support for CWL on HPC has expanded in recent months, for example with IBM’s CWLEXEC on LSF and Toil with Singularity. In this talk we will present the challenges of moving CWL workflows towards Exascale while retaining key features of workflows such as reproducibility, interoperability, usability and provenance.
Adam Hospital Gasch, Institute for Research in Biomedicine, SPAIN
“Workflow Automation and Efficiency for Macromolecular Simulations and Screening”
Life science is one of the largest and fastest growing communities in terms of needs for high-end computing. Biological studies usually require an integration of different computational approaches, defining complex, automated multi-step analysis workflows with inter-dependent steps, including CPU-intensive tasks generating large amounts of data. The number and diversity of tasks to be integrated, together with the short lifetime and fast turnover of computer codes and life sciences-related methods, make standardization of these workflows an extremely challenging task. The BioExcel CoE has been working, together with the ELIXIR project, on putting forward a set of best practices to develop, document and describe life sciences workflows, following the FAIR principles: Findability, Accessibility, Interoperability and Reusability. Examples of the first workflow prototypes implemented following this approach (automatic modeling of protein mutations and virtual screening), illustrating the benefits of the introduced best practices, will be presented.
Prof. Erik Lindahl, Science for Life Laboratory, KTH Royal Institute of Technology, SWEDEN
BioExcel Lead Scientist
Round-table discussion speaker
Prof. T. Daniel Crawford, Virginia Tech, USA
MolSSI Director
Round-table discussion speaker
Rossen Apostolov, KTH Royal Institute of Technology
BioExcel Manager
Round-table discussion moderator

Round-table discussion: “Simulations at Exascale – Myth or Reality?”

Exascale supercomputers are around the corner. Building them will undoubtedly be a real challenge, given issues with processor design, power consumption and so on, but engineers are confident that they will be delivered within a few years. Software applications in the life sciences (and beyond) are capable of running at petascale in HPC/HTC regimes, but are they ready for the next push? When the exascale machines arrive, will there be simulation engines and job dispatchers able to orchestrate billions of cores? Will researchers be able to tackle major scientific problems and deliver amazing discoveries that are unattainable at lower computing scales? How well prepared are the communities?

We have invited Prof. Erik Lindahl, Lead Scientist of BioExcel, the European Center of Excellence for Computational Biomolecular Research, and Prof. Daniel Crawford, Director of MolSSI, the Molecular Sciences Software Institute, USA, together with the speakers, all leading experts in the field: Shantenu Jha (MolSSI), Adam Hospital and Stian Soiland-Reyes (BioExcel). They will address these questions and try to establish what is needed to improve the interoperability of software applications, enable better coupling of tools with data sources, develop efficient libraries, and devise user-friendly and extensible workflows/pipelines for the upcoming Exascale era in HPC. The discussion will be moderated by Rossen Apostolov.


Location

Conference website: https://pasc18.pasc-conference.org
Registration: https://pasc18.pasc-conference.org/registration/registration
Early bird registration deadline: 25 May 2018
Room: Samarkand Room