As part of their collaborative framework, BioExcel CoE and MolSSI are organizing a series of workshops on the application of workflow solutions for biomolecular modelling and simulation. This first event will be by invitation only and will be hosted by BioExcel in Barcelona. A follow-up, hosted by MolSSI, is planned for 2019 in the US.

New: MolSSI and BioExcel Workflow Workshop 2018 Report (arXiv:1905.11863)

Background

Workflows are an increasingly important aspect of biomolecular simulation science. Over the years there have been a variety of projects around the world to develop workflow systems that would be well suited to the demands of this type of research, but to date it has been difficult to achieve widespread community uptake, or to develop a good model for sustainability.

The purpose of this workshop is to bring together current projects from both sides of the Atlantic that involve, as part of their activity, the development and application of workflow systems. The aim is to foster a dialogue between them that supports the development of interoperable and/or harmonised solutions with the best chance of long-term sustainability: solutions that a worldwide community of molecular simulation scientists uses day-to-day and supports through continued development.

It is not the intention that this workshop be a place for projects to make “sales pitches”; rather, we seek participants willing and able to discuss both the strengths and weaknesses of their approaches, the opportunities for synergy with other work, and where gaps in provision and unmet needs remain.

The process

To focus discussion on issues close to end-user needs, participants are asked to come prepared to discuss how their workflow system might be applied to two common workflow patterns. There is no expectation that participants arrive with the examples “solved”.

Example 1: The simulation-analysis loop


Workflows for enhanced sampling methods are often of this type, e.g. those that aim to calculate free energies or identify rare states (for example, cryptic ligand binding sites). Key features are: looping, gather/scatter operations, large numbers of independent parallel simulations (whose number may or may not be known in advance), and decision points.
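To make these features concrete, here is a minimal, hypothetical sketch of the simulation-analysis loop in Python, using only the standard library. The names `run_simulation`, `analyse` and `simulation_analysis_loop` are invented for illustration; the "simulation" is a toy one-dimensional random walk standing in for an MD replica, and the stopping rule and replica-doubling heuristic are assumptions chosen only to show looping, scatter/gather, a run-time-determined number of parallel tasks and a decision point. It does not represent the approach of any particular workflow system.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def run_simulation(start: float, seed: int, steps: int = 100) -> list[float]:
    """Hypothetical stand-in for one MD replica: a 1-D random walk from `start`."""
    rng = random.Random(seed)
    x = start
    trajectory = [x]
    for _ in range(steps):
        x += rng.gauss(0.0, 0.1)
        trajectory.append(x)
    return trajectory


def analyse(trajectories: list[list[float]]) -> float:
    """Gather step: pool all replicas and return the furthest point reached."""
    return max(max(t) for t in trajectories)


def simulation_analysis_loop(target: float = 2.0, n_replicas: int = 4,
                             max_iterations: int = 50) -> tuple[int, float]:
    start, next_seed = 0.0, 0
    for iteration in range(1, max_iterations + 1):
        # Scatter: launch independent replicas in parallel from the current start point.
        seeds = range(next_seed, next_seed + n_replicas)
        with ThreadPoolExecutor() as pool:
            trajectories = list(pool.map(partial(run_simulation, start), seeds))
        next_seed += n_replicas

        # Gather: analyse all replicas together.
        frontier = analyse(trajectories)

        # Decision point: stop once the toy "rare state" (x >= target) is reached.
        if frontier >= target:
            return iteration, frontier

        # Otherwise restart from the most advanced frame; if progress stalled,
        # widen the next round, so the replica count is only known at run time.
        if frontier - start < 0.1:
            n_replicas = min(2 * n_replicas, 32)
        start = frontier
    return max_iterations, start
```

Calling `simulation_analysis_loop()` returns the number of iterations used and the final frontier value. In a real workflow each call to `run_simulation` would be a separate MD job, possibly on a different node; a thread pool is used here only to keep the sketch self-contained.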

Example 2: The pipeline


Workflows designed to perform complex analyses on large datasets are often of this type, e.g. those analysing the effects of protein mutations on drug efficacy, or reverse docking procedures (searching for the most likely protein target for a given ligand). Key features are: independent execution of the pipeline for each member of the input dataset; the need to interface a heterogeneous collection of tools that were not originally designed to work with each other; and a significant likelihood that some input combinations will “fail” somewhere along the pipeline.

Considerations

In both cases, participants are invited to consider their approach to these workflows from two angles:

  a)  How would the workflow be written, and how easy would it be for a new user to write their own workflow following this pattern from scratch? Where do you see strengths, weaknesses, and gaps in provision?
  b)  How would the workflow be executed, and how easy would it be for a new user to run it on their own computational infrastructure, whatever that might be? Where do you see strengths, weaknesses, and gaps in provision?

Agenda

11 December 2018

13:00 – 13:15 Introduction (Rosa Badia, Stian Soiland-Reyes, Shantenu Jha)

13:15 – 14:45 Brief overview presentations by participants on their workflow projects. Please prepare your presentation according to the following broad outline:

  1. The Science Drivers: what are the types of scientific problems that your workflow solution has been designed to help overcome?
  2. An introduction to your project/tool – a bit of background, history, state of development, broad-brush description of how it works and what it does.
  3. Examples/major successes in application to technology/science problems
  4. Major problems/challenges in application to technology/science problems
  5. A wish-list of what you would like to get out of this workshop.

14:45 – 15:15 Coffee break

15:15 – 16:45 Brief overview presentations by participants on their workflow projects

16:45 – 17:00 Break

17:00 – 18:00 Wrap-up of Day 1 and planning for Days 2 and 3, as needed

20:00 – Dinner

12 December 2018 – High-throughput pipelines

On days two and three there will be opportunities to dive much deeper into your technology.

On day two you will have up to one hour to discuss (and, if desired, demonstrate interactively) how your approach maps onto science/technology problems that require high-throughput pipelines.

We hope you will make your contributions to these sessions as interactive as possible.

9:00 – 12:00 Pipeline Workflows – how to write them? (coffee break at 10:30)

12:00 – 13:00 Lunch

13:00 – 16:00 Pipeline Workflows – how to execute them? (coffee break at 14:30)

16:00 – 18:30 Hands-on hacking

20:00 – Dinner

13 December 2018 – Control flow and dynamic scheduling

On day three you will have another one-hour slot to do the same, this time in the context of complex control flow (e.g. loops and scatter/gather operations) and dynamic scheduling.

9:00 – 12:00 Loop Workflows – how to write them? (coffee break at 10:30)

12:00 – 13:00 Lunch

13:00 – 16:00 Loop Workflows – how to execute them? (coffee break at 14:30)

16:00 – 18:30 Hands-on hacking

20:00 – Dinner

14 December 2018

9:00 – 12:00 Summary and planning of next actions

12:00 – End of event