The 8th webinar in BioExcel’s webinar series on computational methods and applications for biomolecular research took place on 16th November 2016.
Ravi Madduri from Argonne National Laboratory introduced Globus Genomics and discussed his work on large-scale analytical workflows on the cloud using Galaxy and Globus.
The series covers a broad range of topics related to the latest development of major software packages; their application to modelling and simulation; best practices for performance tuning and efficient usage on HPC and novel architectures; introductory tutorials for new users; and much more.
BioExcel webinars include an audience Q&A session during which attendees can ask questions and make suggestions. They are a great opportunity to interact with the main code developers.
This is the first webinar from the BioExcel Workflows Interest Group, which welcomes users (and potential users) of scientific workflow systems for data analysis and pipelining of biomolecular simulation and modelling tools, in particular Galaxy, Apache Taverna, KNIME, COMPSs and Common Workflow Language.
The interest group covers the pros and cons of these workflow systems, practical workflow design and setup, and deployment questions on HPC/cloud infrastructure, such as the use of Docker for packaging command-line codes.
Register for webinar
Title: Large-scale analytical workflows on the cloud using Galaxy and Globus
Date: 16th November, 2016
Time: 16:00 CET
Please register at https://attendee.gotowebinar.com/register/5808939110698431491. You will then receive an email with details of how you can connect to the webinar.
Large-scale analytical workflows on the cloud using Galaxy and Globus
In this BioExcel webinar we are delighted to have Ravi Madduri from Argonne National Laboratory and the University of Chicago present Globus Genomics, a system developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data, combining Galaxy workflows with cloud technologies such as Amazon EC2 and Globus File Transfer.
This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler).
The system allows biomedical researchers to perform rapid analysis of large NGS datasets in a fully automated manner, without software installation or a need for any local computing infrastructure.
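To make the idea of on-demand resource acquisition more concrete, here is a minimal sketch of how an elastic provisioner might decide how many worker nodes to start. It is an illustration only and does not reflect the actual Globus Genomics provisioner: the names PoolState and nodes_to_request, and the parameters slots_per_node and max_nodes, are hypothetical and chosen purely for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolState:
    """Hypothetical snapshot of a compute pool (illustrative only)."""
    pending_jobs: int    # jobs queued or running in the scheduler
    running_nodes: int   # worker nodes already provisioned


def nodes_to_request(state: PoolState,
                     slots_per_node: int = 4,
                     max_nodes: int = 20) -> int:
    """Return how many extra nodes to start; never a negative number.

    Assumption: each node runs `slots_per_node` jobs at once and the
    pool is capped at `max_nodes` to bound cost.
    """
    needed = math.ceil(state.pending_jobs / slots_per_node)
    target = min(needed, max_nodes)
    return max(target - state.running_nodes, 0)


if __name__ == "__main__":
    # 10 jobs at 4 slots per node need 3 nodes; 1 is already running.
    print(nodes_to_request(PoolState(pending_jobs=10, running_nodes=1)))
```

In a real deployment this kind of decision would be driven by the scheduler's queue and the cloud provider's API; the sketch only shows the scaling arithmetic.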
Ravi’s work is part of the BD2K center Big Data for Discovery Science, which is building infrastructure for reproducible workflows using minids (minimal viable identifiers), analyzing data at scale using identified Docker containers, and publishing results to Globus Publication services, thus providing an end-to-end framework for reproducible research.
In this BioExcel webinar, Ravi will present Globus Genomics and the technologies used to achieve large-scale analytical Galaxy workflows on the cloud. We think this will be of interest not just to the genomics community, but to any scientific workflow users who need to consider distributed deployments, data management and scalability.
About the speaker
Ravi is actively involved in developing innovative software and networking technology. For example, as lead architect of the Reliable File Transfer, he designed novel testing and profiling capabilities, ensuring that it met the needs of key communities such as TeraGrid.
He implemented Grid file transfer patterns in the Java CoG Kit and developed a remote application virtualization infrastructure; the Grid-enable extension was incorporated in the Grid Service Authoring Toolkit and is used by NCI Information Systems.
Madduri is applying new technology in diverse science and engineering domains. For example, he is a key contributor to the Cancer Bioinformatics Grid. He played a lead role in the evolution of GridFTP and its adoption by researchers for the Laser Interferometer Gravitational Wave Observatory and the Large Hadron Collider. Moreover, as part of the NEESgrid project, he helped scientific teams incorporate Grid technology into their earthquake engineering research.