WHAT IS HADDOCK DOING?

HADDOCK is a versatile information-driven flexible docking approach for the modelling of biomolecular complexes. It differs from ab-initio docking methods in that it can integrate information derived from biochemical, biophysical or bioinformatics methods to enhance sampling, scoring, or both. The information that can be integrated is quite diverse: interface restraints from NMR or MS, mutagenesis experiments, or bioinformatics predictions; various orientational restraints from NMR; and, more recently, cryo-electron microscopy maps. Currently, HADDOCK allows the modelling of large assemblies consisting of up to 6 different molecules, which, together with its rich data support, makes it a truly integrative modelling platform.

TARGETED USERS

Any user interested in generating 3D models of biomolecular assemblies based on some experimental information (e.g. mutagenesis, cryo-EM, NMR, XL-MS, SAXS, etc.).

INPUT DATA

  • PDB files of the different partners (up to 6 partners in HADDOCK 2.2) (required)
  • Ambiguous and/or unambiguous distance/angle restraints between residues of the partners
  • Experimentally identified or predicted important residues for the interaction
  • Distance restraints (from NMR, MS, or any technique that provides some kind of distance information)
  • Various other NMR-based restraints
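
As an illustration of how a list of important interface residues could be turned into ambiguous distance restraints, here is a minimal Python sketch. It assumes CNS-style "assign" statements of the form (selection 1) (selection 2) distance lower-margin upper-margin; the function name, the segment identifiers "A" and "B", and the 2.0 Å upper bound are illustrative assumptions and are not taken from this page. Check the HADDOCK documentation for the exact restraint format expected by your version.

```python
from typing import Iterable, List


def ambiguous_restraints(
    residues_a: Iterable[int],
    residues_b: Iterable[int],
    segid_a: str = "A",      # hypothetical segment id of partner 1
    segid_b: str = "B",      # hypothetical segment id of partner 2
    upper_bound: float = 2.0,  # illustrative effective distance bound (Angstrom)
) -> List[str]:
    """Return one CNS-style 'assign' statement per residue of partner A.

    Each statement restrains one residue of partner A to *any* of the
    listed residues of partner B; the 'or' between the alternatives is
    what makes the restraint ambiguous.
    """
    targets = sorted(set(residues_b))
    if not targets:
        raise ValueError("residues_b must contain at least one residue")

    statements = []
    for resid in sorted(set(residues_a)):
        alternatives = "\n     or ".join(
            f"(resid {t} and segid {segid_b})" for t in targets
        )
        # distance = upper_bound, lower margin = upper_bound, upper margin = 0,
        # i.e. the allowed range is 0 .. upper_bound.
        statements.append(
            f"assign (resid {resid} and segid {segid_a})\n"
            f"       ({alternatives})\n"
            f"       {upper_bound:.1f} {upper_bound:.1f} 0.0"
        )
    return statements


if __name__ == "__main__":
    # Example: residues 10 and 12 of partner A, residues 5 and 6 of partner B.
    print("\n!\n".join(ambiguous_restraints([10, 12], [5, 6])))
```

An unambiguous restraint is the special case in which the second selection contains a single residue (or atom), so no "or" alternatives are needed.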

OUTPUT DATA

  • 3D models of the complexes as PDB files
  • Statistics and analyses of the generated models (clustering, energies, HADDOCK score, etc.)

WHY BIOEXCEL?

BioExcel, as a centre of excellence, is the perfect environment in which to grow our software together with its community. It allows us:

  • to improve our bond with our user community and extend it (through support forum, webinars, interest groups, etc.)
  • to extend HADDOCK’s capabilities by connecting it through meaningful workflows to other important actors of the field (e.g. HADDOCK <-> GROMACS collaboration)
  • to professionalize our software development efforts, which will have a significant impact on the efficiency of our code development and improve the end user experience

LICENSING

The HADDOCK software consists of a collection of Python and CNS scripts, plus additional scripts and programs (csh, awk, Perl, C++). The HADDOCK distribution can be obtained free of charge by non-commercial users by completing the license agreement form and returning it by email to a.m.j.j.bonvin_@_uu.nl. You can find more details on the HADDOCK licensing page.

HADDOCK web server access is free for non-profit users upon registration.

DESCRIPTION

The physico-chemical foundations of protein interactions remain largely unknown despite their importance in all critical cellular processes. These interactions establish an intricate and dynamic molecular network, the interactome, in which subtle miscommunications often result in disease. High-throughput experimental techniques such as yeast two-hybrid, tandem affinity purification and novel screening technologies are providing an abundance of qualitative and quantitative data on these interactions. However, understanding function and designing drugs that target these complexes requires complementing the above-mentioned methods with structural information. The large gap between the number of known interactions and the number of available experimental 3D structures calls for complementary computational methods to produce accurate predictions and guide experimentalists. This is the field of computational structural biology, which has seen fascinating developments in both software and hardware over the last decade. Computational structure prediction is nowadays routinely considered an integral part of research.

The docking field, in particular, has thrived in the last decade since the beginning of the CAPRI (Critical Assessment of PRedicted Interactions) experiment (Janin et al. 2003), in which the participants are asked to predict the structure of an unknown biomolecular interaction. Interestingly, the most successful participants of the latest CAPRI rounds fall in the category of ‘data-driven’, ‘information-driven’, or ‘integrative modelling’ algorithms. These approaches were developed to counter the inaccuracies of computational sampling and scoring methods by feeding them whatever experimental information is available for a given interaction. As recent CAPRI assessments have shown, this synergy is often enough to drive the docking calculations toward the right answer. As such, computational modelling of complexes has grown into a well-accepted complementary method to classical experimental techniques such as nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography.

One integrative method that we have developed for over ten years now is our information-driven docking approach, HADDOCK. It supports the incorporation of a large variety of data from NMR and other biophysical methods (e.g. cross-links from MS, EPR-derived distances, mutagenesis data) to drive the modelling process, as well as the use of SAXS and IM-MS data to filter docking solutions.

Figure: Cumulative number of HADDOCK jobs run through EGI

The software is made available through a user-friendly web interface (HADDOCK 2.2), which has attracted a large user community worldwide (>7500 users). These users submit a sustained number of computations to HTC infrastructures such as the EGI (>7M jobs per year; see histogram), and their work has resulted in over 110 structures of complexes deposited in the PDB (Protein Data Bank). HADDOCK has demonstrated strong performance in the blind docking experiment CAPRI, where it ranks among the best-performing approaches, and it is currently the most cited software in its field.

FUTURE DEVELOPMENTS

With the advent of integrative (computational) structural biology, it is clear that no single researcher will possess all the necessary expertise and resources to tackle the scientific challenges ahead of us in the study of the large and complex biomolecular machines that play major roles in health (e.g. drug design), food (e.g. crop improvement) and the engineering of enzymes of industrial significance. One of the challenges for e-(Life)Science is therefore to provide integrated and user-friendly software solutions to serve a growing number of both academic and industrial communities, building on the most suitable computing solution for each problem (be it distributed (grid), cloud or HPC). In addition, it will be important to keep integrating new data types into the modelling to remain at the forefront of a fast-developing field. The typically large number of interactions to be modelled, and the even larger amount of data that will be generated, call for innovative and reusable eScience solutions that automate the entire process as much as possible, make it run efficiently on both HPC and HTC infrastructures, and provide novel interaction analysis and visualization tools built on modern web and open-source technologies to ensure the widest impact and ease of distribution.

In the case of HADDOCK, the flagship software developed by the Utrecht partner, we aim to raise the level of professionalism of software development and to optimize the code for the best infrastructures (which might vary depending on the end user's perspective). This will greatly benefit its use and dissemination, and also increase its attractiveness for industry. To this end, we will work on:

  1. Workflow solutions.
    Considering the large amount of “omics” data generated these days, there is a need for efficient workflows to enable large-scale modelling of biomolecular interactions. We will investigate various workflow solutions to automate model generation (i.e. pre-docking and docking) and data extraction and analysis (post-docking). Such workflows should integrate a variety of HPC and HTC computing resources to make optimal use of existing computing infrastructures.
  2. Data model.
    We plan to define a suitable data model and database solution to gather the rich information generated by HADDOCK. Such a data model should in principle be generic, tuned for the analysis of data originating from various biomolecular simulation packages, and as such suited to other applications (e.g. analysis of data from molecular dynamics simulations). Novel and interactive visualization technologies (e.g. D3.js) can then be built to operate on this data model to enable interactive post-analysis of the docking results.
  3. Optimization for various e-Infrastructure solutions.
    Within the FP7 e-Infrastructure projects eNMR and WeNMR, the HADDOCK web server has already been adapted to make use of local, grid (via the European Grid Initiative and the US Open Science Grid) and crowd computing resources (via the International Desktop Grid Federation). The standalone version of HADDOCK has also been used on HPC resources under various FP7 HPC-Europe and HPC-Europe2 programmes (e.g. at SARA, Cineca and Mare Nostrum supercomputer center). Each e-Infrastructure solution brings its own requirements when it comes to job scheduling and efficiency. Within this CoE project we will optimize HADDOCK to offer the optimal solution for each infrastructure (HTC, HPC, Cloud).
  4. Cloud packaging.
    We plan to integrate server and analysis tools, offering workflow and self-contained solutions for cloud usage. This is of interest both for dissemination and training purposes and for industrial users, who in most cases would rather not use public servers.
  5. Open source computational engine.
    HADDOCK currently uses CNS (Crystallography and NMR System) as the computational engine for its calculations. While this software is freely available for non-profit users, it falls under Accelrys licensing for commercial users. We will investigate the use of other, open source computational engines to drive the modelling process in HADDOCK, such as GROMACS or OpenMM. These are highly optimized codes that run extremely efficiently on a variety of resources. The challenge, however, will be to support the large variety of data currently implemented in HADDOCK via CNS, along with its powerful selection syntax and scripting capabilities. To date, no single solution offers the required functionalities without a major coding effort.

TRAINING

Several resources dedicated to the HADDOCK web server and standalone software are available:

Various tutorials:

More tutorials are available as PDF files, typically used and distributed during workshops. These will be rewritten and migrated to the bonvinlab.org/education web site in the course of the project.

Furthermore, the main support mechanism for HADDOCK-related questions is now ask.bioexcel.eu.