Modern chemotherapy is based on the administration of small molecules (ligands) that interfere with disease-related cellular processes by targeting key macromolecules (receptors) such as proteins. Owing to several factors, including stringent requirements from drug regulatory agencies, drug design today relies more than ever on mechanistic knowledge of a drug's mode of action. The key process to be understood is molecular recognition, whose final step is the association of ligand and receptor into a (meta)stable complex.

3D structures of such complexes, mainly obtained through experimental methods and collected in the Protein Data Bank, are key to the success of structure-based drug discovery programs. The role of computer-aided drug design (CADD) within these programs has grown markedly in recent decades: computational techniques have not only complemented experiments but also revealed molecular data barely accessible to wet labs. Molecular docking (which mimics ligand-receptor association) and virtual library screening (which aims to identify active hits within virtual libraries of millions of compounds) are arguably the most widely used CADD tools. Despite its success, molecular docking (and thus virtual screening) is still challenged by receptors that undergo significant structural changes upon ligand binding. One way to address receptor plasticity is the so-called ensemble-docking approach, in which several conformations of a receptor are used in multiple docking runs. Conformations can be generated by experimental and/or computational means. However, because large structural changes are generally associated with high free energy barriers, standard conformational sampling methods (such as molecular dynamics or Monte Carlo simulations) started from the unbound structure of a protein rarely reach ligand-bound conformations.

To tackle this challenge, we propose an original approach named EDES (Ensemble-Docking with Enhanced sampling of pocket Shape) that, starting from the unbound structure of the receptor, generates ensembles of conformations that maximise diversity at the putative binding site(s) and, moreover, include a relevant fraction of bound-like geometries. Our method combines binding-site detection algorithms, a novel protocol for biased molecular dynamics simulations that enhances the sampling of holo-like conformations of the binding site(s), and a multi-step cluster analysis to select conformations for molecular docking with HADDOCK. In particular, sampling will be enhanced by introducing a set of new collective variables in metadynamics runs, which force the binding site to change its shape by biasing the number of Contacts across Inertia Planes (CIP, Figure 1b). Each CIP is defined as the number of contacts formed between two groups of atoms lying on opposite sides of a plane that is perpendicular to one of the inertia axes and passes through the geometrical center of the binding site.
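
To make the CIP definition concrete, the following is a minimal sketch of how such a variable could be evaluated for a single snapshot of binding-site coordinates. It is an illustration under stated assumptions, not the EDES implementation: the function names, the unit masses, the hard distance cutoff (6.5 Å here) and the choice to split the atoms by side in the same snapshot are all hypothetical. Metadynamics codes normally use a smooth switching function instead of a hard cutoff, and keep the two atom groups fixed during the run.

```python
import numpy as np

def inertia_axes(coords, masses=None):
    """Return the geometric center and the principal inertia axes (as columns)."""
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))  # assumption: unit masses
    center = coords.mean(axis=0)       # geometric center of the binding site
    r = coords - center
    # Inertia tensor: I = sum_i m_i (|r_i|^2 * Identity - r_i r_i^T)
    trace_term = np.einsum("i,ij,ij->", masses, r, r) * np.eye(3)
    outer_term = np.einsum("i,ij,ik->jk", masses, r, r)
    _, axes = np.linalg.eigh(trace_term - outer_term)  # eigenvalues ascending
    return center, axes

def contacts_across_inertia_planes(coords, cutoff=6.5):
    """Count contacts across each of the three inertia planes (hard cutoff)."""
    coords = np.asarray(coords, dtype=float)
    center, axes = inertia_axes(coords)
    r = coords - center
    cips = []
    for k in range(3):
        side = r @ axes[:, k]          # signed distance from the k-th plane
        group_a = coords[side > 0]
        group_b = coords[side <= 0]
        # All pairwise distances between atoms on opposite sides of the plane
        d = np.linalg.norm(group_a[:, None, :] - group_b[None, :, :], axis=-1)
        cips.append(int((d < cutoff).sum()))
    return cips

# Toy "binding site": the eight corners of a 2 x 4 x 6 box
corners = [(x, y, z) for x in (-1, 1) for y in (-2, 2) for z in (-3, 3)]
print(contacts_across_inertia_planes(corners))
```

In this toy example the three planes are the mid-planes of the box, so each value simply counts the corner pairs on opposite faces that lie within the cutoff. Biasing these three numbers in a simulation would push the two halves of the site together or apart along each inertia axis.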

Figure 1. Workflow of the EDES approach.

The method is being validated for its ability to generate bound-like structures of tens of pharmaceutically relevant protein targets that undergo small to large conformational changes upon ligand binding. This set also allows us to validate the methodology on targets undergoing small but tricky (costly) conformational changes in order to accommodate their ligands. The protocol will be benchmarked for its ability to recover native-like conformations of the various complexes and to enrich virtual libraries, that is, to discriminate active from inactive compounds by assigning the highest affinities to the former.
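
As an illustration of the enrichment benchmark, the sketch below computes an enrichment factor, a commonly used metric in virtual screening: the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library. The text does not state which metric will be used, so the function name, the 10% default fraction and the convention that a lower docking score means a better predicted affinity are assumptions made here for the example.

```python
def enrichment_factor(scores, is_active, fraction=0.1):
    """Enrichment factor in the top `fraction` of a score-ranked library.

    scores    : docking scores, one per compound (assumed: lower = better)
    is_active : booleans, True for known active compounds
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    total_actives = sum(bool(a) for a in is_active)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")
    # Rank compounds from best (lowest) to worst (highest) score
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_top = max(1, int(round(fraction * len(scores))))
    hits_top = sum(bool(is_active[i]) for i in order[:n_top])
    # Ratio of the hit rate in the top slice to the hit rate in the full library
    return (hits_top / n_top) / (total_actives / len(scores))

# Hypothetical scores for ten compounds, two of which are known actives
scores = [-9.1, -8.7, -5.0, -4.8, -4.5, -4.1, -3.9, -3.5, -3.0, -2.2]
actives = [True, True, False, False, False, False, False, False, False, False]
print(enrichment_factor(scores, actives, fraction=0.2))
```

A value of 1 corresponds to random ranking. The maximum attainable value is the inverse of the overall fraction of actives, reached when every compound in the top slice is active.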

Figure 2. Sampling of the holo-like conformational space along the CIP variables (left), and performance of EDES in redocking runs with HADDOCK on a model system (beta-glucosyl transferase).

As a long-term goal, we will set up a database containing, for all the proteins studied, ensembles of conformations generated with the methods described above. This database, which will be continuously expanded in the future, will be open to the community. It will also include several physico-chemical descriptors of the binding site(s) to facilitate its use.

The main novelty of our approach is that, unlike most strategies developed in recent years, which exploit information on bound structures of the receptor, it relies only on knowledge of the unbound conformation (although it could clearly be extended to include such information if desired). Our method is therefore fully independent of prior knowledge of ligand binding and is suitable for detecting new and allosteric binding sites on receptors, for identifying new hit compounds by virtual screening, and for studying the influence of different solvents on the overall druggability of proteins. In addition, each conformation in an ensemble will be associated with an estimate of its relative free energy cost with respect to the most stable structure in the absence of ligands. This will open the route to including the cost of conformational changes in protein-ligand scoring functions. We expect this database to become a game-changing resource for structure-based virtual screening of chemotherapeutic compounds, overcoming one of the most limiting aspects of computer-aided drug design; it will also help make the outcome of in silico studies less dependent on the specific tools used by different researchers.
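
One simple way to attach a relative free energy cost to each conformation, shown here only as a sketch, is the Boltzmann relation ΔG_i = -k_B T ln(p_i / p_max), where p_i is the equilibrium (unbiased) population of the cluster represented by conformation i and p_max is the population of the most stable one. The text does not specify how the estimate will be obtained, so this choice, the function name and the 300 K default are assumptions. Populations taken from a metadynamics run would first have to be reweighted to remove the effect of the bias.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def relative_free_energies(populations, temperature=300.0):
    """Free energy of each state relative to the most populated one (kcal/mol).

    populations : unbiased equilibrium populations (any positive weights)
    """
    if not populations or any(p <= 0 for p in populations):
        raise ValueError("populations must be non-empty and strictly positive")
    kt = K_B * temperature
    p_max = max(populations)
    # Boltzmann inversion: dG_i = -kT * ln(p_i / p_max), zero for the top state
    return [-kt * math.log(p / p_max) for p in populations]

# Hypothetical populations of three cluster representatives
print(relative_free_energies([0.50, 0.25, 0.25]))
```

Such a per-conformation cost could then be added as a penalty term to a docking score, so that poses obtained on rarely visited receptor conformations are not ranked as if the conformational change came for free.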