Quantitative structure-activity relationships have been used for years to anticipate new drug activities, but challenges remain
Mark Greener
Greener is a freelance writer based in Cambridge, UK
Some 40 years ago, Corwin Hansch, PhD, a chemist at Pomona College, Claremont, Calif., began quantifying relationships between a compound’s physicochemical properties and its biological activities. His series of papers laid the foundation for quantitative structure-activity relationships (QSARs). Around the same time, the publication of the Free-Wilson analysis helped chemists relate substituents or structural elements, such as substructures and chiral centers, to biological activity. Since then, researchers have employed QSAR to predict the activities of new drugs.
Yet the challenge remains the same, says Arthur Doweyko, PhD, senior principal scientist at Bristol-Myers Squibb, Princeton, N.J.: “How well can QSAR predict activity?” Although compelling in theory and an area of active research in practice, QSAR hasn’t lived up to its early promise. However, a new generation of models has honed QSAR’s predictive abilities.
Fundamentally, QSAR aims to identify relationships between some aspects of molecular structure and properties such as toxicology, pharmacodynamics, and pharmacokinetics. Conventional QSAR examines the structure of a drug or natural ligand: it does not include interactions with the receptor. Receptor modeling, in contrast, uses the protein structure or generates a three-dimensional (3D) surrogate of the binding site to compute these critical interactions. Furthermore, QSAR usually retrospectively fits observed data to selected molecular descriptors.
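To make the idea of retrospective fitting concrete, here is a minimal Python sketch that fits observed activities to a single descriptor by ordinary least squares. The choice of logP as the descriptor, the log(1/C) activity scale, and every number in the training set are assumptions made up for illustration; real QSAR equations generally combine several descriptors.

```python
# Hypothetical training set: (logP, observed activity expressed as log(1/C)).
# All numbers are invented for illustration.
TRAINING = [(0.5, 3.1), (1.0, 3.6), (1.5, 4.0), (2.0, 4.6), (2.5, 5.0)]


def fit_line(points):
    """Ordinary least squares for activity = slope * descriptor + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(points, slope, intercept):
    """Fraction of the variance in observed activity explained by the fit."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1.0 - ss_res / ss_tot


slope, intercept = fit_line(TRAINING)
fit_quality = r_squared(TRAINING, slope, intercept)
print(f"log(1/C) = {slope:.2f} * logP + {intercept:.2f}, r^2 = {fit_quality:.3f}")
```

A high r² here says only that the equation describes the compounds it was built from. It says nothing about how well the equation predicts compounds it has not seen.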
Doweyko concluded that only about half the 3D-QSAR models published in the last decade made “reasonable” predictions about test compounds not used to create the model [A. M. Doweyko, J. Comput. Aided Mol. Des., vol. 18, pp. 587-596 (2004)]. Conventional QSAR, it seems, works best when predicting activities of compounds similar to those in the training set. It is less reliable where it would often be most valuable: extrapolating beyond known descriptors to predict activities of truly novel compounds.
Nevertheless, QSAR has had several notable successes. “Some of the more exciting efforts within the pharmaceutical industry have been directed toward refining equations designed to predict liabilities or physical properties. . . . In these instances, there is generally a large body of data for a large number of compounds. In this context, the predictive abilities of the resulting QSAR equations are better than average, since there is a greater likelihood that descriptors for the untested compound fall within the bounds used to create the equation.”
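The point about descriptors falling "within the bounds used to create the equation" can be sketched as a simple range check. The Python below is an illustrative assumption, not a method attributed to anyone quoted here: the descriptor names, the values, and the helper functions `descriptor_bounds` and `outside_domain` are all hypothetical.

```python
# Hypothetical descriptor values for the compounds used to build a model.
TRAINING_SET = [
    {"logP": 1.2, "pKa": 7.4, "mol_weight": 310.0},
    {"logP": 2.8, "pKa": 8.1, "mol_weight": 355.0},
    {"logP": 3.5, "pKa": 6.9, "mol_weight": 402.0},
]


def descriptor_bounds(rows):
    """Return {descriptor: (min, max)} over the training compounds."""
    return {
        name: (min(r[name] for r in rows), max(r[name] for r in rows))
        for name in rows[0]
    }


def outside_domain(compound, bounds):
    """List the descriptors of `compound` that fall outside the training range."""
    return [
        name for name, (low, high) in bounds.items()
        if not low <= compound[name] <= high
    ]


bounds = descriptor_bounds(TRAINING_SET)
similar = {"logP": 2.0, "pKa": 7.0, "mol_weight": 340.0}  # hypothetical analog
novel = {"logP": 5.1, "pKa": 7.5, "mol_weight": 520.0}    # hypothetical new scaffold

print(outside_domain(similar, bounds))  # nothing flagged: interpolation
print(outside_domain(novel, bounds))    # flagged descriptors: extrapolation
```

A prediction for a compound with flagged descriptors is an extrapolation and deserves less trust. The larger and more varied the training set, the less often that happens.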
There is a shift underway in QSAR’s application to drug development, says Angelo Vedani, PhD, director of the not-for-profit Biographics Laboratory 3R and associate professor in the Department of Pharmacy, University of Basel, Switzerland. “QSAR are used less and less to predict binding affinities per se and increasingly employed to predict ADMET parameters.” This shift reflects the fact that QSAR models have become increasingly sophisticated over the last few years, adding numerous extra “dimensions” (see table). “Higher-dimension QSARs are noble attempts at handling the conformational flexibility of molecules,” says Doweyko. He says, however, that new models are often more complex to use and are relatively early in their development.
Rebecca Wade, PhD, leader of the molecular and cellular modeling group at EML Research, Heidelberg, Germany, believes the “dimensional” terminology is misleading. “QSAR folks are used to thinking in unusual dimensions—that is what principal components analysis gives you. But the 3D in ‘3D-QSAR’ refers to ‘real’ spatial dimensions.” As a result, she argues, further dimensions such as time or chemical variation and changes in protonation should follow this rule. Wade says, however, that QSAR methods need to deal with multiple possible conformations.
Moreover, Vedani says conventional QSAR studies rely on the correct positioning of the ligand within the active site. But determining the 3D structure of a drug’s protein target is often difficult. “In my opinion, 3D-QSAR should be used only where the conformation can be identified using the 3D structure of the target protein or when very rigid structures are analyzed,” Vedani says.
Data sets for 4D-QSAR include all possible conformations, orientations and, in some cases, protonation states. In most cases, binding between a ligand and site arises from weak interactions, such as hydrogen bonds formed between proton donors and acceptors. Covalent interactions, where bonds are broken and reformed, tend to be less important. Nevertheless, most QSAR researchers ignore protonation, despite its ability to affect a drug’s behavior in vitro, says Evgueni Kolossov, PhD, QSAR program manager for IDBS, a software provider in Guildford, UK. The 4D-QSAR models can include protonation. The model uses a “genetic” algorithm that selects the most bioactive conformation, which it integrates into the best model.
Defining Dimensions in QSAR
1D-QSAR: Affinity correlates with pKa, logP, etc.
2D-QSAR: Affinity correlates with a structural pattern (e.g., chemical connectivity).
3D-QSAR: Affinity correlates with the three-dimensional structure.
Receptor modeling: QSAR based on the interactions of the ligands with a 3D receptor surrogate.
4D-QSAR: As with 3D, but with multiple representations of ligand conformation/orientation.
5D-QSAR: As with 4D, but with multiple representations of induced-fit scenarios.
6D-QSAR: As with 5D, but with multiple representations of solvation models.
(Source: Based on information provided by Biographics Laboratory 3R)
More predictive powers
Additional dimensions further hone predictive powers. For instance, 5D-QSAR allows the model to include induced fit, which occurs during the binding of many ligands. In this case, the ligand changes the protein’s shape, bringing the active parts into proximity with the substrate. “In such cases, a model based on a rigid receptor surrogate is worthless,” Vedani says.
Two software programs developed by Biographics Laboratory 3R, Quasar and Raptor, allow for different induced-fit scenarios. “Simulating induced fit is, in my opinion, one of the most important recent advances in QSAR. After all, induced fit is one of the key mechanisms that allow life to adapt continuously to ever-changing conditions at the level of the organism and the molecule.”
Vedani runs a pilot project that uses 6D-QSAR to screen for adverse effects in silico. The additional dimension simultaneously considers various solvation models. Solvation occurs when solute and solvent molecules associate through relatively weak noncovalent interactions.
The “dimensional” approach isn’t the only novel model that improves QSAR analysis. Comparative binding energy (COMBINE) analysis was developed at the European Molecular Biology Laboratory in Heidelberg, Germany, in collaboration with Federico Gago’s group at the University of Alcala de Henares in Madrid. “If you know the structure of a receptor-ligand complex, you should use this information to the full in deriving QSARs,” says Wade. In some classical ligand-based 3D-QSAR models, the protein structure is “thrown away” after aligning ligands.
To make the most of the information, COMBINE generates energy-minimized structures of a series of receptor-ligand complexes. In some cases, the interaction energy correlates with activity or binding affinity. Typically, however, it does not. So COMBINE partitions the interaction energy based on the type of energy and spatial distribution. The model weighs the relevant terms of the interaction energy to describe the different activities for the set of ligand-receptor complexes. Wade says the procedure yields predictive QSARs as well as mechanistic insights useful for ligand and receptor design.
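The partition-and-weigh step can be illustrated with a small Python sketch. It is not the COMBINE implementation: it uses plain least squares as a stand-in for whatever regression the published method relies on, and the residue labels, energy values, and activities are all hypothetical. The activities were constructed so that one energy term is irrelevant, which the fit should recover as a weight near zero.

```python
# Hypothetical interaction-energy terms (kcal/mol) for six receptor-ligand
# complexes, partitioned by energy type and by residue.
TERM_NAMES = ["elec_residue_A", "vdw_residue_A", "vdw_residue_B"]
ENERGY_TERMS = [
    [-4.0, -1.0, -2.0],
    [-3.0, -2.5, -1.0],
    [-5.0, -0.5, -3.0],
    [-2.0, -3.0, -2.5],
    [-1.0, -1.5, -0.5],
    [-3.5, -2.0, -3.5],
]
# Hypothetical activities, built as -0.8*elec_A - 0.3*vdw_B + 2.0.
ACTIVITIES = [5.8, 4.7, 6.9, 4.35, 2.95, 5.85]


def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_term_weights(energy_terms, activities):
    """Least-squares weight for each energy term, plus a final intercept."""
    rows = [terms + [1.0] for terms in energy_terms]  # last column = intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, activities)) for i in range(k)]
    return solve_linear_system(xtx, xty)


weights = fit_term_weights(ENERGY_TERMS, ACTIVITIES)
for name, w in zip(TERM_NAMES + ["intercept"], weights):
    print(f"{name}: {w:+.3f}")
```

Terms with weights far from zero point to the interactions that distinguish active from inactive ligands, which is the kind of mechanistic insight Wade describes. In practice there are usually many more energy terms than complexes, so plain least squares like this would not be workable without some form of dimensionality reduction.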
Richard Cramer III, PhD, chief scientific officer at Tripos Inc., St. Louis, says his company is developing “several different lines” to further enhance the productivity of 3D-QSAR. For example, ChemSpace uses “topomer” (shape-based) fragments to search virtual libraries for compounds similar in shape to a given lead. The approach can locate structurally distinct compounds whose chemistry is unrelated to the original lead. These compounds can then be assessed using 3D-QSAR and other approaches.
PredictionBase from IDBS generates fragments that retain the parent structure’s stereo configurations. “Fragmentation is based on the rule of additivity of chemical properties and offers many advantages in comparison with descriptor-based methods,” Kolossov says. The program simultaneously accounts for all known biological responses induced by a drug.
Method du jour psychology
Despite these advances, Stephen Bowlus, PhD, a consultant in molecular modeling and drug design with Seascape Learning, a software provider in Cupertino, Calif., emphasizes that older models still offer valuable insights. Bowlus criticizes many publications describing new methods for not acknowledging that a competing method may adequately describe the system. “This contributed to the method du jour psychology that permeates too much of the computational chemistry/molecular modeling arena.”
Bowlus says older methods draw on a “richer variety of molecular and property descriptors” than higher-dimension models. This, he argues, may make them better suited for studies beyond those associated with receptor binding, such as formulations, pharmaceutics, and ADME. On the other hand, higher-dimension QSAR is more automatable, which increases efficiency and allows the processing of larger data sets. “I have found all methods of greater or lesser utility in different circumstances, but I have found no method that will always give a useable, predictive model.” Bowlus adds that modelers and chemists must develop more sophisticated views of QSAR’s statistical nature generally and when designing follow-on synthesis sets in particular. “It is only through this evolution that the industry will best marry the quantitative skills of the modeler with the intuition of the bench chemist.”
Desperately seeking validation
Validation is the hottest topic in QSAR research, Kolossov says, adding that the failure rate for models depends intimately on their quality. Yet many models are validated using “statistically insignificant” numbers of compounds. “We desperately need sufficiently large real datasets so that we can validate our QSAR models. Current literature is mostly based on very small real datasets and/or simulated data,” says Ersin Bayram, PhD, a biomedical engineer at Wake Forest University, Winston-Salem, N.C.
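One common way to see why small datasets are a problem is leave-one-out cross-validation, in which each compound is predicted by a model built without it. The Python sketch below is offered as an assumption about how such a check might look, not as the procedure any of the researchers quoted here use. The single logP descriptor and all data values are hypothetical.

```python
# Hypothetical dataset: (logP, observed activity as log(1/C)).
DATA = [(0.5, 3.1), (1.0, 3.6), (1.5, 4.0), (2.0, 4.6), (2.5, 5.0), (3.0, 5.3)]


def fit_line(points):
    """Ordinary least squares for activity = slope * descriptor + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(points):
    """Goodness of fit on the same compounds used to build the model."""
    slope, intercept = fit_line(points)
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1.0 - ss_res / ss_tot


def loo_q_squared(points):
    """Leave-one-out q^2: predict each compound from a model built without it."""
    mean_y = sum(y for _, y in points) / len(points)
    press = 0.0  # predictive residual sum of squares
    for i, (x, y) in enumerate(points):
        slope, intercept = fit_line(points[:i] + points[i + 1:])
        press += (y - (slope * x + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1.0 - press / ss_tot


r2 = r_squared(DATA)
q2 = loo_q_squared(DATA)
print(f"r^2 (fit) = {r2:.3f}, q^2 (leave-one-out) = {q2:.3f}")
```

With only a handful of compounds, a single left-out point can swing q² noticeably. That is one reason statistics computed on very small sets say little about how a model will behave on genuinely new compounds.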
Concerns over intellectual property hinder efforts to assemble such datasets. Bayram believes, however, that such concerns can be overcome. “Public databases should have a filter mechanism to anonymize the data. Compound names can be stripped and only the descriptor values along with the bioactivity values would be made public.” In some ways, intellectual property protectionism reflects the fact that QSAR has come a long way. “Each step of the way has been paved with powerful mathematical tools designed to maximize the potential of the model to predict the activities or properties of novel molecules,” Doweyko says. “As in the past, the next steps will be defined by an unexpected combination of imagination and insight.”