In the Game - Drug Discovery and Development

Systems biology is becoming increasingly integrated into drug discovery and development processes, but is it ready to make a game-winning play?

Systems biology is the integration of high-content and high-throughput bioassay (‘omics’) data to generate predictive models of biological functions and processes. Models of human disease biology and pharmacology that are more predictive of safety and efficacy in humans have the potential to dramatically improve pharmaceutical research productivity.

While dynamic computational models that include all molecular constituents and their interactions across time scales relevant to clinical settings remain a distant goal, advances in systems biology technologies and innovative applications for compound discovery and development are making a measurable impact on drug discovery today. New target identification and compound validation continue to be a key focus of systems biology research with increased emphasis on model systems that are more relevant to human disease biology. Improvements in data-driven modeling and analysis approaches are being applied to compound discrimination and mechanism-of-action identification from cell-based and high-content screens, leading to increased acceptance and use of these assays in primary screening and lead optimization. Systems biology-driven advances in biomarker discovery and validation are improving the utility of early clinical trials and the economics of translational research. These applications highlight the current impact of systems biology on drug discovery research, while new predictive models for drug efficacy and safety are just beginning to come on line. In the near future, drug discovery may be able to count on systems biology for a game-winning play.

Target identification
The initial attraction of generating comprehensive, global data sets of cellular changes induced by drug or genetic modulation was the potential to detect any bioactive agent, regardless of its mechanism of action. The utility of a lead compound or gene could then be inferred by comparison to reference data sets, typically with classifier-based methods using specific data subsets, often quite small. In drug discovery applications, compound and genetic libraries have been screened in this way, with leads and potential targets identified by similarity to a data signature of interest generated, for example, from an overexpressed gene or knockout. Whereas new targets identified by these approaches are all too frequently considered “not druggable”, falling into difficult target classes for chemistry approaches (Haberman, 2005), direct applications for lead discovery are having more success. Avalon has described the use of this method to identify novel cancer therapeutics targeting the β-catenin pathway (Bol, 2006), and Stegmaier et al. (2007) applied gene expression signature screening to identify inhibitors of the Ewing sarcoma oncoprotein, EWS/FLI. Both of these examples address targets and pathways that have eluded traditional small molecule and target-based approaches.
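To make the idea of signature-based screening concrete, the following minimal sketch ranks compounds by how closely their expression profiles correlate with a reference signature. It is an illustration only, not the method used in the studies cited above: the gene values, compound names, and function names are hypothetical, and real screens work with far larger profiles and more careful statistics.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)

def rank_by_signature(reference, profiles):
    """Return (name, correlation) pairs, most similar to the reference first."""
    scored = [(name, pearson(reference, profile)) for name, profile in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical expression changes for five genes after a gene knockout.
reference_signature = [1.0, -2.0, 0.5, 1.5, -1.0]

# Hypothetical compound-induced changes measured over the same five genes.
compound_profiles = {
    "compound_A": [0.9, -1.8, 0.4, 1.6, -1.1],   # closely mimics the signature
    "compound_B": [-1.0, 2.0, -0.5, -1.5, 1.0],  # exact opposite of the signature
    "compound_C": [0.1, 0.0, -0.2, 0.1, 0.0],    # mostly inactive
}

ranking = rank_by_signature(reference_signature, compound_profiles)
for name, score in ranking:
    print(f"{name}: r = {score:+.2f}")
```

Compounds at the top of the ranking would be candidate leads that mimic the signature of interest, while strongly negative correlations flag compounds that reverse it.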

What is systems biology?
The goal of systems biology is to understand how molecular and cellular components and their interactions give rise to function and behavior. While this objective is theoretically shared by all researchers in the biological sciences, systems biology has more specifically encompassed the development of high-throughput measurement technologies and generation of high-throughput bioassay data, methods for the parallel analysis of the resulting molecular and cellular constituent data sets, and approaches for the construction and testing of computational or predictive models. As the ability to generate large data sets has been commoditized, in part through the pioneering efforts of Leroy Hood, among many others, to advance “Big Biology” (Carmichael, 2005), the challenge has shifted to the integration and analysis of these data to answer key scientific questions.

The large data sets that comprise systems biology outputs are by themselves long lists of disconnected observations that bring confusion more than understanding (Janes, 2006). Converting data to knowledge requires reducing data complexity and connecting information. Data mining and data-driven modeling approaches to reduce data complexity include classification methods for data integration and condensation, such as cluster analysis, and projection methods, including multidimensional scaling, principal components analysis, and self-organizing maps. Other techniques, such as partial least squares and Bayesian and Boolean networks, have been employed for developing classifier-based and causal models for data prediction (Berg, 2005; Ambesi-Impiombato, 2006).
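As one example of a projection method, the sketch below estimates the first principal component of a small data set and reduces each sample to a single score. It uses power iteration and only the Python standard library, an implementation chosen here for brevity; the measurements are hypothetical, and a real analysis would rely on a numerical library and retain several components. Power iteration also assumes the starting vector is not orthogonal to the leading component, which holds for this toy data.

```python
from math import sqrt

def center_columns(rows):
    """Subtract each column's mean so every measurement averages to zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    """Sample covariance between every pair of measurements (columns)."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def first_principal_component(cov, iterations=100):
    """Estimate the leading eigenvector of cov by power iteration."""
    p = len(cov)
    vector = [1.0 / sqrt(p)] * p
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(v * v for v in product))
        if norm == 0:
            raise ValueError("covariance matrix is zero; no direction of variation")
        vector = [v / norm for v in product]
    return vector

# Hypothetical data: four samples, three measurements each. The first two
# measurements rise together; the third barely changes.
samples = [
    [2.0, 4.1, 0.5],
    [3.0, 6.0, 0.4],
    [4.0, 7.9, 0.6],
    [5.0, 10.0, 0.5],
]

centered = center_columns(samples)
pc1 = first_principal_component(covariance_matrix(centered))

# Project each sample onto the component: three numbers become one score.
scores = [sum(value * weight for value, weight in zip(row, pc1)) for row in centered]
print("first component:", [round(w, 3) for w in pc1])
print("sample scores:", [round(s, 2) for s in scores])
```

Because the first two measurements carry nearly all of the variation, the component weights them heavily and gives the third almost no weight, which is the sense in which projection condenses the data.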

Computational solutions, however, can mask underlying problems. Good statistics require more observations (replicates) than measured variables, which can be economically prohibitive for large data sets. And despite technical advances in measurement accuracy and sensitivity, the inherent variability (noise) in biological data limits the effectiveness of many of these solutions. In addition to these technical considerations, one of the key strengths of systems biology, the ability to generate new hypotheses, can be a weakness from the drug discovery perspective. In many cases, data or models generate multiple hypotheses, each of which requires further confirmation studies. Testing these hypotheses can be onerous and time-consuming, involving resources that are often limited.
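One way to see why replicates need to outnumber measured variables: the sample covariance matrix of n observations has rank at most n - 1, so with fewer observations than variables it is singular, and many multivariate methods cannot be applied without additional assumptions. The sketch below checks this with exact arithmetic on two small, hypothetical data sets; the function names are illustrative only.

```python
from fractions import Fraction

def sample_covariance(rows):
    """Exact sample covariance matrix of the columns of rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(col, Fraction(0)) / n for col in zip(*rows)]
    centered = [[Fraction(v) - m for v, m in zip(row, means)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def matrix_rank(matrix):
    """Rank by Gaussian elimination; exact because entries are Fractions."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical: 3 replicates of 5 measured variables (fewer rows than columns).
few_replicates = [
    [5, 1, 3, 8, 2],
    [6, 4, 2, 7, 9],
    [4, 3, 7, 1, 5],
]

# Hypothetical: 6 replicates of 3 measured variables (more rows than columns).
many_replicates = [
    [1, 2, 0],
    [2, 1, 1],
    [3, 5, 1],
    [4, 3, 2],
    [5, 8, 4],
    [6, 2, 9],
]

rank_few = matrix_rank(sample_covariance(few_replicates))
rank_many = matrix_rank(sample_covariance(many_replicates))
print("5 variables, 3 replicates: covariance rank", rank_few)
print("3 variables, 6 replicates: covariance rank", rank_many)
```

In the first case the 5-by-5 covariance matrix cannot reach full rank no matter what values are measured, which is the practical meaning of having too few replicates.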

Mechanism of action
Drug mechanism of action has been another fruitful area for systems biology research. A number of successes using classifiers based on experiments with known agents have been reported. Investigators at Rosetta Inpharmatics, Seattle, Wash., and Merck & Co., Inc., Whitehouse Station, N.J., have described a compendium approach to building classifiers and are applying the technique to predicting compound toxicity based on transcriptional profiles (Dai, 2006). More advanced methods such as influence networks and causal reasoning models have also been explored. Pratt et al. (2006) recently described their causal reasoning methodology for organizing gene expression changes induced by androgen receptor signaling into causal pathways linked to the activity of this receptor.
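The general idea of a classifier built from experiments with known agents can be sketched with a nearest-centroid rule: average the profiles of the reference compounds in each class, then assign an unknown compound to the class whose average it most resembles. This is a deliberately simple stand-in, not the compendium or causal reasoning methods cited above, and the class labels and profile values are hypothetical.

```python
from math import dist

def class_centroids(compendium):
    """Average the profiles of the known agents in each class."""
    centroids = {}
    for label, profiles in compendium.items():
        n = len(profiles)
        centroids[label] = [sum(values) / n for values in zip(*profiles)]
    return centroids

def classify(profile, centroids):
    """Return the label of the centroid closest to profile (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(profile, centroids[label]))

# Hypothetical reference compendium: three measured readouts per known agent.
compendium = {
    "toxic_like": [[2.0, 0.1, -1.0], [1.8, -0.1, -1.2]],
    "non_toxic_like": [[-0.5, 1.5, 0.2], [-0.7, 1.7, 0.0]],
}

centroids = class_centroids(compendium)

# Hypothetical profile of an uncharacterized compound.
unknown_profile = [1.9, 0.2, -0.9]
predicted = classify(unknown_profile, centroids)
print("predicted class:", predicted)
```

A classifier of this kind is only as good as its reference set, which is why the compendium of known agents matters as much as the algorithm.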

Hit-to-lead and lead optimization
Compared to other stages in the drug discovery process, hit-to-lead and lead optimization programs have been slower to embrace systems biology. While there is a need for new approaches in this process, drug leads are already optimized on multiple parameters (potency, solubility, metabolism, etc.), and the proposition of adding even more parameters with which to optimize is not an attractive one. While there is strong interest in utilizing systems biology techniques for candidate characterization, particularly with the goal of biomarker identification, at this stage, it may be too late to have significant impact. As systems biology-based approaches advance and can demonstrate significant predictive power for key optimization goals (directly impacting safety and efficacy), the game plan will change.

Biomarker discovery
Excitement continues to be high for systems biology in biomarker discovery (van der Greef, 2007). Biomarkers can support disease diagnosis and monitoring, patient and drug selection as well as dosing. The increasing number of clinical failures is fueling the need for new approaches, as even modest improvements at the clinical stage can have a big financial impact. Biomarker discovery has benefited from improvements in the sensitivity and performance of expression profiling, proteomics, and metabolomics technologies. Methods to assess the activities of signaling pathways, such as reverse phase protein microarrays, in addition to providing biomarker signatures with which to stratify patients, are providing a new systems-level understanding of cancer biology (Speer, 2007).

Integrating expert knowledge
For solving problems of interest to drug discovery research, there is an increasing recognition of the importance of expert knowledge to define and appropriately formulate experiments. The biological systems explored and the timeframes of interest must be relevant to the question. Within pharmaceutical companies, this usually means the involvement of therapeutic area scientists who provide the biological context, guide the exploration of the resulting models, and perform the follow-up hypothesis testing. In the current resource-constrained environment of the pharmaceutical industry, this strategy has become difficult to justify.

For this reason, creative approaches that incorporate expert knowledge or biological domain expertise for reducing data complexity and providing focus for drug discovery problems are finding traction. These include methods to integrate various types of metadata, such as target knowledge or pathway information, into data analysis, as well as approaches that integrate disease and therapeutic area information into assay design upfront. There is increased interest in pathway analysis software that incorporates literature information and other data developed by companies such as Ariadne, Rockville, Md., GeneGo, Inc., St. Joseph, Mo., and Ingenuity Systems, Redwood, Calif. (Yuryev, 2006; Russell, 2006), as well as related academic efforts, such as Cytoscape (Cline, 2007). In the virtual disease models of Entelos Inc., Foster City, Calif., drug discovery researchers are able to take advantage of computational models built with literature and other data sources for predicting in vivo drug effects (Gadkar, 2007).

Cell systems biology
Cell systems biology is another approach to incorporate biological domain expertise, not at the data analysis stage, but rather upfront into the design of the biological assay systems. In BioSeek’s BioMAP systems, this is accomplished by limiting data endpoints to sets of highly annotated clinical biomarkers and metabolites, and by designing the primary human cell-based assays around previously validated pathways and drug responses within specific therapeutic areas (Kunkel, 2004).

There are practical advantages to this approach. Compact data sets are easier to control for quality and statistical rigor, and are also more amenable to data exploration and model development. Assay formats are also suitable for hit and lead prioritization, for example, to support phenotypic hit prioritization from siRNA knockdown screens. The focus on disease biomarkers also facilitates correlation of data outputs with clinical activities. The ability of this approach to detect and discriminate a broad range of disease-relevant targets and pathways, including toxic agents, suggests that this is a useful approach for mechanism-of-action determination and compound differentiation, thereby bridging the gap between molecular discovery and human disease biology (Berg, 2006).

The success of this approach and other examples of mechanism of action inference from multi-parameter data sets are inspiring greater interest in employing cell-based assays, including high-content formats, in the discovery stage. Using factor analysis on high-content screening data from HeLa cells profiled with a compound library, Young et al. (2008) were able to fully describe the biological responses and infer mechanism of action. Improved assay designs in combination with computational advances that can provide early mechanism identification will provide the incentive to discovery researchers to supplement the current target-based approaches for lead generation and lead optimization with these approaches. Furthermore, such methods will be indispensable for multi-target or network-based drug discovery that is gaining momentum from research in cancer and anti-inflammatory therapeutics (Butcher, 2005; Ambesi-Impiombato, 2006).

Conclusions
Systems biology, in its numerous and varied manifestations, is contributing to pharmaceutical drug research at all stages of discovery and development. Pharmaceutical research productivity will improve as predictive models that are practical, relevant, and produce decision-making results materialize from these efforts. While the number of applications continues to grow, systems biology is transitioning from a separate science to an integral component of pharmaceutical research. In the drug discovery game, systems biology has been called up to play.

About the Author
Dr. Berg has led the development of BioSeek, Inc.’s BioMAP cell systems biology platform and research on predictive models of drug safety and efficacy. She has numerous publications on inflammatory disease and drug mechanisms.

This article was published in Drug Discovery & Development magazine: Vol. 11, No. 2, February, 2008, pp. 38-41.

References
Ambesi-Impiombato A, di Bernardo D. Computational biology and drug discovery: from single-target to network drugs. Current Bioinformatics. 2006, 1:3-13.

Berg EL, Kunkel EJ, Hytopoulos E, Plavec I. Characterization of compound mechanisms and secondary activities by BioMAP analysis. J Pharmacol Toxicol Methods. 2006, 53:67-74.

Berg EL, Hytopoulos E, Plavec I, Kunkel EJ. Approaches to the analysis of cell signaling networks and their application in drug discovery. Curr Opin Drug Discov Devel. 2005, 8:107-14.

Bol D, Ebner R. Gene expression profiling in the discovery, optimization and development of novel drugs: one universal screening platform. Pharmacogenomics. 2006, 7:227-35.

Butcher EC. Can cell systems biology rescue drug discovery? Nat Rev Drug Discov. 2005, 4:461-7.

Carmichael M. Your health in the 21st century. The shape of things to come. Newsweek. 2005, 145:40-42, 44-45.

Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B, Hanspers K, Isserlin R, Kelley R, Killcoyne S, Lotia S, Maere S, Morris J, Ono K, Pavlovic V, Pico AR, Vailaya A, Wang PL, Adler A, Conklin BR, Hood L, Kuiper M, Sander C, Schmulevich I, Schwikowski B, Warner GJ, Ideker T, Bader GD. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007, 2:2366-82.

Dai X, He YD, Dai H, Lum PY, Roberts CJ, Waring JF, Ulrich RG. Development of an approach for ab initio estimation of compound-induced liver injury based on global gene transcriptional profiles. Genome Inform. 2006, 17:77-88.

Gadkar KG, Shoda LK, Kreuwel HT, Ramanujan S, Zheng Y, Whiting CC, Young DL. Dosing and timing effects of anti-CD40L therapy: predictions from a mathematical model of type 1 diabetes. Ann N Y Acad Sci. 2007, 1103:63-8.

Haberman A. Strategies to move beyond target validation. Gen Eng News. 2005, 25:36.

Janes KA, Yaffe MB. Data-driven modelling of signal-transduction networks. Nat Rev Mol Cell Biol. 2006, 7:820-8.

Kunkel EJ, Dea M, Ebens A, Hytopoulos E, Melrose J, Nguyen D, Ota KS, Plavec I, Wang Y, Watson SR, Butcher EC, Berg EL. An integrative biology approach for analysis of drug action in models of human vascular inflammation. FASEB J. 2004, 18:1279-81.

Pratt D, Hahn W, Matthews A, Febbo P, Berger R, Duckworth B, Levy J, Segaran T, Sun J, Ladd B, Elliston K. Computational causal reasoning models of mechanisms of androgen stimulation in prostate cancer. Conf Proc IEEE Eng Med Biol Soc. 2006, 1:38-9.

Russell J. Pathway Pioneers. BioIT World. 2006, 6:24-29.

Speer R, Wulfkuhle J, Espina V, Aurajo R, Edmiston KH, Liotta LA, Petricoin EF 3rd. Development of reverse phase protein microarrays for clinical applications and patient-tailored therapy. Cancer Genomics Proteomics. 2007, 4:157-64.

Stegmaier K, Wong JS, Ross KN, Chow KT, Peck D, Wright RD, Lessnick SL, Kung AL, Golub TR. Signature-based small molecule screening identifies cytosine arabinoside as an EWS/FLI modulator in Ewing sarcoma. PLoS Med. 2007, 4:e122.

van der Greef J, Martin S, Juhasz P, Adourian A, Plasterer T, Verheij ER, McBurney RN. The art and practice of systems biology in medicine: mapping patterns of relationships. J Proteome Res. 2007, 6:1540-59.

Young DW, Bender A, Hoyt J, McWhinnie E, Chirn GW, Tao CY, Tallarico JA, Labow M, Jenkins JL, Mitchison TJ, Feng Y. Integrating high-content screening and ligand-target prediction to identify mechanism of action. Nat Chem Biol. 2008, 4:59-68.

Yuryev A, Mulyukov Z, Kotelnikova E, Maslov S, Egorov S, Nikitin A, Daraselia N, Mazo I. Automatic pathway building in biological association networks. BMC Bioinformatics. 2006, 7:171.