Drug Discovery and Development


Data Snapshot

By Drug Discovery Trends Editor | May 18, 2010

Figure 1. Overview of the system and workflow. Desired data were retrieved and reformatted before being loaded into the analysis tool for visualization. Various graphs were generated with each iterative cycle of querying and re-querying until a final collection of graphs was produced. (All Figures: Genzyme Corporation)

The World Health Organization’s Global Burden of Disease statistics identify cancer as the second largest global cause of death, after cardiovascular disease [1]. Global cancer deaths are projected to increase from 7.1 million in 2002 to 11.5 million in 2030 [2]. However, new advances in cancer prevention, diagnostics, and treatment mean that one third of cancers are preventable while another third are curable through early detection and effective therapy. These new cancer therapies are subject to vigorous research and trials, including the application of new high-throughput biomedical technologies that generate large amounts of data accessible in public online registries.

In 2008, Dana-Farber Cancer Institute’s Cancer Vaccine Center (CVC) initiated a research project to investigate the competitive landscape of the cancer vaccine field and to help shape its strategy in the marketplace. This required studying data from 645 cancer vaccine clinical trials and analyzing statistics on cancer types, incidence, and survival rates. At first glance, this appeared to be a very difficult, time-consuming task. With a traditional approach, analyzing information from multiple data sources, understanding the relationships underlying the data, and identifying trends and patterns would have required significant IT resources.

Figure 2. Data mining framework. The TIBCO Spotfire interface to the data mining system has four sections: menu bar (top), query filter panel (right), database details on demand (bottom), and the main graphing area (central). 

However, an approach built on a visual analytics tool for data exploration and discovery was developed to rapidly extract complex cancer vaccine data from major clinical trial repositories. This application enables rapid analysis of information about institutions, clinical approaches, clinical trial dates, predominant cancer types in the trials, clinical opportunities, and pharmaceutical market coverage. Presentation of results is facilitated by visualization tools that summarize the landscape of ongoing and completed cancer vaccine trials. Summaries show the number of clinical vaccine trials per cancer type over time, by phase, and by lead sponsor, as well as trial activity relative to cancer type and survival data. From a single plot, cancers that are neglected in the vaccine field can be identified. The results were published in the journal Immunome Research [3].

Analysis Workflow
The data mining system consists of a back-end XML database, a front-end visualization interface, and an analysis component. The analysis workflow is shown in Figure 1. First, XML files for relevant cancer vaccine trials were downloaded from the ClinicalTrials.gov Web site, and incidence and survival facts were downloaded from the National Cancer Institute (NCI) Web site. A series of questions to be addressed using this system was then defined. Fields of interest containing information such as cancer type, phase of the trial, and recruiting status were extracted from the primary XML files. Additional fields of interest, such as technology platform, adjuvant usage, and therapy type, were added manually in a form suitable for database querying and associated with each clinical trial record in Dana-Farber’s back-end database. These data were not available as separate fields in the ClinicalTrials.gov records, but could be derived from the trial descriptions and mapped.
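The extraction and manual annotation steps described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the project: the sample record, the element names (`id_info/nct_id`, `overall_status`, `phase`, `condition`), the trial ID, and the annotation values are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated trial record. The element names are assumed
# for illustration and may differ from the files used in the project.
SAMPLE_XML = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <overall_status>Recruiting</overall_status>
  <phase>Phase 2</phase>
  <condition>Melanoma</condition>
</clinical_study>"""


def extract_fields(xml_text):
    """Pull the fields of interest out of one trial record."""
    root = ET.fromstring(xml_text)
    return {
        "trial_id": root.findtext("id_info/nct_id", default=""),
        "status": root.findtext("overall_status", default=""),
        "phase": root.findtext("phase", default="unspecified"),
        "cancer_types": [c.text for c in root.findall("condition")],
    }


def add_manual_annotations(record, annotations):
    """Merge manually curated fields (keyed by trial ID) into a record."""
    merged = dict(record)
    merged.update(annotations.get(record["trial_id"], {}))
    return merged


# Manually curated fields that are not separate fields in the XML record.
# The values here are placeholders, not real annotations.
manual = {
    "NCT00000000": {
        "technology_platform": "peptide",
        "adjuvant": "yes",
        "therapy_type": "therapeutic",
    }
}

record = add_manual_annotations(extract_fields(SAMPLE_XML), manual)
print(record)
```

Keeping the curated fields in a separate mapping keyed by trial ID means the downloaded XML files can be refreshed without losing the manual work.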

Figure 3. Clinical cancer vaccine trials conducted in the US during the last 30 years. Bars represent the total number of trials started for a particular year. The color code on each bar represents the phase of the trials (green: Phase 1; yellow: Phase 2; red: Phase 3; grey: unspecified).

Data visualization software was used to construct the environment for the Dana-Farber data mining application. The graphical user interface shown in Figure 2 facilitates graphing and tabulation through drag-and-drop actions.

Cancer vaccine trials data mining questions
The data mining application yielded answers to questions such as “How has the cancer vaccine field evolved in the last ten years?” and “How many cancer vaccine trials have been conducted and how many of them are currently open in the United States?”, providing a historical view of the field. Similarly, answers to questions like “What cancer types are currently researched in clinical trials?” and “What phase are these trials?” offered an up-to-date view of the cancer vaccine space. In addition, this application helped answer more specific questions such as “How many breast cancer vaccine trials have been conducted by Dana-Farber Cancer Institute’s Cancer Vaccine Center and what types of vaccines were used for those trials?”
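A question of the last kind amounts to filtering the trial records on a few fields and then tallying one more. The sketch below shows the idea in plain Python rather than in the visualization tool; the field names (`cancer_type`, `lead_institution`, `vaccine_type`) and the three records are hypothetical placeholders, not data from the study.

```python
from collections import Counter


def filter_trials(trials, **criteria):
    """Return the trials whose fields match every given criterion."""
    return [
        trial for trial in trials
        if all(trial.get(field) == value for field, value in criteria.items())
    ]


# Placeholder records; field names and values are assumed for illustration.
trials = [
    {"cancer_type": "breast", "lead_institution": "Institution A",
     "vaccine_type": "peptide"},
    {"cancer_type": "breast", "lead_institution": "Institution A",
     "vaccine_type": "dendritic cell"},
    {"cancer_type": "melanoma", "lead_institution": "Institution B",
     "vaccine_type": "peptide"},
]

# "How many breast cancer trials were led by Institution A,
# and what types of vaccines did they use?"
matches = filter_trials(
    trials, cancer_type="breast", lead_institution="Institution A"
)
vaccine_types = Counter(trial["vaccine_type"] for trial in matches)

print(len(matches))
print(dict(vaccine_types))
```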

The versatility of this system enables the analysis of various dimensions of the clinical trials landscape, including clinical trials by timeline (Figure 3), type of cancer, lead institution, trials by disease prevalence, and/or specific vaccine technology visualized through dynamically generated graphs.
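The timeline view in Figure 3 is a count of trials grouped by start year and broken down by phase. A minimal sketch of that aggregation follows, assuming each record carries hypothetical `start_year` and `phase` fields; the example records are placeholders, and the actual graphs were produced in the visualization software rather than with code like this.

```python
from collections import defaultdict

# Phase categories taken from the Figure 3 legend.
PHASES = ("Phase 1", "Phase 2", "Phase 3", "unspecified")


def trials_per_year_by_phase(trials):
    """Count trials started in each year, broken down by phase."""
    counts = defaultdict(lambda: dict.fromkeys(PHASES, 0))
    for trial in trials:
        phase = trial.get("phase")
        if phase not in PHASES:
            # Missing or unrecognized phases fall into the grey category.
            phase = "unspecified"
        counts[trial["start_year"]][phase] += 1
    return dict(sorted(counts.items()))


# Placeholder records, not data from the study.
example_trials = [
    {"start_year": 2005, "phase": "Phase 1"},
    {"start_year": 2005, "phase": "Phase 2"},
    {"start_year": 2005, "phase": "Phase 1"},
    {"start_year": 2006, "phase": None},
]

table = trials_per_year_by_phase(example_trials)
for year, row in table.items():
    print(year, row)
```

Each row of the resulting table corresponds to one stacked bar in a chart of this kind, with one segment per phase.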

Conclusion
With next-generation software applications such as Spotfire from TIBCO (Somerville, Mass.), several mouse clicks provided access to comprehensive clinical trials information and to knowledge that would otherwise require hiring specialists or consultants. By combining public databases of clinical trials, data formatting by XML, and computational analysis and visualization, specific knowledge can be extracted rapidly from a large data set, summarized, and presented to the user. This data mining approach enabled rapid analysis of the hotspots of cancer vaccine activity and revealed hidden patterns, trends, and biases in the data. Summarization and visualization of these data represent a cost-effective means of making informed decisions about future cancer vaccine clinical trials.

About the Author
Vladimir Brusic earned a PhD from La Trobe University, Australia, and holds a BEng (Mech., Belgrade), MEng (Biomed, Belgrade), MAppSci (InfoTech, RMIT), and an MBA (Rutgers, NJ). He developed novel computational solutions for immunology and published more than 150 scientific articles and several biological databases. Xiaohong Cao received a PhD from Yale University and an MBA from Babson College. She is actively involved in genomic research in cancer and has developed many informatic solutions for research and business applications.

References
1. Mathers CD, Loncar D: Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med. 2006;3(11):e442.
2. Department of Measurement and Health Information Systems: World Health Statistics 2007. World Health Organization, Geneva, Switzerland; 2007.
3. Cao X, Maloney KB, Brusic V: Data mining of cancer vaccine trials: a bird’s-eye view. Immunome Res. 2008;4:7.

This article was published in Drug Discovery & Development magazine: Vol. 13, No. 4, May 2010, pp. 16-17.

