Drug Discovery and Development

Delivering Knowledge Through Effectively Managed Screening Data

By Drug Discovery Trends Editor | July 8, 2008

All too often, drug discovery organizations rely on ‘experts’ to make decisions while in many cases the data is readily available for anyone to use—if it is viewed and accessed in the right way.
Web Exclusive

A drug discovery organization’s knowledge is primarily derived from the data it generates internally. When that data is used to direct screening, decisions such as which compounds to test in primary screens rest on the quality of the data. If data quality is high, the knowledge derived from screening can be used to maximize the ability to find better leads. For example, previously observed activities against other targets for similar compounds or compound families can be used to assess potential selectivity issues in a lead more accurately.

The ultimate goal of knowledge-based screening—the process of extracting maximum knowledge from an organization’s data reserves and employing that know-how to support and improve research and business decisions—is to create virtual ‘experts’, so that decisions rest on the data itself rather than on a select few individuals.

Hidden reservoirs of knowledge
To use this ‘hidden’ knowledge it needs to be captured, collected, analyzed, progressed through QA and QC processes, and stored in a way that is accessible and preserves context. There are a variety of ways in which companies fail to manage this mountain of unused data and so are unable to exploit it. Some organizations delete screening data once a project is finished; if the data is retained, it may only be in summary form, with limited usefulness in terms of preserving context and ensuring data quality. Accessing data can also pose problems, especially if it is stored in disparate data sources, warehouses and silos across an organization. A consistent and coherent method of managing available data can eliminate these issues, allowing maximum value to be extracted from research information.

The value of data management
The need for an effective way to capture, analyze, QC, and report on every aspect of an experiment has grown significantly in the discovery industry in the last decade as a response to the increasingly large volumes of screening data generated by centralized screening, robotics and automation.

As data volumes grow, efficient data management becomes ever more essential to the screening process, ensuring data is stored and processed in a way that organizations can actually make use of.

Today’s data solutions
Today’s data management solutions have evolved to accommodate an integrated approach to data capture, storage, and analysis. Biological and chemical data, both factual and contextual, is stored in a central database. The use of robotics and automation has increased, and infrastructure and hardware have greatly improved. Automated data capture direct from laboratory instruments maintains the integrity of raw data, reduces transcription errors, and supports 24/7 screening with exception handling.
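As a minimal illustration of automated capture, the sketch below parses a hypothetical plate-reader export directly into typed records rather than relying on manual transcription. The file format, well IDs, and function name are illustrative assumptions, not any particular vendor’s interface.

```python
import csv
import io

# Hypothetical raw export from a plate reader (format is illustrative).
RAW_EXPORT = """well,signal
A1,1020
A2,980
B1,455
B2,61
"""

def capture_plate(raw_text):
    """Parse an instrument export directly into typed records,
    avoiding manual re-typing of raw values."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return {row["well"]: float(row["signal"]) for row in reader}

plate = capture_plate(RAW_EXPORT)
print(plate["B2"])
```

Capturing the instrument file itself, rather than a hand-copied summary, is what preserves the integrity of the raw data downstream.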

Sophisticated data analysis software is now included in some modern data management solutions, enabling scientists to perform curve fitting and statistical calculations within the same environment as data capture. Quality control functionality provides data visualization and configurable business rules that can flag potential errors and automatically exclude clearly erroneous results, bringing only questionable data to the screener’s attention.
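As one sketch of such a configurable business rule, the code below uses the Z′-factor, a standard plate-quality statistic in high-throughput screening, to pass clean plates automatically and surface only borderline ones for review. The threshold and function names are illustrative assumptions.

```python
from statistics import mean, stdev

def z_prime(pos_controls, neg_controls):
    """Z'-factor: a standard plate-quality statistic in HTS.
    Values above ~0.5 indicate a well-separated assay window."""
    separation = abs(mean(pos_controls) - mean(neg_controls))
    return 1 - 3 * (stdev(pos_controls) + stdev(neg_controls)) / separation

def qc_rule(pos, neg, threshold=0.5):
    """A configurable business rule (threshold is illustrative):
    pass the plate automatically, or flag it for a screener's review."""
    z = z_prime(pos, neg)
    return ("pass" if z >= threshold else "review"), z

# Illustrative control wells from one plate.
status, z = qc_rule([100, 98, 102, 101], [5, 6, 4, 5])
print(status)
```

A rule like this is what lets the system knock out obvious failures without a human in the loop, while ambiguous plates still get expert eyes.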

Today, screening has adopted a production line approach, taking advantage of both Lean and Six Sigma methodologies to address workflow efficiency and improve predictability (cycle time). Screening organizations focus on low unit cost, a high standard of data quality and effective lead profile definition. Rather than being regarded as a method to find a drug instantaneously, the screening process identifies a lead series of potent molecules that conform to a predesigned drug profile with required lead candidate properties. Once identified, the lead series molecules are manipulated to promote selectivity, where a drug is active on the desired target receptor but not on others, so avoiding side effects.

For example, a molecule may show efficacy on a target receptor but may need manipulating to change an aspect of its behavior, such as the ability to dissolve, or a need to control a side effect, without affecting its potency. Looking at past experimental data, scientists can gain an insight into the potential behavior of a lead, re-using available data to avoid unnecessary research effort and help to identify potential successes.
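The selectivity re-use idea can be sketched as a simple lookup over historical activity data. The compound IDs, target names, potency values and the 100-fold cut-off below are all illustrative assumptions.

```python
# Hypothetical historical activities (IC50 in nM, lower = more potent),
# re-used from earlier screens to assess selectivity of a lead.
historical_ic50 = {
    "CMPD-123": {"target_A": 12.0, "target_B": 4500.0, "target_C": 9800.0},
}

def is_selective(compound, desired_target, fold=100):
    """Treat a lead as selective if it is at least `fold`-times more
    potent on the desired target than on its closest off-target."""
    activities = historical_ic50[compound]
    on_target = activities[desired_target]
    worst_off = min(v for k, v in activities.items() if k != desired_target)
    return worst_off / on_target >= fold

print(is_selective("CMPD-123", "target_A"))  # 4500/12 = 375-fold window
```

With the historical results already in hand, this check costs nothing; without them, it would mean re-running assays that the organization has already paid for.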

 

Fig 1: Typical lead generation process

Screening, as part of this lead generation process, is a multi-group discipline, involving several separate systems and tools combined as point solutions. Integrating this data often involves retrieval from a number of data silos.

Silos of data
First generation integration solutions that centered on the concept of local repositories were unscalable and costly to maintain, and therefore had limited applicability. Organizations lacked a coherent and efficient way to access, correlate and integrate information that was scattered in separate and remote data silos such as databases, data marts and warehouses and sought a single point or ‘portal’ from which to access and search all available data sources.

For example, Fig 2 shows a typical ‘unmanaged’ workflow where several databases or silos are involved in the screening process.

 

Fig 2: Typical workflow involving several data silos

Storing data in silos hinders access to and re-use of an organization’s data and knowledge. Silo storage fosters work in isolation rather than cross-departmental communication, requiring information to be transferred through a number of different systems and back again to complete a sequential process, as shown in Fig 2.

1. Using separate applications and databases, a chemist selects or designs compound libraries relevant to a particular target for screening.

2. A file is sent to a separate compound store database to physically generate the plate.

3. The plate is screened and the results are sent to a screening database.

4. The screening data is retrieved by chemists, who use a variety of applications to analyze the structures and store the data in a structure analysis database.

5. The structural analysis is returned to the library design database, where the compounds are modified based on the analysis or to conform more fully to a drug profile. A new library of compounds is created and again sent to the compound store database to build a new plate.

6. Screening is performed on the modified compound library and the results are sent to the screening database.

7. If necessary, further structural analysis is performed, saved in the structure analysis database, and returned to the library design database, until the desired drug candidate is identified.

During this journey through fragmented data sources, information can lose context and links to associated data, making its quality unreliable. ETL (Extract, Transform and Load) tools, employed when data is imported into each silo, apply cleaning rules that may be inconsistent between silos, meaning data can be lost irretrievably. Archiving rules may also lack consistency if data is removed or modified upstream.
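The inconsistency problem can be made concrete with a toy example: two silos import the same raw result under different cleaning rules, and each stored copy has lost something the other kept. The field names, units and rules below are illustrative assumptions.

```python
# One raw datum, as produced at the bench.
raw_result = {"compound": "CMPD-7", "ic50_uM": 0.012, "operator": "jsmith"}

def etl_screening_silo(record):
    # Screening silo's rule: convert to nM, but drop operator context.
    return {"compound": record["compound"],
            "ic50_nM": record["ic50_uM"] * 1000}

def etl_analysis_silo(record):
    # Analysis silo's rule: keep uM but round to 2 dp -- precision lost.
    return {"compound": record["compound"],
            "ic50_uM": round(record["ic50_uM"], 2)}

a = etl_screening_silo(raw_result)
b = etl_analysis_silo(raw_result)
print(a)
print(b)  # same raw datum, two irreconcilable stored forms
```

Once the raw record is discarded, neither silo can reconstruct what the other lost, which is exactly the failure mode a single managed store avoids.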

Evolving data management
Gradually data management has evolved from the use of these several disparate data silos to a more centralized system where data is stored in and accessible from a single location. Fig 3 shows an example of a data management system that is based around a central results database. Information from supporting databases is integrated into one searchable location, significantly improving data access, querying and analysis.

 

Fig 3: One single point for data access and querying

Data management infrastructures like the one shown above are now surmounting obstacles to knowledge-based screening by offering a better level of integration, the components of which can also integrate into existing information architecture in a multi-vendor environment. These solutions allow storage of both biological and chemical data centrally, allowing easy access to and querying of a compound’s data from screening to candidate submission. Multiple groups can share and use current and previous unambiguous project data with full experimental context, streamlining workflows and promoting communication and collaborations across an organization.

This approach has similarities to dimensional modelling as used for data warehousing. The results database becomes a ‘single point of truth’, with supporting databases integrated into one central location via connecting applications. This ‘single point of truth’ can be integrated with similar central databases or used to populate marts and warehouses, enabling the use of service-oriented architecture (SOA) processes and data federation tools to integrate data with external applications and sources. A centralized data management system offers a host of opportunities for the exploitation of data. Organizations have access to their own content database, which in some instances may contain more than two billion results. ETL tools with flexible, open business rules can be employed to ensure quality and contextual richness, while analysis and curve-fitting tools apply consistency to results, so that like is compared with like.
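A minimal sketch of a ‘single point of truth’ is a single store holding both chemical and biological data, queryable in one pass. The table and column names below are illustrative, not any particular product’s schema.

```python
import sqlite3

# One in-memory store holding both chemistry (structures) and
# biology (assay results); schema is illustrative.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE compound (id TEXT PRIMARY KEY, smiles TEXT);
CREATE TABLE result (compound_id TEXT REFERENCES compound(id),
                     assay TEXT, ic50_nM REAL);
INSERT INTO compound VALUES ('CMPD-1', 'CCO');
INSERT INTO result VALUES ('CMPD-1', 'kinase_A', 15.0);
INSERT INTO result VALUES ('CMPD-1', 'kinase_B', 2200.0);
""")

# One query spans a compound's full screening history -- no silo hopping.
rows = db.execute("""
    SELECT c.smiles, r.assay, r.ic50_nM
    FROM compound c JOIN result r ON r.compound_id = c.id
    WHERE c.id = 'CMPD-1' ORDER BY r.ic50_nM
""").fetchall()
print(rows)
```

The point of the join is that chemistry and biology are answered together, from one place, with no per-silo export and re-import step in between.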

Providing a potential database for building predictive models, a unified data management system allows research information to be re-used for decision support functions. Such a platform could accommodate predictive technology that alerts organizations to potential future issues with compounds and enable further chemistry analysis that helps to develop a lead series or analyze trends. Applying trend analysis algorithms widely used in other industries, such as SVM, MLR, PLS, PCA, Random Forest, and Consensus, helps to detect patterns and extract knowledge from a large amount of centrally stored data.
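As the simplest possible sketch of trend analysis over stored results, the code below fits the one-predictor case of multiple linear regression (one of the methods listed above) by ordinary least squares. The data are invented for illustration: a hypothetical relationship between lipophilicity (logP) and measured solubility.

```python
# Illustrative stored results: logP vs. measured solubility (arbitrary units).
logp = [1.0, 2.0, 3.0, 4.0, 5.0]
solubility = [90.0, 70.0, 52.0, 30.0, 12.0]

# Ordinary least squares for a single predictor.
n = len(logp)
mx = sum(logp) / n
my = sum(solubility) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(logp, solubility))
         / sum((x - mx) ** 2 for x in logp))
intercept = my - slope * mx

print(round(slope, 1))  # negative slope: solubility falls as logP rises
```

In practice the methods named above (SVM, PLS, PCA, Random Forest) would run over far larger stored result sets, but the principle is the same: the trend is already latent in the centrally stored data.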

Most knowledge is derived from people knowing what they are looking for, such as queries retrieved from Internet search engines and selected content delivered via Web-based RSS (Really Simple Syndication) feeds. Efficient knowledge-based screening allows data to be used proactively, for example, searching for and identifying past failures and successes in studies and using that knowledge to drive current research and avoid similar costly failures. 

By efficiently handling significantly increased screening throughput and encouraging data sharing across the whole discovery research community, knowledge-based screening boosts productivity and streamlines workflow. Research data can be exploited to the full, retaining its value over time and enabling re-use for data mining, predictive technology and decision support. Providing a base for informed and intelligent insights, data management enables organizations to anticipate and avoid future problems more efficiently. 

www.idbs.com

About the Author
Glyn joined IDBS in 1995. IDBS is a leading software company specializing in integrated biological and chemical data management for discovery research. With over 10 years’ experience in drug discovery IT, Glyn has extensive expertise in project and product management. At IDBS he has worked primarily on large projects with many of the major pharmaceutical and biotechnology companies. Prior to IDBS he worked in chemistry, including at Shell Exploration and Production (Shell EP). Glyn has a BSc (Hons) in Combined Studies from Manchester University and an MSc in Applied Computing.

