Drug Discovery and Development

SciBite Chat: Elsevier’s answer to ChatGPT for life science researchers, minus the hallucinations

By Brian Buntz | May 20, 2024

The life sciences industry is abuzz with the potential of generative AI, but its application in the highly regulated pharmaceutical sector faces challenges. “Everyone across the whole industry is experimenting with it. But no one knows for sure yet how best to use it,” said Jane Lomax, Ph.D., head of ontologies at Elsevier’s SciBite subsidiary.

When generative AI first entered the mainstream, Lomax recalls speaking with a pharma executive who said the tech “had the shortest hype cycle ever.” “Everyone was initially amazed, but then they realized, ‘Oh no, we can’t use it because of data privacy and copyright issues, all of those things,’” Lomax said.

[Image courtesy of SciBite]

SciBite Chat focuses on grounding genAI with a structured data approach

Jane Lomax, Ph.D.

Enter SciBite Chat, a genAI-powered search tool built specifically for life science researchers that aims to streamline research and data extraction in the biomedical field. “We combined the best of the LLM, the generative AI with our structured data,” Lomax said. “You get the ability to ask a natural language question, but you get the explainability of being able to use that structured data behind the scenes. So you get the best of both worlds.”

SciBite Chat combines semantic search and Large Language Models (LLMs) to interpret natural language queries using ontology-backed semantics and a Retrieval Augmented Generation (RAG) architecture. The former uses structured vocabularies to understand and contextualize queries, while the latter enhances response accuracy by integrating retrieved information with generative capabilities. The approach is designed to ensure search results are relevant and grounded in domain expert knowledge.

Minimizing hallucinations while ensuring transparency

SciBite Chat is designed to reduce hallucinations by offering transparency into the source data used to generate responses. For instance, when SciBite Chat answers a question, the software highlights the relevant sentences in the reference documents, allowing users to identify the information’s origin. This is made possible through SciBite’s semantic enrichment technology called TERMite, a named entity recognition engine that annotates unstructured natural language content by identifying and extracting scientific terms like genes, drugs, and diseases.
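To make the idea of entity annotation and sentence highlighting concrete, here is a minimal Python sketch. It is not TERMite or its API: the vocabulary, the `EX:` identifiers, and the function names are all hypothetical, and the matching is a plain dictionary lookup rather than a production named entity recognition engine.

```python
import re

# Hypothetical mini-vocabulary: surface form -> (entity type, made-up ID).
VOCAB = {
    "brca1": ("GENE", "EX:GENE_1"),
    "olaparib": ("DRUG", "EX:DRUG_1"),
    "breast cancer": ("DISEASE", "EX:DISEASE_1"),
}


def annotate(sentence):
    """Return (matched text, entity type, entity ID) for each vocabulary hit."""
    hits = []
    for term, (etype, eid) in VOCAB.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            hits.append((match.group(0), etype, eid))
    return hits


def supporting_sentences(document, wanted_ids):
    """Return the sentences that mention every entity ID in wanted_ids."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    selected = []
    for sentence in sentences:
        found = {eid for _, _, eid in annotate(sentence)}
        if wanted_ids <= found:  # all requested entities appear in this sentence
            selected.append(sentence)
    return selected
```

Given a reference document and the entity IDs behind an answer, `supporting_sentences` returns the sentences a user interface could highlight as the answer's origin.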

It is by grounding the LLM in the “human truth” provided by ontologies that SciBite Chat can minimize the risk of hallucinations or unreliable results, a common concern with generative AI. “You can sort of ground your LLM in this sort of human truth, which is the ontology,” Lomax said.

[Image courtesy of SciBite]

To optimize for accuracy, SciBite Chat uses a structured query language called SSQL (SciBite Semantic Query Language) to transform natural language questions into structured queries. “If you want to have good AI, you have to have good quality data that underpins it,” Lomax said. “And so that’s what we’ve tried to overcome with our system, is that we help build the quality data first. So that’s the foundation and then you can have the successful AI that sort of sits as a layer on top of that.”

A simple interface for scientific genAI search

SciBite Chat itself has a simple, Google-style interface with a single box in the middle. “It just says, ‘Ask your question,’ and you can see which sources you’ve got at the top,” Lomax said. “So you say, ‘Give me the top five targets for this indication.’ And it will translate your question into a structured query using SSQL.”

“It will transform the bits of your question, taking out elements like genes or diseases, and turn them into ontology terms. Then it creates a structured query, sends that to all its data sources, and pulls the top hits,” Lomax said. “It extracts the relevant paragraphs and feeds all that back to the LLM, which synthesizes the information and provides a natural language response, like ‘This is the answer to your question. These are the references, these are the top 10 references.’”
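The flow Lomax describes (question, ontology terms, structured query, retrieval, LLM synthesis) can be sketched roughly as follows. This is an illustration under stated assumptions, not SciBite's implementation: the structured query is a plain dictionary rather than real SSQL, whose syntax the article does not show, and the term table, `EX:` identifiers, corpus, and `llm` callable are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical label -> ontology ID table (IDs are made up for illustration).
TERM_TO_ID = {"asthma": "EX:D001", "il5": "EX:G001", "tslp": "EX:G002"}


@dataclass
class Passage:
    source: str
    text: str
    entity_ids: frozenset


# Hypothetical pre-annotated corpus standing in for the data sources.
CORPUS = [
    Passage("doc-1", "IL5 blockade reduced exacerbations in severe asthma.",
            frozenset({"EX:G001", "EX:D001"})),
    Passage("doc-2", "TSLP is upstream of several asthma pathways.",
            frozenset({"EX:G002", "EX:D001"})),
    Passage("doc-3", "IL5 drives eosinophil maturation.",
            frozenset({"EX:G001"})),
]


def to_structured_query(question):
    """Map recognized words to ontology IDs; a dict stands in for SSQL."""
    tokens = [t.strip("?.,").lower() for t in question.split()]
    return {"must_match": sorted({TERM_TO_ID[t] for t in tokens if t in TERM_TO_ID})}


def retrieve(query, corpus, top_k=2):
    """Rank passages by how many of the queried entity IDs they mention."""
    wanted = set(query["must_match"])
    scored = [(len(wanted & p.entity_ids), p) for p in corpus]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return [p for _, p in ranked[:top_k]]


def build_prompt(question, passages):
    """Put the retrieved passages in front of the question for the LLM."""
    context = "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only the numbered passages and cite them.\n"
        f"{context}\nQuestion: {question}"
    )


def answer(question, llm):
    """Run the whole flow; llm is any callable that maps a prompt to text."""
    passages = retrieve(to_structured_query(question), CORPUS)
    return llm(build_prompt(question, passages)), [p.source for p in passages]
```

Because `answer` returns the source identifiers alongside the generated text, the references can be shown next to the response, which is the provenance the article emphasizes.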

Users can then ask follow-up questions, such as “Which of these has publications from the last five years?” SciBite Chat will go back, answer the question, and allow users to interact with the data as if they were conversing with another human. “It saves you time and gives you the flexibility to pull all that data together,” Lomax explained. “You haven’t got to go to separate places and sort of pull it all together yourself.”

Combining data sources while grounding genAI

SciBite Chat is not just limited to Elsevier’s database. “Pharma can put their own data in there as well,” Lomax said. “They can combine that with their own data, with public data, they can pull it all in and then be able to interact with it via the LLM.”

SciBite Chat uses ontologies to provide structure and context to the underlying data, ensuring that the AI-generated responses are grounded in reality. “So an ontology gives you a way of sort of pulling those things together… So you know that in this dataset, that mouse means this, but in this one, ‘Mus musculus’ also means this, and so you can sort of get them to talk to each other that way,” Lomax explained. This is vital in the conservative pharmaceutical industry, where explainability and traceability are paramount. “In an industry like pharma, you can’t just go and ask a [conventional] LLM a question, right?” Lomax said. “You need the explainability. You need the provenance of that answer.”
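The mouse and “Mus musculus” example can be illustrated with a small Python sketch that maps different labels to one concept identifier so records from separate datasets line up. The NCBI Taxonomy identifiers (10090 for Mus musculus, 9606 for Homo sapiens) are real; the synonym table, dataset rows, and function names are hypothetical and far simpler than a curated ontology.

```python
# Concept ID -> known labels. A real ontology would carry many more synonyms.
SYNONYMS = {
    "NCBITaxon:10090": {"mouse", "mus musculus", "house mouse"},
    "NCBITaxon:9606": {"human", "homo sapiens"},
}

# Invert the table once so each label resolves to its concept ID.
LOOKUP = {label: cid for cid, labels in SYNONYMS.items() for label in labels}


def normalize(label):
    """Return the concept ID for a label, or None if it is not recognized."""
    return LOOKUP.get(label.strip().lower())


def merge_by_concept(*datasets):
    """Group records from several datasets under a shared concept ID."""
    merged = {}
    for rows in datasets:
        for row in rows:
            cid = normalize(row["species"])
            if cid is not None:
                merged.setdefault(cid, []).append(row["record"])
    return merged


# Hypothetical in-house and public rows that name the same species differently.
internal = [{"species": "mouse", "record": "assay-17"}]
public = [{"species": "Mus musculus", "record": "paper-42"}]
combined = merge_by_concept(internal, public)
```

Here both rows land under `NCBITaxon:10090`, so a query about the concept reaches in-house and public records alike.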

SciBite designed SciBite Chat to be modular. While the search now uses OpenAI’s GPT-4, allowing users to enter their API key, it can take advantage of other models as well. “If a better model comes along, you can put that one in,” Lomax said. Or users can run tests to see which is stronger. To evaluate the performance of new models, SciBite is developing “golden question-answer sets” — that is, a collection of carefully curated questions and their corresponding correct answers, that will allow users to compare and assess how well each model performs. As Lomax said, they can ask “‘Okay, how does this one compare to GPT? Okay, this one’s half the price. Let’s use that.'”
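A golden question-answer comparison of the kind described above might look like the following Python sketch. The questions, the two stand-in models, and the exact-match scoring are assumptions for illustration; SciBite's actual evaluation sets and metrics are not described in the article, and real answers would likely need more forgiving scoring than exact match.

```python
# Hypothetical golden set: (question, expected answer) pairs.
GOLDEN_SET = [
    ("What does RAG stand for?", "retrieval augmented generation"),
    ("What does LLM stand for?", "large language model"),
]


def normalize_answer(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def accuracy(model, golden):
    """Fraction of golden questions the model answers with an exact match."""
    correct = sum(
        normalize_answer(model(question)) == normalize_answer(expected)
        for question, expected in golden
    )
    return correct / len(golden)


def compare(models, golden):
    """Score each named model (a callable from question to answer)."""
    return {name: accuracy(model, golden) for name, model in models.items()}


# Stand-in models: simple lookups instead of real LLM calls.
def model_a(question):
    return dict(GOLDEN_SET).get(question, "")


def model_b(question):
    if "RAG" in question:
        return "Retrieval  Augmented Generation"
    return "unknown"


scores = compare({"model_a": model_a, "model_b": model_b}, GOLDEN_SET)
```

With scores like these in hand, accuracy can be weighed against price per call when deciding whether a cheaper model is good enough.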


Filed Under: Data science, Drug Discovery, machine learning and AI
Tagged With: drug discovery, Elsevier, generative AI, ontologies, pharma AI, SciBite, semantic search
 

About The Author

Brian Buntz

As the pharma and biotech editor at WTWH Media, Brian has almost two decades of experience in B2B media, with a focus on healthcare and technology. While he has long maintained a keen interest in AI, more recently Brian has made data analysis a central focus, and is exploring tools ranging from NLP and clustering to predictive analytics.

Throughout his 18-year tenure, Brian has covered an array of life science topics, including clinical trials, medical devices, and drug discovery and development. Prior to WTWH, he held the title of content director at Informa, where he focused on topics such as connected devices, cybersecurity, AI and Industry 4.0. A dedicated decade at UBM saw Brian providing in-depth coverage of the medical device sector. Engage with Brian on LinkedIn or drop him an email at bbuntz@wtwhmedia.com.

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media