The life sciences industry is abuzz with the potential of generative AI, but its application in the highly regulated pharmaceutical sector faces challenges. “Everyone across the whole industry is experimenting with it. But no one knows for sure yet how best to use it,” said Jane Lomax, Ph.D., head of ontologies at Elsevier’s SciBite subsidiary.
When generative AI first entered the mainstream, Lomax recalls speaking with a pharma executive who said the tech “had the shortest hype cycle ever.” “Everyone was initially amazed, but then they realized, ‘Oh no, we can’t use it because of data privacy and copyright issues, all of those things,’” Lomax said.

[Image courtesy of SciBite]
SciBite Chat focuses on grounding genAI with a structured data approach

Jane Lomax, Ph.D.
Enter SciBite Chat, a genAI-powered search tool built specifically for life science researchers that aims to streamline research and data extraction in the biomedical field. “We combined the best of the LLM, the generative AI with our structured data,” Lomax said. “You get the ability to ask a natural language question, but you get the explainability of being able to use that structured data behind the scenes. So you get the best of both worlds.”
SciBite Chat combines semantic search and large language models (LLMs) to interpret natural language queries, using ontology-backed semantics and a retrieval-augmented generation (RAG) architecture. The ontology-backed semantics use structured vocabularies to understand and contextualize queries, while the RAG architecture improves response accuracy by feeding retrieved information to the generative model. The approach is designed to ensure search results are relevant and grounded in domain expert knowledge.
Minimizing hallucinations while ensuring transparency
SciBite Chat is designed to minimize hallucinations by offering transparency into the source data used to generate responses. For instance, when SciBite Chat answers a question, the software highlights the relevant sentences in the reference documents, allowing users to see where the information came from. This is made possible by TERMite, SciBite’s semantic enrichment technology, a named entity recognition engine that annotates unstructured natural language content by identifying and extracting scientific terms such as genes, drugs, and diseases.
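To make the idea concrete, here is a minimal sketch of dictionary-based entity annotation and sentence-level provenance. It is not TERMite or its API: the vocabulary, the `EX:` identifiers, and the function names `annotate` and `supporting_sentences` are all made up for illustration, and a real named entity recognition engine does far more than exact string matching.

```python
import re
from dataclasses import dataclass

# Hypothetical toy vocabulary: surface form -> (entity type, made-up identifier).
VOCAB = {
    "aspirin": ("DRUG", "EX:0001"),
    "acetylsalicylic acid": ("DRUG", "EX:0001"),
    "tp53": ("GENE", "EX:0002"),
    "breast cancer": ("DISEASE", "EX:0003"),
}


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    text: str
    entity_type: str
    term_id: str


def annotate(text):
    """Return non-overlapping vocabulary matches, preferring longer terms."""
    found = []
    for term in sorted(VOCAB, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Skip a match that overlaps one already accepted.
            if any(m.start() < a.end and a.start < m.end() for a in found):
                continue
            entity_type, term_id = VOCAB[term]
            found.append(Annotation(m.start(), m.end(), m.group(0), entity_type, term_id))
    return sorted(found, key=lambda a: a.start)


def supporting_sentences(document, term_ids):
    """Return the sentences that mention any of the given term identifiers."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [s for s in sentences if any(a.term_id in term_ids for a in annotate(s))]
```

Calling `supporting_sentences(document, {"EX:0001"})` returns only the sentences that mention the drug under either of its names, which is the kind of span a user interface could highlight as the origin of an answer.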
Grounding the LLM in the “human truth” provided by ontologies is how SciBite Chat minimizes the risk of hallucinations or unreliable results, a common concern with generative AI. “You can sort of ground your LLM in this sort of human truth, which is the ontology,” Lomax said.

[Image courtesy of SciBite]
A simple interface for scientific genAI search
SciBite Chat itself has a simple, Google-style interface with a single box in the middle. “It just says, ‘Ask your question,’ and you can see which sources you’ve got at the top,” Lomax said. “So you say, ‘Give me the top five targets for this indication.’ And it will translate your question into a structured query using SSQL.”
“It will transform the bits of your question, taking out elements like genes or diseases, and turn them into ontology terms. Then it creates a structured query, sends that to all its data sources, and pulls the top hits,” Lomax said. “It extracts the relevant paragraphs and feeds all that back to the LLM, which synthesizes the information and provides a natural language response, like ‘This is the answer to your question. These are the references, these are the top 10 references.'”
Users can then ask follow-up questions, such as “Which of these has publications from the last five years?” SciBite Chat will go back, answer the question, and allow users to interact with the data as if they were conversing with another human. “It saves you time and gives you the flexibility to pull all that data together,” Lomax explained. “You haven’t got to go to separate places and sort of pull it all together yourself.”
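The question-to-answer flow Lomax describes can be sketched in a few steps: map terms in the question to ontology identifiers, build a structured query, retrieve matching passages, and hand those passages to an LLM along with the question. The sketch below is an assumption-laden illustration, not SciBite's implementation. The synonym table, the `EX:` identifiers, the in-memory corpus, and every function name are hypothetical, the structured query is a plain dictionary rather than SSQL, and the LLM is any callable that takes a prompt and returns text.

```python
from typing import Callable

# Hypothetical synonym table: surface form -> made-up ontology identifier.
SYNONYMS = {
    "alzheimer's disease": "EX:D001",
    "alzheimer disease": "EX:D001",
    "bace1": "EX:G001",
}

# Hypothetical pre-annotated corpus; passage text is placeholder content.
CORPUS = [
    {"ref": "doc-1", "text": "Example passage mentioning BACE1.", "ids": {"EX:G001"}},
    {"ref": "doc-2", "text": "Example passage mentioning BACE1 and Alzheimer disease.",
     "ids": {"EX:G001", "EX:D001"}},
    {"ref": "doc-3", "text": "Example passage on an unrelated topic.", "ids": set()},
]


def to_structured_query(question: str) -> dict:
    """Replace recognized terms in the question with ontology identifiers."""
    lowered = question.lower()
    ids = sorted({term_id for form, term_id in SYNONYMS.items() if form in lowered})
    return {"must_mention": ids}


def retrieve(query: dict, top_k: int = 2) -> list:
    """Rank passages by how many requested identifiers they mention."""
    wanted = set(query["must_mention"])
    scored = [(len(wanted & p["ids"]), p) for p in CORPUS]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [p for score, p in ranked if score > 0][:top_k]


def build_prompt(question: str, passages: list) -> str:
    """Put the retrieved passages, tagged by reference, ahead of the question."""
    context = "\n".join(f"[{p['ref']}] {p['text']}" for p in passages)
    return ("Answer using only the passages below and cite them by reference.\n"
            f"{context}\n\nQuestion: {question}")


def answer(question: str, llm: Callable[[str], str]) -> dict:
    """Run the full flow and return the LLM's text plus the references used."""
    passages = retrieve(to_structured_query(question))
    return {"answer": llm(build_prompt(question, passages)),
            "references": [p["ref"] for p in passages]}
```

Because the references are returned alongside the generated text, the answer can always be traced back to the passages that were actually shown to the model.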
Combining data sources while grounding genAI
SciBite Chat is not just limited to Elsevier’s database. “Pharma can put their own data in there as well,” Lomax said. “They can combine that with their own data, with public data, they can pull it all in and then be able to interact with it via the LLM.”
SciBite Chat uses ontologies to provide structure and context to the underlying data, ensuring that the AI-generated responses are grounded in reality. “So an ontology gives you a way of sort of pulling those things together… So you know that in this dataset, that mouse means this, but in this one, ‘Mus musculus’ also means this, and so you can sort of get them to talk to each other that way,” Lomax explained. This is vital in the conservative pharmaceutical industry, where explainability and traceability are paramount. “In an industry like pharma, you can’t just go and ask a [conventional] LLM a question, right?” Lomax said. “You need the explainability. You need the provenance of that answer.”
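Lomax's mouse example can be illustrated with a small sketch in which two datasets that name the same organism differently are joined through a shared ontology identifier. The record layouts, field names, and helper functions here are hypothetical. The one real detail is `NCBITaxon:10090`, the NCBI Taxonomy identifier for Mus musculus.

```python
from collections import defaultdict

# Toy ontology fragment with a single concept and its alternative names.
ONTOLOGY = {
    "NCBITaxon:10090": {"label": "Mus musculus", "synonyms": ["mouse", "house mouse"]},
}


def build_lookup(ontology):
    """Map every label and synonym (case-insensitively) to its term identifier."""
    lookup = {}
    for term_id, entry in ontology.items():
        for name in [entry["label"], *entry["synonyms"]]:
            lookup[name.casefold()] = term_id
    return lookup


def normalize(records, field, lookup):
    """Copy records, adding the ontology id for the value in `field` (None if unmapped)."""
    return [{**r, "species_id": lookup.get(str(r[field]).strip().casefold())}
            for r in records]


def merge_by_term(*datasets):
    """Group records from all datasets under their shared ontology identifier."""
    grouped = defaultdict(list)
    for dataset in datasets:
        for record in dataset:
            if record["species_id"] is not None:
                grouped[record["species_id"]].append(record)
    return dict(grouped)
```

With this in place, an internal record labeled "mouse" and a public record labeled "Mus musculus" end up under the same key, so a query against one identifier reaches both sources.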
SciBite designed SciBite Chat to be modular. While the search currently uses OpenAI’s GPT-4, with users able to enter their API key, it can take advantage of other models as well. “If a better model comes along, you can put that one in,” Lomax said. Users can also run tests to see which model performs better. To evaluate new models, SciBite is developing “golden question-answer sets,” collections of carefully curated questions and their corresponding correct answers that will allow users to compare and assess how well each model performs. As Lomax said, they can ask, “Okay, how does this one compare to GPT? Okay, this one’s half the price. Let’s use that.”
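A golden question-answer set lends itself to a simple comparison harness. The sketch below assumes each model is just a callable that takes a question and returns text, and it scores a reply as correct when it contains one of the accepted answers. That substring check is a deliberately crude stand-in, the golden set is placeholder data, and the cost figures are relative numbers chosen for illustration, not real prices or SciBite's evaluation method.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical golden set: curated questions paired with accepted answers.
GOLDEN_SET = [
    {"question": "placeholder question 1", "accepted": ["alpha"]},
    {"question": "placeholder question 2", "accepted": ["beta", "second answer"]},
]


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.casefold().split())


def accuracy(model: Callable[[str], str], golden_set: List[dict]) -> float:
    """Fraction of questions whose reply contains an accepted answer."""
    hits = 0
    for item in golden_set:
        reply = _norm(model(item["question"]))
        if any(_norm(ans) in reply for ans in item["accepted"]):
            hits += 1
    return hits / len(golden_set)


def compare(models: Dict[str, Tuple[Callable[[str], str], float]],
            golden_set: List[dict]) -> Dict[str, dict]:
    """Score each named model; `models` maps name -> (callable, relative cost)."""
    return {name: {"accuracy": accuracy(fn, golden_set), "relative_cost": cost}
            for name, (fn, cost) in models.items()}
```

Keeping the model behind a plain callable is what makes the swap cheap: a new model only needs a thin wrapper to be scored against the same questions and weighed against its cost.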