In any area of disease research, a deep understanding of recent and future trends surrounding a particular condition is crucial to the drug discovery process. But as the volume of scientific literature keeps growing, it is difficult to sift through all the existing information manually and correlate the data in a way that gives research meaningful direction. As a result, resources can be misallocated to areas that are less likely to yield promising treatments.
By analyzing all literature related to a specific condition or disease, researchers can better identify which areas are likely to lead to a breakthrough. Natural language processing (NLP) combines linguistics, artificial intelligence, and computer science to interpret text much as people do. In trend analysis, researchers can use NLP to measure how often certain terms appear in the literature over time. This shows which fields of study are yielding the most progress and helps guide where future research efforts should focus.
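To make the idea of term-frequency trend analysis more concrete, here is a minimal Python sketch, assuming each publication is available as a `(year, text)` pair. It counts how many documents per year mention each term. The function name `yearly_term_counts` and the corpus format are hypothetical, and the plain keyword match is only a stand-in for the entity recognition and synonym normalization that a production NLP pipeline would apply.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def yearly_term_counts(
    records: Iterable[Tuple[int, str]], terms: List[str]
) -> Dict[str, Counter]:
    """Count, for each term, how many documents mention it in each year.

    Assumption: records are (year, text) pairs. Matching is a simple
    case-insensitive whole-word search, not real NLP entity recognition.
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    # Start every term with an empty counter so unseen terms are still reported
    counts: Dict[str, Counter] = {term: Counter() for term in terms}
    for year, text in records:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term][year] += 1  # one count per document, not per mention
    return counts


if __name__ == "__main__":
    # Made-up toy records, for illustration only
    records = [
        (2019, "NLRP3 activation in acute pancreatitis"),
        (2020, "miR-155 and autophagy in AP"),
        (2020, "Targeting nlrp3 in mice"),
        (2021, "NLRP3 inhibitors and miR-155 levels"),
    ]
    counts = yearly_term_counts(records, ["NLRP3", "miR-155"])
    print(dict(counts["NLRP3"]))    # {2019: 1, 2020: 1, 2021: 1}
    print(dict(counts["miR-155"]))  # {2020: 1, 2021: 1}
```

Counting documents rather than raw mentions keeps a single paper that repeats a term many times from dominating the yearly totals. The `\b` word boundaries work for terms that start and end with letters or digits; a term ending in punctuation would need a different pattern.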
Case study: Using NLP to review pancreatitis-related literature
One example of this approach in action is a recent analysis of emerging trends in pancreatitis-related literature. Acute pancreatitis (AP) is the most common gastrointestinal cause of hospitalization in the U.S. If unresolved, AP can progress to chronic pancreatitis (CP), a condition with a mortality rate four times greater than that of the general population. Although pancreatitis is a serious and widespread illness, no disease-modifying drugs are available to treat it. By identifying the proteins and genes linked to pancreatitis, researchers have a better chance of understanding the mechanism of disease and discovering new potential treatments.
NLP was used to review all pancreatitis-related literature published over the past three years to identify the most promising terms in the taxonomy categories of “Proteins,” “Genomic Elements” and “Biological Functions.” Publication counts for those terms were then tracked over the past five years to show their historical trend. When these historical counts are visualized, terms that appear with increasing frequency over time are considered emerging, or trending, terms.
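The trend step can be sketched in Python as well, assuming the yearly publication counts for each candidate term are already available as a mapping from year to count. The article does not say how "increasing frequency" was quantified, so this sketch makes an assumption: it fits an ordinary least-squares line through each term's counts over the five-year window and flags terms with a positive slope. The function names `trend_slope` and `emerging_terms` and the `min_slope` threshold are hypothetical.

```python
from typing import Dict, List, Mapping


def trend_slope(counts_by_year: Mapping[int, int], years: List[int]) -> float:
    """Ordinary least-squares slope of yearly counts (change in count per year).

    Years missing from counts_by_year are treated as zero publications.
    """
    if len(set(years)) < 2:
        raise ValueError("need at least two distinct years to estimate a trend")
    ys = [counts_by_year.get(y, 0) for y in years]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ys))
    return sxy / sxx


def emerging_terms(
    term_counts: Dict[str, Mapping[int, int]],
    years: List[int],
    min_slope: float = 0.0,  # hypothetical threshold, not taken from the article
) -> List[str]:
    """Return terms whose counts rise faster than min_slope, steepest first."""
    slopes = {term: trend_slope(c, years) for term, c in term_counts.items()}
    rising = [term for term, s in slopes.items() if s > min_slope]
    return sorted(rising, key=lambda term: slopes[term], reverse=True)


if __name__ == "__main__":
    years = [2017, 2018, 2019, 2020, 2021]
    # Made-up counts for placeholder terms, for illustration only
    term_counts = {
        "term A": {2017: 2, 2018: 4, 2019: 6, 2020: 8, 2021: 10},
        "term B": {2017: 5, 2018: 5, 2019: 5, 2020: 5, 2021: 5},
        "term C": {2019: 1, 2020: 2, 2021: 3},
        "term D": {2017: 9, 2018: 7, 2019: 5, 2020: 3, 2021: 1},
    }
    print(emerging_terms(term_counts, years))  # ['term A', 'term C']
```

A linear slope is only one reasonable choice. Because the total volume of literature also grows each year, dividing each term's count by the total number of pancreatitis publications for that year before fitting would help separate genuinely trending terms from overall growth in the field.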
Key literature review findings
The pancreatitis literature review identified trending terms across these taxonomy categories. In the Proteins and Genomic Elements categories, NLRP3, miR-155, and heme oxygenase-1 (HO-1) were identified as trending (fig. 1).
- NLRP3: Inflammasomes, including NLRP3, have been implicated in multiple inflammatory disorders. NLRP3 inhibition has been shown to improve pancreatitis biomarkers in mice. Moreover, several drugs targeting the NLRP3 inflammasome are in early clinical trials for other inflammatory diseases. These include Inzomelid from Inflazome, DFV890 from Novartis, NT-1067 from NodThera, and Dapansutrile from Olatec.
- miR-155: MicroRNAs (miRNAs) regulate gene expression post-transcriptionally. miR-155 is a miRNA involved in the inflammatory response and plays a role in AP development. One study linked miR-155 to impaired autophagy, which has been associated with AP in animal models. Another study suggested that miR-155 could serve as a therapeutic target or a useful biomarker for the disease.
- Heme oxygenase-1 (HO-1): HO-1 is an enzyme that is strongly induced during an inflammatory response. Studies suggest that measuring HO-1 levels could help assess inflammation and tissue damage. A Chinese herbal medicine approved for heart failure in China was found to upregulate HO-1 in rats with induced severe acute pancreatitis, and the treated rats were protected against inflammatory organ damage.
Trending terms identified in the Biological Functions taxonomy category were neutrophil extracellular trap (NET) formation and polarization (fig. 2).
- Neutrophil extracellular trap (NET) formation: Neutrophils are immune system cells that play a key role in inflammation, including in the inflammatory processes of pancreatitis. NET formation is one mechanism by which neutrophils fight infection. Recent studies have discussed targeting NETs as part of treatment for severe acute pancreatitis (SAP).
- Polarization: Macrophages are white blood cells that can stimulate the immune system but can also act in an anti-inflammatory way. Through a process called polarization, macrophages shift into distinct functional subtypes. Studies show a link between the inhibition of macrophage polarization and the inhibition of MMP-9 (matrix metalloproteinase-9), a potential target for treating pancreatitis.
Each of the five trending terms has been validated as a potential target by studies showing positive effects on pancreatitis biomarkers in in vivo animal models. Several compounds are also in clinical trials.
By identifying terms found with increasing frequency in the literature, researchers can identify emerging research areas. Advances in NLP make it possible to quickly analyze vast volumes of literature related to a specific condition. As shown in the case of pancreatitis, NLP identified five terms that have appeared in pancreatitis-related literature at an increasing rate over the past five years. This provides sound insight into which areas have begun to attract the attention of researchers.
The same concept can be applied to other conditions, allowing researchers to refocus efforts on the most promising lines of study. Leveraging data analysis tools such as NLP can pave the way for faster drug discovery and help address the urgent needs of patients suffering from currently untreatable conditions.
Eric Gilbert, Ph.D., is a life sciences consultant at Elsevier.