
[Image created with Sora]
Multiple independent studies, including from McKinsey, Bain, Capgemini, the Federal Bar Association, and the Federal Reserve Bank of St. Louis, show first-wave generative-AI productivity lifts in roughly the low-double-digit to upper-30% band (and occasionally higher on narrow tasks). From a drug review perspective, such gains are more likely to translate into trimming several days, perhaps up to a few weeks in more optimistic cases, from the internal review cycle of a New Drug Application (NDA) or generic drug application (ANDA) than into therapies suddenly hitting the market in a fraction of their current multi-month approval timelines (10 months for standard review and six months for priority).
In an announcement, Jinzhong (Jin) Liu, a deputy director within CDER, noted the potential to perform scientific review tasks in “minutes that used to take three days.” Translating such benchmarks across the board may face a few speed bumps. First, FDA has a relatively lean staff of late thanks to a wave of retirements and buyouts earlier this year. That could mean fewer mentors to guide rookies through AI-assisted workflows. Second, new tools bring new choke points: advisory-committee calendars, for instance, are already booked months in advance.
The rollout will have dedicated staff muscle: a new chief AI officer (Jeremy Walsh) and CDER’s director of strategic programs (Sridhar Mantha) will oversee the agency-wide AI deployment.
Our sister site, Medical Design & Outsourcing, had more details on FDA’s plans.

CDER-only projection of annual productive hours from genAI (2025-2027). Millions of median hours saved are shown under Pessimistic, Base, and Optimistic scenarios. Note that 2025 values reflect an estimated 50% of full-year potential owing to the planned mid-year AI system rollout. [Source: Internal Drug Discovery & Development Monte Carlo model using a mix of public data and estimates]
But what about genAI errors?
Regulators and risk officers aren’t buying the “just trust the model” line anymore. Across pharma, finance, health-care delivery and aviation, they’re forcing generative-AI pilots to show their work, fence in bad outputs and log everything for later audits. Retrieval-augmented generation (RAG), knowledge graphs (and KG + RAG hybrids), human-in-the-loop reviews, and model-risk “gates” are the workhorses showing up most frequently, while new guidance from FDA, OCC and Brussels ties compliance to concrete counter-hallucination controls.
Why hallucinations happen and the common technical fixes
LLMs try to predict the next token, not the next fact. When the training corpus is thin or the prompt is fuzzy, they invent. Mitigation recipes now look surprisingly uniform:
- Ground the model. Feed it only vetted internal docs via RAG; Microsoft’s enterprise playbook lists RAG as step 1 for any regulated rollout.
- Constrain generation. Low temperatures (adjustable in playgrounds and via the API), scoped prompts (“answer only from retrieved docs”), and metaprompts that force an “I don’t know” when evidence is missing.
- Layer safety filters. Content-safety APIs, role-based access and private-network deployments are now table stakes.
- Continuous eval loops. Automated regression tests plus a human escalation queue keep score on hallucination rates. In high-stakes medical contexts, clinicians and other medical experts play a key role in interpreting context.
CROs such as IQVIA now market “healthcare-grade” AI that blocks the model from answering outside the sponsor’s document vault, forcing citation links in every output.
FDA’s “Total Product Lifecycle Considerations for Generative AI-Enabled Devices” from its CDRH division provides some indication of its thinking on the subject. In the document, the agency notes that hallucinations can “introduce uncertainty in the device’s behavior, which can translate to difficulty in understanding the specific bounds of a device’s intended use.” This is especially concerning in healthcare, where “highly accurate, truthful information is critical.” Consequently, the FDA highlights that such characteristics make it challenging to demonstrate that a generative AI-based product will provide “accurate, consistent, and reliable outputs.” In other words, while mitigation techniques exist, hallucinations are likely to persist at a low but non-zero rate even with safety measures, requiring continuous monitoring and documented human oversight.
Although the agency didn’t immediately respond to a request for comment, it is plausible that generative AI could significantly transform an array of time-intensive regulatory tasks. CDER could use AI to boil down hundreds of pages from a drug application or clinical-study report in minutes instead of days. The same tools could handle grunt work such as checking submissions for missing sections or cross-referencing data. Biostatisticians could use genAI tools to auto-generate draft analyses, speed SAS coding and sketch first-pass efficacy or safety readouts. Broader uses range from triaging adverse-event reports for epidemiologists to mining toxicology literature, all of which could accelerate safety reviews and research.
Even a circa 6% to 7% net productivity lift under base-case and optimistic assumptions for six key FDA centers (CDER, CBER, CDRH, HFP, CVM, and CTP) would free up more than 1 million staff hours annually from these groups alone. This is the equivalent of adding roughly 700 to more than 800 full-time staff to these centers’ technical roles without new hires.
‘Hundreds of thousands of CDER hours annually’ math note
The projection of hundreds of thousands of staff hours saved annually at the Center for Drug Evaluation and Research (CDER) stems from a scenario-based estimation model. Here’s a high-level look at the core assumptions:
CDER staffing (Mid-2025 Estimate): The model uses an estimated CDER staff count of roughly 4,900. This figure is based on analysis of FDA’s Q1 FY2025 onboard numbers (6,058 for CDER) followed by widely reported RIF impacts in April 2025, which sources like the Pink Sheet indicated reduced CDER’s headcount by over 1,000 to “under 5,000.” For context, the entire FDA had roughly 19,700 staff pre-RIF according to AgencyIQ; the loss of about 3,500 positions agency-wide suggests a current total FDA size of around 16,200.
Impacted technical roles within CDER: The model assumes generative AI tools will primarily affect specific technical roles, estimated for simplicity at approximately 60% of CDER’s staff. This includes key categories like medical reviewers, biostatisticians, software specialists, and other scientists. This aligns with earlier FDA statements that historically about two-thirds of its total staff are in scientific and technical roles.
The model then estimates the percentage of relevant CDER staff using new AI tools, the proportion of their tasks where AI can be meaningfully applied, and the net time saved on AI-assisted tasks.
CDER’s leadership in FDA’s AI pilots, reported successes (e.g., tasks reduced from “days to minutes”), and the “human-in-the-loop” operational context inform these estimates. The model uses statistical distributions (low, mode, high estimates) for these factors and runs them through thousands of Monte Carlo simulations.
Using an estimate of 1,700 productive hours per full-time equivalent (FTE) annually, the simulation projects median outcomes. For 2027, the “hundreds of thousands of hours saved” at CDER (specifically roughly 650,000 to 760,000 hours) and the associated net productivity lift of approximately 7.8% to 9.2% correspond to the Base and Optimistic scenarios for CDER in that year. This translates to an FTE equivalent of roughly 380 to 450 positions.
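A stripped-down version of this simulation can be reproduced with nothing but the Python standard library. The triangular-distribution parameters below are illustrative placeholders, not the model’s actual inputs, so the output will not match the article’s figures; only the structure (adoption × applicability × time saved, converted to FTEs at 1,700 hours each) mirrors the approach described.

```python
import random
import statistics

# Illustrative sketch of the scenario model; distribution parameters
# are placeholder assumptions, not the actual model inputs.
CDER_STAFF = 4_900      # mid-2025 headcount estimate from the article
TECH_SHARE = 0.60       # share of staff in impacted technical roles
HOURS_PER_FTE = 1_700   # productive hours per FTE per year

def simulate(n: int = 10_000, seed: int = 42) -> tuple[float, float]:
    """Return median annual hours saved and the FTE equivalent."""
    rng = random.Random(seed)
    base_hours = CDER_STAFF * TECH_SHARE * HOURS_PER_FTE
    results = []
    for _ in range(n):
        # random.triangular takes (low, high, mode)
        adoption = rng.triangular(0.40, 0.80, 0.60)       # staff using the tools
        applicability = rng.triangular(0.20, 0.40, 0.30)  # tasks AI can touch
        time_saved = rng.triangular(0.20, 0.50, 0.35)     # net savings on those tasks
        results.append(base_hours * adoption * applicability * time_saved)
    median_hours = statistics.median(results)
    return median_hours, median_hours / HOURS_PER_FTE

hours, ftes = simulate()
print(f"Median hours saved: {hours:,.0f} (~{ftes:,.0f} FTEs)")
```

Swapping in the model’s actual low/mode/high estimates for each factor, and scaling by year, would reproduce the Pessimistic/Base/Optimistic bands shown in the chart above.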
For context, a 5–8% net lift looks credible next to what other tightly regulated shops are already getting out of genAI. Goldman Sachs says its 12,000 coders are posting significant efficiency gains (circa 20%). On the survey side, Thomson Reuters’ 2025 Generative AI in Professional Services report pegs the average lawyer’s time dividend at roughly four hours a week, which comes to just over 200 hours a year. Dividing those 200 hours by typical annual billable hours yields a high single-digit to low-double-digit productivity bump for many attorneys, depending on how many hours they bill per year.
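The back-of-the-envelope conversion works out as follows; the billable-hour baselines are common industry round numbers used purely for illustration:

```python
# Roughly four saved hours/week over ~50 working weeks ≈ 200 hours/year.
saved_hours = 4 * 50

# Illustrative annual billable-hour baselines, not survey figures.
for billable in (1_700, 2_000, 2_400):
    lift = saved_hours / billable * 100
    print(f"{billable:,} billable hours -> {lift:.1f}% lift")
```

At 2,000 billable hours the lift is exactly 10%, and it ranges from roughly 8% to 12% across the baselines above, consistent with the high single-digit to low-double-digit framing.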
Filed Under: Data science, Regulatory affairs