Healthcare is one of the largest sources of “big data,” accounting for upwards of 30% of all data produced globally.1 The potential of this data to improve healthcare outcomes was evident early on. That promise has begun to materialize: with the advent of AI and machine learning, healthcare data now shows real signals of improvement in therapeutic development and patient outcomes.
In particular, big data has become critical for drug development, clinical trials, and post-approval long-term follow-up (LTF). It has driven landmark shifts in drug usage, such as the expansion of GLP-1 agonists into new indications like obesity and cardiovascular disease. The data behind that expansion came from long-term follow-up of diabetes patients, enabled by technologies that capture vast amounts of real-world data (RWD).
Today, drug developers are increasingly incorporating RWD into their initial clinical trial planning. Surprisingly, this happens much less often in new vaccine development, even though long-term follow-up is essential for understanding patient population response, sustained efficacy, and potentially delayed adverse events.
This needs to change. LTF has long been mandated by regulatory bodies. To maximize the benefits of LTF and diminish the burden on clinical study resources, sponsors must incorporate best practices early in vaccine clinical trial designs. This includes early planning of LTF frameworks and protocols for patient tokenization. Here we discuss the benefits of this early planning and detail critical considerations for LTF protocols, including follow-up duration, population-specific concerns, and regulatory aspects.
Benefits of early-phase LTF planning
Establishing process frameworks for LTF in the earliest phases of clinical trials offers significant advantages across regulatory, operational, and scientific concerns. Early integration of LTF strategies using RWD and tokenization helps sponsors meet evolving regulatory expectations while bolstering public trust. These approaches also reduce the burden on trial sites and participants, particularly healthy volunteers, by minimizing the need for frequent clinical visits.
Operationally, early planning for LTF leads to substantial cost savings and efficiency gains across the product lifecycle. Tokenization enables passive data collection without launching new trials, easing logistical challenges. Early LTF planning enhances study design by reducing bias and improving data reliability. Additionally, access to rich, longitudinal datasets can provide deeper insights into cohort behavior and support adaptive trial designs that accelerate development and decision-making.
Early planning priorities
Establishing LTF architecture and protocols at the earliest stages of clinical trials begins with clearly defining research objectives and understanding stakeholder requirements, particularly those of regulatory bodies. Sponsors should consider integrating the possibility of an extension study into the parent trial protocol, allowing for early engagement with trial sites and smoother transitions if extended observation becomes necessary.
Framing informed consent early is another critical step. By disclosing long-term data collection plans and biospecimen storage from the outset, sponsors can significantly increase participant acceptance rates.2 Planning for tokenization in Phase I or II enables the linkage of trial participants to RWD sources without compromising privacy.2 While specific data can be analyzed later, early tokenization ensures flexibility and continuity in data collection.
Sponsors can also validate patient-reported outcomes during the initial trial phases.2 Selecting “harder endpoints” like major surgeries, cancer recurrence, or death minimizes recall bias and helps ensure reliable data capture. These outcomes are typically well-documented in healthcare systems and are suitable for long-term observation.2
Patient tokenization
Patient tokenization is a foundational strategy for modern clinical research that offers a secure method to protect patient privacy while enabling robust data analysis. By replacing personally identifiable information with a unique, irreversible token, researchers can access and analyze health data across various platforms without compromising confidentiality. This approach allows for the integration of RWD—such as electronic health records, insurance claims, and genomic databases—to uncover patterns in disease progression, treatment efficacy, and healthcare utilization. The token serves as a consistent identifier across disparate data sources, facilitating a more complete and longitudinal view of patient outcomes.3,4
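To make the idea concrete, here is a minimal sketch of deterministic tokenization using a keyed hash. This is an illustrative assumption, not the proprietary method any commercial tokenization provider actually uses; the field names and key handling are hypothetical.

```python
import hashlib
import hmac

def make_token(first: str, last: str, dob: str, secret_key: bytes) -> str:
    """Derive a deterministic, irreversible token from normalized PII.

    Normalizing case and whitespace ensures the same person yields the
    same token even when different data sources record the name slightly
    differently. The keyed hash cannot be reversed to recover the PII.
    """
    normalized = "|".join([first.strip().lower(), last.strip().lower(), dob.strip()])
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical secret held by the tokenization service, never stored with the data
key = b"tokenization-service-secret"

t1 = make_token("Jane", "Doe", "1980-04-12", key)
t2 = make_token(" JANE ", "doe", "1980-04-12", key)  # same person, messy entry

assert t1 == t2  # one consistent identifier across disparate sources
```

Because the token is deterministic, a participant's trial record and their insurance claims can be matched on the token alone, while the PII itself never leaves the originating system.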
To maximize the benefits of tokenization, it is essential to implement it early in the clinical development process, ideally during Phase I/II trials. Early adoption ensures the necessary infrastructure is in place before larger-scale studies begin, allowing for smoother operations and cost efficiencies. It also enables the collection of comprehensive data from the outset, even before some specific research questions are defined.3,4 This flexibility is particularly valuable in adapting to evolving regulatory guidance, especially for sponsors who establish early consultative conversations with regulatory bodies. Moreover, early tokenization increases the likelihood of capturing a broader patient population, enhancing the statistical power and relevance of the data.
A critical factor in successful tokenization is the design of the informed consent process.3,4 The initial informed consent form (ICF) should clearly outline the scope of long-term data collection and the use of de-identified data for future research. Including tokenization in the original consent helps boost participant acceptance and avoids the logistical challenges of re-consenting in later stages of clinical studies. When done correctly, this approach streamlines the consent process and supports the seamless integration of tokenization into the study protocol.
Tokenization also addresses the practical challenges of long-term follow-up. Traditional methods often require frequent site visits and active monitoring, which can be costly and burdensome. With tokenization, data can be passively collected through routine healthcare interactions, reducing the need for direct study involvement.3,4 This can help minimize participant attrition, especially in trials involving healthy volunteers, and can alleviate the workload on clinical sites. It also ensures continuity of data collection even when patients change providers or are lost to active follow-up.
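The passive-collection model described above can be sketched as a simple token join: only records whose token appears in the trial roster are linked, and outcomes accrue from routine care with no site visit. All tokens, field names, and records below are hypothetical.

```python
# Hypothetical trial roster keyed by token (no direct identifiers present)
trial_roster = {
    "tok_ab12": {"arm": "vaccine"},
    "tok_cd34": {"arm": "placebo"},
}

# Hypothetical RWD claims feed arriving from routine healthcare interactions
claims = [
    {"token": "tok_ab12", "date": "2026-03-01", "dx": "routine visit"},
    {"token": "tok_zz99", "date": "2026-03-02", "dx": "influenza"},      # not a trial participant
    {"token": "tok_cd34", "date": "2026-05-10", "dx": "event of interest"},
]

# Link each claim to its trial arm; non-participants fall out of the join
linked = [
    (rec, trial_roster[rec["token"]])
    for rec in claims
    if rec["token"] in trial_roster
]

assert len(linked) == 2  # follow-up continues even if patients change providers
```

The same join works years later and across new data sources, which is why continuity survives provider changes and loss to active follow-up.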
Beyond operational efficiencies, tokenization enhances the scientific rigor of clinical research. It enables comparisons between trial participants and the general population, helping to identify selection bias and improve diversity in study cohorts. Access to both retrospective and prospective data allows researchers to adjust for confounding variables and assess long-term outcomes, such as late-onset conditions or effects on specific subpopulations.
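As a toy illustration of the selection-bias check just described, one can compare a summary statistic of the trial cohort against the linked general population. The ages and the 10-year threshold below are invented for the example; a real analysis would compare full distributions and adjust for confounders.

```python
# Hypothetical ages from a trial cohort and the linked general-population RWD
trial_ages = [34, 41, 29, 38, 45]
population_ages = [34, 41, 29, 38, 45, 67, 72, 80, 58, 63]

def mean(xs):
    return sum(xs) / len(xs)

# A large gap suggests the cohort under-represents part of the population
gap = mean(population_ages) - mean(trial_ages)
if gap > 10:  # arbitrary illustrative threshold
    print(f"Cohort skews younger than the population by {gap:.1f} years on average")
```

Flagging such gaps early lets sponsors adjust recruitment or weighting before the imbalance undermines long-term outcome comparisons.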
Protecting patients in LTF studies requires a multifaceted approach that balances regulatory expectations, ethical standards, and technological safeguards. Robust data privacy protocols are paramount. Tokenization ensures compliance with HIPAA de-identification standards and prevents re-identification or direct communication with participants.
As regulatory agencies increasingly demand extended safety and effectiveness data, tokenization offers a scalable and ethical solution to meet these expectations without launching entirely new trials. Ultimately, patient tokenization strengthens the integrity, inclusivity, and impact of clinical research. Early-phase planning for tokenization frameworks helps to simultaneously maximize the benefits while lowering the effort of implementation.
Duration and design of follow-up protocols
Establishing an appropriate minimum follow-up period is a critical component of vaccine trial design. While a 6- to 12-month follow-up may suffice for general populations, specialized groups often require extended observation periods, sometimes up to five years or more. This is due to the latency of certain adverse events, which may not manifest until well after initial vaccination. Regulatory agencies, responding to heightened public scrutiny, now emphasize longer-term safety monitoring to reinforce public trust and ensure comprehensive risk assessment. The European Medicines Agency (EMA) has issued guidance requiring post-approval commitments that span multiple years.3,4
Extension studies offer a valuable mechanism for continuing patient observation beyond the original trial period. These roll-over designs allow participants from a parent study to transition into a related follow-up study. They are particularly useful for detecting rare or delayed adverse events that shorter trials may miss. Historical examples like the West of Scotland Coronary Prevention Study (WOSCOPS) and Multi-modal Treatment of Attention Deficit Hyperactivity Disorder (MTA) studies demonstrate the utility of this approach in capturing long-term data.2
However, traditional extension studies often rely on active surveillance methods, which involve scheduled clinical visits and ongoing site engagement. This model can be prohibitively expensive and logistically challenging, especially when follow-up spans a decade or more. To address these limitations, the industry is increasingly adopting passive surveillance strategies that leverage RWD and tokenization technologies. Once again, it’s recommended that tokenization be integrated during Phase I or II and included in the initial informed consent to help minimize costs and maximize participant acceptance rates.
Population-specific considerations
Designing LTF strategies for vaccine trials requires special consideration for vulnerable populations such as the elderly, pregnant individuals, and immunocompromised patients.5 These groups often face elevated risks of complications from infectious diseases and may respond differently to vaccination. For example, older adults not only experience a higher frequency of serious adverse events but also show variable immune responses over time.6,7 Vaccine booster effectiveness can decline more rapidly in this group, underscoring the need for extended monitoring and tailored booster schedules.
Pregnant individuals represent another critical population for LTF.5 Although they are often excluded from early-phase trials, inclusion becomes essential once safety is established. Tokenization offers a powerful method for retrospective analysis, allowing researchers to track outcomes in pregnant participants without compromising privacy.
Immunocompromised patients also warrant longer follow-up due to their heightened vulnerability and potential risks associated with certain vaccine platforms.6,7 These individuals may experience atypical immune responses or adverse events that only emerge over time. Similarly, participants with comorbidities, such as diabetes, heart disease, or chronic lung conditions, require careful monitoring, as their baseline health status can influence both vaccine safety and effectiveness.
Conclusion
Ultimately, a proactive approach to LTF planning—anchored in early-phase implementation of infrastructure, LTF-focused informed consent, and tokenization—enhances data quality and supports evolving regulatory and public expectations for robust long-term safety and effectiveness evidence. This strategy offers significant benefits, including substantial cost savings, the ability to reduce patient and site burden, and the capacity to gain a better understanding of research cohorts. The COVID-19 pandemic underscored the urgent need for such robust and efficient designs to collect high-quality long-term safety and effectiveness data from large populations, further solidifying the importance of proactive planning for LTF in vaccine clinical trials.
References
- Moore J, Guichot YD. How to harness health data to improve patient outcomes. World Economic Forum. Published January 5, 2024. Accessed August 4, 2025. https://www.weforum.org/stories/2024/01/how-to-harness-health-data-to-improve-patient-outcomes-wef24/.
- Burcu M, Manzano-Salgado CB, Butler AM, Christian JB. A framework for extension studies using real-world data to examine long-term safety and effectiveness. Ther Innov Regul Sci. 2022;56(1):15-22. doi:10.1007/s43441-021-00322-8
- Mandziuk KA, Knotts-Keeterle D. Clinical Trial Tokenization Workshop. Presented at: World Vaccine Congress 2025; April 22, 2025; Washington, DC.
- Knotts-Keeterle D, Mandziuk K. Unlocking the Power of Tokenization in Clinical Development and Product Commercialization. Presented at: World Vaccine Congress; April 23, 2025; Washington, DC.
- Chen J, Ting N. Design considerations for vaccine trials with a special focus on COVID-19 vaccine development. J Data Sci. 2020;18(3):550-580. doi:10.6339/JDS.202007_18(3).0020
- Baden LR, El Sahly HM, Essink B, et al. Long-term safety and effectiveness of mRNA-1273 vaccine in adults: COVE trial open-label and booster phases. Nat Commun. 2024;15:50376-z. doi:10.1038/s41467-024-50376-z
- Faksova K, Walsh D, Jiang Y, et al. COVID-19 vaccines and adverse events of special interest: A multinational Global Vaccine Data Network (GVDN) cohort study of 99 million vaccinated individuals. Vaccine. 2024;42:2200-2211. doi:10.1016/j.vaccine.2024.01.100
Authors
Dinah Knotts-Keeterle is Vice President, Project Management, Vaccines and Infectious Diseases, ICON. Edward “Ted” K. Wright, Jr., Ph.D., is Vice President, Government and Public Health, ICON.