Advancements in mass spectrometry promise to make biomarker discovery a lot easier in the future.
Interest in the discovery of biomarkers (biological compounds that, when detected in tissues or body fluids, can be used to indicate a specific disease state) is growing at a rapid pace. Once known, a biomarker can be used to diagnose the presence or risk of disease or to match treatments to a specific individual and their specific form of disease. In the past, researchers identified biomarkers based on lengthy serial studies of the biology of disease mechanisms. New highly parallel screening and profiling approaches promise to speed the discovery of important biomarkers, without the need for deep knowledge of disease mechanisms.
Because proteins are among the most promising biomarkers, their rapid analysis is of greatest interest. Proteins are diverse and complex and their amounts in tissues and fluids can vary greatly throughout the body as well as within individual cells depending on the cell’s lifecycle and its environment. No single liquid chromatography/mass spectrometry (LC/MS) workflow has been adopted as the gold standard for protein biomarker discovery and identification. The common data-dependent strategy of identification by MS/MS, which measures the most intense ions eluting over a specific time, provides data representing only a subset of the actual proteins present in a sample. Extensive fractionation of complex samples may be necessary to identify more of the proteins, greatly increasing the number of analyses required. These approaches work, but a concern is that the less-abundant proteins, which could be key biomarkers, may go undetected.
While MS/MS techniques can be used, analyses based on accurate-mass liquid chromatography time-of-flight mass spectrometry (TOF LC/MS) are the most rapid and sensitive. Because of the complexity of the samples encountered, advances in TOF LC/MS instrument design and software are needed to make this new, more rapid approach possible.
Biomarker discovery workflow
In biomarker discovery, mass profiling is used to find statistical differences between similar biological sample sets. In a typical mass-profiling experiment, large sample sets are analyzed by MS coupled with a separation method, and complex sets of molecular features (candidate biomarkers) are generated. A molecular feature is defined by the combination of retention time, accurate mass and abundance. Advanced statistical analysis is used to compare the large numbers of molecular features in the sample sets and to find significant differences between them. To aid identification, the molecular ions produced by accurate-mass TOF instruments can be used to search mass-based databases. Targeted MS/MS analysis can then be used to generate spectral information to aid in identification.
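The definition of a molecular feature above can be made concrete with a small sketch. The class name, the tolerances and the abundance values below are hypothetical and chosen only for illustration; they are not taken from Agilent's software. The sketch pairs features from a control and a sample when their masses agree within a ppm tolerance and their retention times agree within a time tolerance, then reports the abundance ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolecularFeature:
    rt_min: float     # retention time in minutes
    mass: float       # accurate mass in Da
    abundance: float  # integrated signal, arbitrary units


def same_feature(a, b, ppm_tol=5.0, rt_tol_min=0.2):
    """True if two features agree in mass (ppm) and in retention time."""
    ppm_diff = abs(a.mass - b.mass) / a.mass * 1e6
    return ppm_diff <= ppm_tol and abs(a.rt_min - b.rt_min) <= rt_tol_min


def abundance_ratios(control, sample):
    """Pair each sample feature with its first matching control feature."""
    pairs = []
    for s in sample:
        for c in control:
            if same_feature(c, s):
                pairs.append((s, s.abundance / c.abundance))
                break
    return pairs


# Hypothetical feature lists (abundances are invented for the example)
control = [MolecularFeature(9.13, 1509.7273, 1.0e5),
           MolecularFeature(20.10, 2016.9039, 4.0e5)]
sample = [MolecularFeature(9.15, 1509.7290, 2.0e5),
          MolecularFeature(20.08, 2016.9050, 1.0e5)]

ratios = [ratio for _, ratio in abundance_ratios(control, sample)]
print(ratios)
```

A real profiling package would also align retention times across runs and handle features that are missing from one sample set; this sketch leaves both out.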
This article describes the results of a typical biomarker discovery workflow recommended by Agilent.
In one set of experiments, Escherichia coli (E. coli) lysate was used as a model to represent a complex sample. In order to mimic up- and down-regulation in these samples, equivalent amounts of lysate were spiked with varying amounts of bovine proteins (bovine serum albumin [BSA] and serotransferrin) for detection using the mass-profiling approach (Table 1). The profiling experiments were performed on an Agilent 1200 Series HPLC-Chip/MS system interfaced to an Agilent 6210 TOF MS. Accurate-mass TOF LC/MS data were extracted and evaluated using Agilent’s MassHunter molecular feature extraction algorithm. Targeted LC/MS/MS analyses were performed using Agilent’s HPLC-Chip/MS system interfaced to an Agilent 6330 Ion Trap MS. Peptides were identified using Spectrum Mill MS Proteomics Workbench software with the SwissProt protein database.
Figure 1A shows the sample complexity. Even creating an extracted ion chromatogram (EIC) for the peptide at m/z 504.2507, using a narrow mass range of plus or minus 1.9 ppm, does not result in a single peak (Figure 1B). The mass spectrum for the peak at 9.2 minutes shows that the low-abundance peptide ion would likely not be selected for MS/MS using traditional data-dependent approaches because there are many other more abundant ions (Figure 1C).
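To give a sense of how narrow that extraction window is, the short calculation below converts the plus or minus 1.9 ppm tolerance into absolute m/z bounds. The function name is hypothetical; the m/z value and tolerance are those quoted above.

```python
def ppm_window(mz, ppm):
    """Return (low, high) m/z bounds for a +/- ppm extraction window."""
    half_width = mz * ppm * 1e-6
    return mz - half_width, mz + half_width


low, high = ppm_window(504.2507, 1.9)
print(f"EIC window: {low:.4f} to {high:.4f}")
```

The half-width works out to slightly less than 0.001 m/z, so the several peaks in Figure 1B come from ions that differ from the target by less than about one millidalton.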
Feature extraction requirements
Mass profiling requires finding molecular features in complex data such as that shown in Figure 1. This step is challenging for the data analysis software because the complexity of the biological mixtures analyzed results in co-eluting peaks, regardless of the quality of the separation, and each analysis produces thousands of data points for the software to process. Powerful molecular feature extraction (MFE) software that can locate individual sample components, including low-level components, in complex chromatograms is needed. To generate data useful for subsequent statistical analysis, the molecular feature extraction software must find the molecular features in each total ion chromatogram, remove background noise and unrelated ions, create reconstructed spectra, and create a molecular feature list, all very quickly and easily.
Agilent’s MassHunter software removes noise effectively by identifying it as non-chromatographic peaks. How well the software works is highly dependent on the accuracy of the MS data. Prior to the commercial availability of molecular feature extraction tools, manually working through these complicated data sets took weeks or months.
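The idea of treating noise as non-chromatographic signal can be illustrated with a deliberately simple filter. This is not the MassHunter algorithm, whose details are not given here; it is a minimal sketch that assumes a real component stays above a threshold for several consecutive scans, while a noise spike does not. The threshold, the minimum scan count and the intensity traces are hypothetical.

```python
def longest_run(intensities, threshold):
    """Length of the longest run of consecutive scans above threshold."""
    best = current = 0
    for value in intensities:
        current = current + 1 if value > threshold else 0
        best = max(best, current)
    return best


def is_chromatographic(intensities, threshold=100.0, min_scans=3):
    """Keep an ion trace only if it persists like an eluting peak."""
    return longest_run(intensities, threshold) >= min_scans


# Intensity of one m/z value over successive scans (invented numbers)
eluting_peak = [0, 50, 300, 900, 1200, 700, 150, 0]
noise_spikes = [0, 0, 1500, 0, 0, 900, 0, 0]

print(is_chromatographic(eluting_peak))
print(is_chromatographic(noise_spikes))
```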
TOF LC/MS design requirements
Figure 2: Log-log plots for samples A and B versus the control. Differential features were found at the expected two and four times ratios. (Source: Agilent Technologies)
In order for the molecular feature extraction algorithm to produce meaningful results, the TOF LC/MS system must provide a mass accuracy of 1-2 ppm, five decades of dynamic range and chromatographic reproducibility. Because even minuscule instrument variations can cause a noticeable mass shift, automated two-point internal reference mass correction is needed to achieve an accurate-mass measurement. In this technique, two compounds of known mass are introduced continuously. The control software constantly corrects the measured masses of the samples using the known masses as reference.
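A two-point correction of this kind can be sketched under the textbook assumption that flight time varies as t = t0 + k·sqrt(m/z). Two reference ions of known m/z then fix both k and t0 for the current scan. The reference m/z values, the calibration constants and the size of the drift below are illustrative assumptions, not values from the article, and the instrument's actual calibration function may include additional terms.

```python
import math


def flight_time(mz, k, t0):
    """Idealized TOF relation: t = t0 + k * sqrt(m/z), arbitrary time units."""
    return t0 + k * math.sqrt(mz)


def fit_two_point(ref_low, ref_high):
    """Solve for k and t0 from two (known_mz, measured_time) pairs."""
    (mz1, t1), (mz2, t2) = ref_low, ref_high
    k = (t2 - t1) / (math.sqrt(mz2) - math.sqrt(mz1))
    t0 = t1 - k * math.sqrt(mz1)
    return k, t0


def time_to_mz(t, k, t0):
    return ((t - t0) / k) ** 2


# Illustrative values (assumptions, not from the article)
REF_LOW_MZ, REF_HIGH_MZ = 121.0509, 922.0098
ANALYTE_MZ = 504.2507
STALE_K, T0 = 1.0, 0.5
drifted_k = STALE_K * 1.00001  # a 10 ppm drift in the time scale

# Flight times as the drifted instrument would measure them
t_low = flight_time(REF_LOW_MZ, drifted_k, T0)
t_high = flight_time(REF_HIGH_MZ, drifted_k, T0)
t_analyte = flight_time(ANALYTE_MZ, drifted_k, T0)

k_fit, t0_fit = fit_two_point((REF_LOW_MZ, t_low), (REF_HIGH_MZ, t_high))
uncorrected = time_to_mz(t_analyte, STALE_K, T0)
corrected = time_to_mz(t_analyte, k_fit, t0_fit)

uncorrected_ppm = (uncorrected - ANALYTE_MZ) / ANALYTE_MZ * 1e6
corrected_ppm = (corrected - ANALYTE_MZ) / ANALYTE_MZ * 1e6
print(f"uncorrected error: {uncorrected_ppm:.2f} ppm")
print(f"corrected error:   {corrected_ppm:.6f} ppm")
```

In this idealized model a 10 ppm drift in the time scale becomes a mass error of about 20 ppm with the stale calibration, because mass scales with the square of flight time, and the two reference ions remove it.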
An analog-to-digital converter (ADC) detector is needed to achieve accurate mass assignments and a wide dynamic range. Unlike a time-to-digital converter (TDC), which only registers an ion arrival above a certain intensity and gives the same response regardless of whether the signal is the result of one or many ions, an ADC creates a continuous digital representation of the ion detector’s signal. When multiple ions of a given mass arrive at the detector within a very short time, an ADC with a fast, 32 Gbit/sec sampling rate can translate this rising and falling signal into a very accurate digital profile of the mass peak. The detector output is accurately represented regardless of whether it is from a small or large ion current. These high-speed electronics also improve chromatographic resolution, thereby making ADC-based TOF systems easy to couple with fast chromatography.
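The difference in dynamic range can be seen in a toy model. The sketch below assumes an idealized TDC that records at most one event per transient for a given mass, and an idealized ADC whose digitized signal is proportional to the number of ions arriving. The ion counts are invented, and real detectors add noise, dead time and gain variation that are ignored here.

```python
def tdc_response(ions_per_transient):
    """Idealized TDC: one count per transient if any ion arrived."""
    return sum(1 for n in ions_per_transient if n >= 1)


def adc_response(ions_per_transient):
    """Idealized ADC: signal proportional to the number of ions."""
    return sum(ions_per_transient)


# Ions of one mass arriving in each of eight transients (invented numbers)
weak_signal = [0, 1, 0, 1, 1, 0, 0, 1]
strong_signal = [3, 5, 4, 6, 5, 4, 3, 5]

tdc_ratio = tdc_response(strong_signal) / tdc_response(weak_signal)
adc_ratio = adc_response(strong_signal) / adc_response(weak_signal)
print(f"true ratio {adc_ratio}, TDC-reported ratio {tdc_ratio}")
```

In this model the TDC saturates on the strong signal and understates the abundance ratio, which is the behavior the paragraph above describes.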
Because mass measurement is dependent on the length of the TOF’s flight tube, it must have a low coefficient of thermal expansion and it must be protected from temperature fluctuations if mass measurements accurate to within 1-2 ppm are to be achieved. This exceptional mass accuracy can be attained by devising a flight tube constructed from a metal alloy with an extremely low coefficient of thermal expansion. An insulated outer shell with an evacuated air compartment protects the inner components from temperature changes.
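A rough estimate shows why the flight tube matters so much. The sketch assumes a simplified model in which flight time scales linearly with tube length and assigned mass scales with the square of flight time, so a fractional length change of α·ΔT produces roughly twice that fractional mass error. The expansion coefficients are typical textbook values for a stainless steel and for a low-expansion nickel-iron alloy; the article does not name the alloy actually used.

```python
def mass_error_ppm(alpha_per_kelvin, delta_t_kelvin):
    """Apparent mass error (ppm) from thermal growth of the flight path.

    Simplified model: time ~ length, mass ~ time squared.
    """
    length_ratio = 1.0 + alpha_per_kelvin * delta_t_kelvin
    return (length_ratio ** 2 - 1.0) * 1e6


# Illustrative expansion coefficients in 1/K (assumptions, not from the article)
STAINLESS_STEEL = 17e-6
LOW_EXPANSION_ALLOY = 1.2e-6

for name, alpha in [("stainless steel", STAINLESS_STEEL),
                    ("low-expansion alloy", LOW_EXPANSION_ALLOY)]:
    print(f"{name}: {mass_error_ppm(alpha, 1.0):.1f} ppm per 1 K change")
```

Under these assumptions a one-degree change shifts the apparent mass by tens of ppm for an ordinary steel tube and by a few ppm even for a low-expansion alloy, which is consistent with the need for both thermal insulation and continuous reference mass correction.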
Differential analysis requirements
After the molecular features are extracted, statistical tools are needed to determine which features are statistically meaningful and to segregate experimental variation and within-population variation from cross-population variation. These tools must enable researchers to easily import, analyze, and visualize data from large sample sets and complex experimental designs. Often a range of statistical analysis tools, such as significance testing, principal component analysis, clustering, class prediction, and hierarchical trees, is needed to obtain a differential analysis of the sample sets. In this study, Agilent’s Mass Profiler software was used (Figure 2).
Table 1: Amounts injected onto column. (Source: Agilent Technologies)

| Sample | E. coli lysate (ng total protein) | BSA (fmol) | Serotransferrin (fmol) |
|---|---|---|---|
| Control | 400 | 100 | 200 |
| Sample A | 400 | 200 | 100 |
| Sample B | 400 | 400 | 50 |
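The ratios that the differential analysis should recover follow directly from the spiked amounts in Table 1. The short sketch below computes them as log2 ratios relative to the control; the dictionary layout and function name are hypothetical, while the amounts are copied from the table.

```python
import math

# Amounts injected (fmol), copied from Table 1
spiked_fmol = {
    "Control": {"BSA": 100, "Serotransferrin": 200},
    "Sample A": {"BSA": 200, "Serotransferrin": 100},
    "Sample B": {"BSA": 400, "Serotransferrin": 50},
}


def expected_log2_ratios(sample, reference="Control"):
    """log2(sample / reference) for each spiked protein."""
    return {
        protein: math.log2(spiked_fmol[sample][protein]
                           / spiked_fmol[reference][protein])
        for protein in spiked_fmol[sample]
    }


for name in ("Sample A", "Sample B"):
    print(name, expected_log2_ratios(name))
```

A log2 ratio of plus or minus 1 corresponds to a two-fold change and plus or minus 2 to a four-fold change, matching the ratios reported for Figure 2. Features from the E. coli background, injected at 400 ng in every sample, are expected to fall near a ratio of zero.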
Protein identification requirements
Although advanced TOF LC/MS systems produce accurate-mass information that can be used to reduce the number of possible compound identities, it may not be sufficient for identification. In these circumstances, the next step is to run a targeted MS/MS analysis of the significant features to produce fragmentation information that can be used for identification. Q-TOF or Ion Trap LC/MS systems are ideal for performing these targeted analyses. To speed MS/MS method setup, the differentially expressed feature list generated by the mass profiling software must be easy to import as an inclusion list for targeted MS/MS analysis. For the example in Table 2, a targeted MS/MS identification of the differential features was performed by importing the TOF differential features’ mass data as an “include” mass list into the instrument control software of an Agilent 6330 ion trap LC/MS. Samples A and B were reanalyzed using the same LC conditions. Next, the BSA and serotransferrin peptides were correctly identified using Agilent’s Spectrum Mill proteomics software and the SwissProt protein database.
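Building an inclusion list from differential features amounts to converting each feature mass into the m/z values at which the precursor is expected. The sketch below assumes the feature masses are neutral monoisotopic masses, that peptides are observed as protonated ions with charge 2 or 3, and that a one-minute retention time window is wanted. The row layout is hypothetical and is not the import format of any Agilent software; the two example features use masses and retention times listed in Table 2.

```python
PROTON_MASS = 1.007276  # Da


def mz_for_charge(neutral_mass, charge):
    """m/z of a protonated ion [M + zH]z+ for a given neutral mass."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def build_inclusion_list(features, charges=(2, 3), rt_window_min=1.0):
    """Return one row per (feature, charge) with an m/z and an RT window."""
    rows = []
    for mass, rt in features:
        for z in charges:
            rows.append({
                "mz": round(mz_for_charge(mass, z), 4),
                "charge": z,
                "rt_start": round(rt - rt_window_min / 2, 2),
                "rt_stop": round(rt + rt_window_min / 2, 2),
            })
    return rows


# (mass in Da, retention time in min); treated as neutral masses (assumption)
differential_features = [(1509.7273, 9.13), (2016.9039, 20.10)]

inclusion_list = build_inclusion_list(differential_features)
for row in inclusion_list:
    print(row)
```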
Reliable biomarker detection and identification require reproducible data. To determine the quality of data generated by Agilent’s HPLC-Chip/TOF MS, cross-sample response relative standard deviation (RSD) values were calculated to be approximately 0.3% (Table 2) and calculated mass RSD values were 0.99-1.42 ppm.
Table 2: Retention time and mass reproducibility for sample B and the controls demonstrate mass accuracy. (Source: Agilent Technologies)

| Number of replicates | RT (min) | % RSD of RT | Mass | SD of Mass (mDa) | SD of Mass (ppm) |
|---|---|---|---|---|---|
| 20 | 9.13 | 1.02 | 1509.7273 | 1.6 | 1.05 |
| 19 | 20.10 | 0.32 | 2016.9039 | 2.0 | 0.99 |
| 19 | 20.52 | 0.33 | 1310.6421 | 1.3 | 0.99 |
| 19 | 27.74 | 0.29 | 1388.6666 | 1.6 | 1.115 |
| 20 | 28.35 | 0.31 | 918.5474 | 1.3 | 1.42 |
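The two kinds of reproducibility figures in Table 2 are related by simple arithmetic, sketched below. The function names are hypothetical, the mass and mDa values are copied from the table, and the retention time replicates are invented to show the RSD calculation, since the individual replicate values are not given.

```python
import statistics


def sd_to_ppm(sd_mda, mass_da):
    """Convert a standard deviation in millidaltons to ppm of the mass."""
    return (sd_mda / 1000.0) / mass_da * 1e6


def percent_rsd(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# (mass in Da, SD in mDa) from Table 2
table2 = [(1509.7273, 1.6), (2016.9039, 2.0), (1310.6421, 1.3),
          (1388.6666, 1.6), (918.5474, 1.3)]

ppm_values = [round(sd_to_ppm(sd, mass), 2) for mass, sd in table2]
print(ppm_values)

# Hypothetical retention time replicates (minutes)
rt_replicates = [20.05, 20.10, 20.15]
print(round(percent_rsd(rt_replicates), 2))
```

The recomputed ppm values come out close to the tabulated ones. Small differences are to be expected because the mDa column is rounded to one decimal place.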
Conclusion
The results of this study demonstrate that a profile-directed approach to biomarker discovery using a TOF LC/MS profiling system is a powerful way to uncover low-abundance, differentially expressed components within complex samples. The approach allows for the efficient, targeted analysis of only differentially expressed features and thereby makes this method a more effective, sensitive and reliable alternative to traditional MS/MS approaches. To ensure the success of this approach, the TOF LC/MS system used must demonstrate the exceptional mass accuracy, wide dynamic range and reproducibility needed to identify low-abundance proteins in the presence of significantly higher-abundance proteins. It must also be paired with powerful software tools that process complex data and produce statistically meaningful answers.
About the Authors
Ning Tang, Ph.D., is a senior applications scientist at Agilent Technologies, Inc. She specializes in proteomics and other uses of mass spectrometers to solve biological questions.
Christine Miller has over 15 years of experience in LC and LC/MS. She is currently a senior application scientist for LC/MS at Agilent Technologies working on proteomics applications.
This article was published in Drug Discovery & Development magazine: Vol. 11, No. 2, February, 2008, pp. 42-45.