Web Exclusive
For researchers, data is both a blessing and a curse. Online tools can help make data work for you.
In graduate school, merely uttering the word “data” aloud was enough to silence an entire gathering of students, postdoctoral fellows, and laboratory technicians. It was not just a four-letter word; it was the four-letter word.
For those of us working on dissertations, data was much more than a simple goal. The pursuit of data was a passion, a life’s work, and an ultimate torment. Data came from many different sources, and it was time-consuming to collect and organize. Even when data was finally in the pipeline for publication, many months could pass before a manuscript appeared in print and was seen by other scientists and colleagues.
The unfortunate part is that years later, data is still listed as a major challenge to conducting one’s research. Once it is analyzed and published, where else should it go besides a computer hard drive? How can other basic researchers be assured open and free access to data? If data is the way in which research hopes to find answers to biological questions, then why does it remain such a challenge? Why is “data” still the four-letter word for scientists? The good news is that, thanks to time and technology, solutions exist to lessen the research challenge known as “data”.
Although they are not well publicized, there are places where researchers can seek information, ideas, and data beyond medical journals, computer hard drives, or the random CDs that float around in desk drawers.
The US government provides over $94 billion to fund basic research, mainly through government agencies such as the National Institutes of Health (NIH) and National Science Foundation (NSF).1 To take full advantage of the enormous knowledge base that results from all government-funded research projects, the federal government provides one of the largest free record banks, in the form of the US Department of Energy’s Office of Scientific and Technical Information.
The site’s mission is to support “the diffusion of knowledge to advance science”. It provides access to this published knowledge by organizing and linking the public to over 1,000 scientific databases in chemistry, biology, materials science, and engineering. This source of information goes well beyond what is provided by common search engines like Google or Yahoo. In fact, the government’s own research databases have been fully accessible and searchable through this site since 2004, and these databases hold much more than manuscripts and graphs. Non-text data available through this portal include computer simulations, interactive maps, movies, and scientific images. The site exists to “share and exchange science information”, and it provides subject-specific databases, electronic publishing, and Web site development tools to enhance the exchange of knowledge. These resources are worth remembering when considering a new hypothesis or seeking out ideas for that next experiment.
The Web site www.science.gov, which gives visitors access to over 50 million pages of government-funded research data, is a collaboration of 17 science groups and 13 associated federal agencies. The site began in 2005, and users can receive alerts by e-mail when something within their area of interest is published. Scientists can search this gateway of knowledge by author, subject, or funding agency. Links to foreign government science pages provide easy access to research conducted anywhere around the globe.
When the need for preliminary data arises, scientists can avoid searching through indecipherable lab notebooks or trying to access journals without a current subscription. Instead, researchers can point a Web browser to these often-overlooked Web sites and find easily searchable data at their fingertips.
Sharing data
Once the search for preliminary data is complete, where should scientists look to share their own data with the worldwide scientific community? Most scientists focus solely on publication in peer-reviewed journals as a way to announce their research findings. Yet not all scientists have equal access to biomedical journals. Fortunately, other methods of sharing biological knowledge exist in this age of information.
An excellent example is the Worldwide Protein Data Bank (wwPDB). The wwPDB aims to be a single repository for “macromolecular structure data that is freely and publicly open to the global community”. Since its inception in 2000, the databank has received over 46,000 data submissions, and the Web site was viewed over 1.7 million times in August 2008 alone. The site’s administrators ensure the continuity and accuracy of the data, while worldwide access makes the information available to anyone who visits the Web site. Sharing and storing data in this manner reaches a broader audience than a typical journal publication.
Protein scientists are not unique in this respect. Data banks, much like the wwPDB, exist for a multitude of other research areas including:
- Nucleic Acid Database (NDB)
- Spectral Database for Organic Compounds (SDBS)
- Ribosomal Database Project (RDP)
- Neurological Disorders Database Repository
Data dissemination from clinical trials, though usually more restricted and challenging to find, is still available to those who are willing to participate. The Clinical Trials Network, initiated by the National Institute on Drug Abuse, has a program called the CTN Public Data Share that provides a two-way street for those involved in collecting and assessing past clinical trial data. The goal of the network is to promote more research; data from completed studies is freely accessible and can be easily shared via this site. Data can be exchanged once the material is approved for publication or the data is more than 18 months old. Great care is taken to “de-identify” participants in the study to protect the private information of individuals.
Another pertinent Web site is www.clinicaltrials.gov, which lists current clinical trials by geographic location, sponsor, or condition. That database currently includes over 60,000 ongoing trials, whether funded by government, by private industry, or by one of 157 foreign countries. Although clinical trial data is handled differently than basic or pre-clinical research data, its free exchange could be the catalyst that uncovers a therapy for another disease.
Archived preliminary data, data exchange, and open data access are still not as available as they should be in this age of instant communication. Frequently, roadblocks stand in the way of discovering information that would improve the next experiment, clinical trial, or grant submission. Data, that four-letter word that often stands in scientists’ paths, can still be a thorn in the side of biomedical advances. Yet several groups have made significant strides in helping scientists obtain and share research data from anywhere in the world. The solutions may not be as widely publicized as one might expect, but for those who make the extra effort to seek out these repositories of information, the resulting synergies could lead to the next major advance in biomedical science.
About the Author
Rebecca J. Henderson graduated from Penn State University with MBA and PhD degrees in 2004. She now works for Thermo Fisher Scientific as an associate product manager for the Cellomics high content screening product line.
1. Parker, Randall. (September 2005) “Biomedical Research Funding Doubles in U.S. in 10 Years.”