How Informatics Unites Teams - Drug Discovery and Development

Decentralized J&J’s single-portal informatics platform allows far-flung research groups to connect and collaborate

JR Minkel
Minkel is a freelance writer based in Brooklyn, N.Y.

The Leapfrog home page (right) contains user-specific information under “My Teams, My Projects, My Page.” The forum in weDiscover (top left) is used for communication and posting messages between users at different sites. The weDiscover home page (bottom left) shows the daily news for drug discovery users. (Source: Jackson Wan, J&JPRD)

When drug discovery researchers open their Web browsers at any one of Johnson & Johnson Pharmaceutical Research and Development’s (J&JPRD) four research sites across the globe, their coffees at arm’s reach and ready to begin the day’s business, they enter a single electronic world carefully constructed by the company’s informaticians. From a Web portal called weDiscover, researchers can access not only news, weather, and the latest science, but also the “glue” that allows researchers to coordinate the company’s decentralized research effort, a project management tool called Leapfrog.

Having semi-independent research organizations creates specific informatics needs, says Scott Kahn, chief scientific officer of software vendor Accelrys Inc., San Diego: “It poses some interesting challenges around synchronizing and the linking of information appropriately and fully.” Although Kahn says many companies now face these challenges, “J&J’s informatics is quite unique in that they are investing a lot to make it a cornerstone of their competitiveness.”

Customizing was key
J&JPRD’s informatics department is responsible for building solutions to the company’s global information problems and making sure local problems get solved efficiently (see “Inc. Globally, Acting Locally” on page 27). Committee heads from the company’s research sites in La Jolla, Calif.; Beerse, Belgium; Spring House, Pa.; and Raritan, N.J., decide what informatics tools to offer their researchers. In La Jolla, a group led by Jackson Wan, PhD, director of bioinformatics, develops most of the global bioinformatics and related IT software. Everything from Web design and data processing (see sidebar on page 28) to disease modeling takes place in La Jolla.

To give researchers easy access to information, Wan’s group created a Web portal called weDiscover. The portal is a communications tool and “one-stop shopping” source for news, weather, and all of discovery’s central databases, from chemistry and biology to finance and human resources. Before the portal’s first installment, iDiscover, came out in 2003, the company had separate, hierarchical Web pages for the various discovery research activities in the United States. Researchers had to bookmark them and click through many pages to get the information they wanted. “The information was always hidden,” says Wan. “We started to realize [that we could] do a My Yahoo!-type version of that for discovery,” says Michael Jackson, who was then senior vice president, drug discovery US for J&JPRD, “one that interfaced with all of discovery’s tools and knew a user’s preferences.” Says Wan, “With the portal, everything is more up front. There is one page that you can [use to] get to all the resources that you need.”

Jackson Wan (right) meets with Joseph Ciervo (left) and Simon Smith (middle), members of his bioinformatics team based in La Jolla, Calif. (All photos courtesy of J&JPRD)

The weDiscover portal culls reports from sources such as J&J’s internal Web sites, PubMed, the BBC, and a licensed news stream for science, technology, and medicine. In its communications function, weDiscover links to forums, or bulletin boards for different subject areas, which can be useful given the brief overlap between working days in La Jolla and Beerse. After some complaints that iDiscover was too “La Jolla-centric,” Wan’s group customized the page for each site, using different logos and providing local information. The weDiscover resource was unveiled early in 2004 and as of October got 160,000 hits a day from about 2,000 users, or nearly half of the company’s employees. The La Jolla group is now building customized portals for many different research areas, from pharmacogenomics to ion channels to drug screening assays. They are also testing ways to link users who have similar interests and to suggest other information a user might want.

Erwin Coesemans, drug discovery associate scientist, medicinal chemistry, Beerse, stumbled onto weDiscover the day it came out. He was cleaning up his rather cluttered list of internal bookmarks in Internet Explorer, and one bookmarked page linked him to the portal. “I saw it and knew it was exactly what I’d been seeking for quite a while,” he says. “Now there is one page containing everything a person like me needs.” Through weDiscover, he searches the chemical literature and internal chemistry databases going back to 1956. He has also personalized the page with his mailbox and calendar from Outlook, as well as links to forums, a search engine he uses, and the journals he consults most frequently. Of those who don’t like the site, he says, their biggest complaint is that it is in English, not Dutch, like Beerse’s old page.

Hans Winkler (right) reviews bioinformatics data with scientist Jean-Marc Neefs (left) in Beerse, Belgium.

Pooling resources
Coesemans also puts a link on his weDiscover page to Leapfrog, a project management tool designed to keep researchers from duplicating each other’s work. In the late 1990s, the company’s US sites operated as one unit. But management decided that giving each site more independence would aid drug discovery, so the close ties between sites were severed. Management then faced a new problem: The most promising targets hadn’t changed, so the newly independent sites were likely to be trying to discover drugs against the same targets. But the company didn’t have a convenient way of letting two groups of researchers at different sites know whether they were screening the same library against the same target. Different researchers would keep track of their projects in different formats, from Word documents to PowerPoint files, and someone would have to collect the relevant information from each document manually to make comparisons. With more than 300 projects usually ongoing at a time, that was simply not practical. “I don’t mind two teams working on the same target,” says Jackson, now president of Alza Corp. (a J&J pharmaceutical company), “but what’s inexcusable is that they don’t know what each other is doing.”

Therefore, Wan’s group built a product to do the job. Released in 2001, Leapfrog allows researchers and management to track the progress of any multistep project, the number of people and amount of resources devoted to it, as well as its goals and accomplishments. At Leapfrog’s core is the genetic sequence of each target. It is a true combination of project management tool and bioinformatics database, says Wan. “The whole system is very gene aware.” When researchers begin a project, they enter the name or sequence of the target molecule and can automatically see what other researchers have done with the same target. In fact, research teams did find out that other teams were working on the same project. “Leapfrog is something that’s a requirement of a fairly decentralized discovery activity,” says Jackson, “where you change your mind pretty frequently, you work on lots of things and you try to prevent duplication.”
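The target-keyed lookup described above can be illustrated with a minimal sketch. It is not Leapfrog’s actual design, which the article does not detail: the `Project` and `TargetRegistry` names, the normalization rule, and the placeholder target names are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    site: str
    target: str  # gene name or sequence that identifies the target


class TargetRegistry:
    """Toy index of projects keyed by target (hypothetical, not Leapfrog's schema)."""

    def __init__(self):
        self._by_target = defaultdict(list)

    @staticmethod
    def _key(target):
        # Assumed normalization: drop whitespace and ignore case so that
        # " target-x " and "TARGET-X" resolve to the same entry.
        return "".join(target.split()).upper()

    def register(self, project):
        """Record a project and return the projects already on the same target."""
        key = self._key(project.target)
        already_there = list(self._by_target[key])
        self._by_target[key].append(project)
        return already_there


# Usage with placeholder names: the second team learns about the first.
registry = TargetRegistry()
registry.register(Project("Screen A", "La Jolla", "target-x"))
overlaps = registry.register(Project("Screen B", "Beerse", "TARGET-X"))
for other in overlaps:
    print(f"Already working on this target: {other.name} ({other.site})")
```

Keying every project on a normalized target identifier is what makes the overlap visible at the moment a project is entered, rather than after someone compares documents by hand.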

Leapfrog’s forward-looking aspect is essential to preventing duplication. By letting researchers mark their goals for a project, Leapfrog gives management the ability to see ahead, says Jackson. “[It can] capture what you think you will do in the next six months by target; if you don’t have that, you won’t be able to stop duplication.” Leapfrog still allows multiple entries per target, but relative latecomers have the responsibility to call researchers already working on a target to discuss their plan, their likely use of resources, and the possibility of collaboration. Leapfrog also helps compare work by J&JPRD and J&J’s therapeutic antibody company, Centocor Inc., so that if both groups are working on the same target, they might share resources.

Members of J&JPRD’s informatics team based in Raritan, N.J. From left to right: Lubing Zhou, David Uhlinger, and Xiwei Wang.

When J&JPRD staff in Beerse first heard about Leapfrog, they were hesitant, says Vic Maes, director of business support for discovery research, Beerse, Belgium. They had little involvement in the initial development and design and already had to contend with a lot of management tools. But after La Jolla improved the program’s user friendliness at their request and tutored them in how to mark goals and accomplishments using it, they began to see it as a highly valuable project management tool, says Maes. “It creates transparency. We’re updated on everything that’s going on in the whole discovery organization.” Senior management can use it to measure project progress, cycle times, attrition rates, and impose greater accountability, he says, and the business support group uses it to follow up on projects, share information, support benchmarking studies and measure performance. “It’s a thermometer of the organization,” Maes says.

Inc. Globally, Acting Locally
For Hans Winkler, PhD, senior director, functional genomics and bioinformatics at J&JPRD, Beerse, Belgium, the company faces the same informatics needs as more centralized organizations, in that it needs a combination of global and local solutions to its challenges. And just as the company is beginning to work more globally, he says, some global discussions also focus on how to implement local solutions to prevent different sites from recreating each other’s work. Oncology research takes place at the company’s research site in Beerse, for example, but not in La Jolla, where much of the company’s global informatics infrastructure gets started. Thus, Beerse needs specific tools that might not conform to any companywide standards of hardware. Beerse favors Web solutions because global client software isn’t always fast enough or immediately accessible. The biggest challenge for researchers at Beerse, says Winkler, is extracting the right information from experimental data as automatically as possible to understand biological pathways. By comparing pathway information gained from studying compounds to molecular profiles of patient samples, researchers in Beerse hope to predict which diseases their compounds will work in. Toward that end, Beerse is collaborating with La Jolla and software company OmniViz Inc., Maynard, Mass., to construct a pathway engine that would integrate several external biological pathway analysis tools from vendors such as Ingenuity Systems and Jubilant Biosys Ltd. No single tool fulfills all the site’s needs, and learning how to use multiple tools and switching between them is cumbersome, says Winkler. These sorts of local solutions are needed in any type of company, he says. “It does not matter whether the whole structure of the company is organized in semi-independent sites or [in a] global department. The need for global and local solutions as far as technology goes or informatics goes is always the same.”

Merging databases
Whereas Leapfrog addressed the decentralization of the company, the latest informatics project is designed to meet a problem that any large pharmaceutical company has probably faced over the last few years: incompatible chemistry databases. The problem for researchers is they want to know if a compound they’re working on has had hits in other assays, but can’t always access other databases in the company. After Janssen Research Foundation and R.W. Johnson Pharmaceutical Research Institute merged to form J&JPRD in 2001, the two organizations increasingly collaborated, but their legacy systems could not “converse.” Data from cardiovascular safety screening performed in Belgium, for example, entered its own database and was difficult to access elsewhere. Pharmacological data on the same compounds, generated in the United States, went to another database. “We didn’t have an impressive system to get the data from the machines into the database and have a consistent, standardized format that could be applied around the world,” says Jackson. “And that actually was hindering our collaboration. People would then be just e-mailing [data] to each other. It’s just not the way we should be doing drug discovery in the 21st Century.” If the company were to grow by acquiring biotechnology companies, as it has done in the past, the problem would only be compounded.

To combine the databases, the company initiated a project called ABCD, for Advanced Biology and Chemistry Discovery. The ABCD database will center on compounds much as Leapfrog centers on genes. Data from high-throughput screening, in vitro, and in vivo experiments are combined and organized around each structure. Starting in 2005, the company’s several legacy systems for storing chemistry data will merge and become directly linked to Leapfrog. Leapfrog will draw on data stored in the ABCD database, which will link biological results to targets in Leapfrog. The first challenge, says Jackson, was how to respect the experience of each J&JPRD site and the four other J&J pharmaceutical companies, which also participated in the early stages of ABCD. Every party wanted its system to stay isolated but wanted access to all the others, he says, which is an untenable position. So they identified common desires: getting data into the database in a better way and having tools to look at the data there, including cheminformatics viewers to see multiple data types at the same time. To help address these goals, J&JPRD acquired 3-Dimensional Pharmaceuticals Inc. in 2003, in part for the company’s visualization and cheminformatics tools (giving the company one more chemistry database in the process).

Building the database and accompanying visualization tools in-house will allow the system to evolve with the organization’s needs, says ABCD project leader Dimitris Agrafiotis, PhD, an early 3-Dimensional Pharmaceuticals employee who is now senior research fellow and team leader of molecular design and informatics, Exton, Pa. Buying commercial software requires expensive and time-consuming customization, he says, whereas outsourcing comes with high maintenance costs as the organization changes. In the mid-1990s, the company went to outside providers first, and still uses them to some extent, says Jackson. “Right now, we feel we’ve got much more internal capability to do some of that.”

Sum It Up
At Johnson & Johnson Pharmaceutical Research and Development’s (J&JPRD) Raritan, N.J., site, a relatively small group of informaticians uses the Web to become more than the sum of its parts. Gene expression experiments are knowledge intensive at every stage, from design to data interpretation, says Xiwei David Wang, project lead, bioinformatics. They require many kinds of expertise, including that of therapeutic groups, bioinformaticians, and biostatisticians. Raritan doesn’t have enough researchers with each kind of expertise to provide each therapeutic area with its own informatics group to analyze gene expression data, so varied experts from different areas collaborate. Until recently, collaboration meant sending around the data to each researcher, with nowhere to combine everything and get feedback from everyone at once. Thus, Wang created a central Web application, largely from open source code, to allow everyone involved in an expression profiling project to work on the data and view others’ work. Before an experiment, all the Raritan groups involved in a project decide collectively on the experimental design. Biologists prepare the samples and run the experiments, then send the raw data to J&JPRD La Jolla electronically for processing. (The raw data is shared companywide.) On its return, informaticians and bioinformaticians run it through data analysis algorithms on the Web-based workspace designed by Wang. Other researchers can then see what they’ve done and make their own comments, refinements, or additions. Once target genes are identified, functional genomics and proteomics groups take over to validate them and determine the mechanisms of compounds.

Cross-disciplinary challenges
The biggest challenge was settling on a common chemical “ontology,” or vocabulary, says Agrafiotis. “You have to have the same dictionary [so that] when you say IC50, you know you’re referring to the same thing.” So, the ABCD project team and scientific advisors from all the research sites came up with a common way of classifying assays and protocols. The next choice was how to unify the existing databases. The ABCD team chose to build an entirely new data warehouse, as opposed to linking the old ones through an additional layer of software. Building a new warehouse would be less complicated and allow more efficient access of information, says Agrafiotis. “We believe that performance is essential for user acceptance, and it can only come from a well-designed warehouse.”
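A shared “dictionary” of this kind can be pictured as a lookup from each site’s local label to one agreed term. The sketch below is only an illustration: the synonym table and the `canonical_term` function are hypothetical, and the article does not describe how the ABCD team actually stored its vocabulary.

```python
# Hypothetical synonym table: site-local spellings mapped to one shared term.
CANONICAL_TERMS = {
    "ic50": "IC50",
    "ic-50": "IC50",
    "ic 50": "IC50",
    "half-maximal inhibitory concentration": "IC50",
}


def canonical_term(label):
    """Return the shared term for a local label, or raise KeyError if unknown."""
    # Collapse whitespace and ignore case before looking the label up.
    key = " ".join(label.lower().split())
    if key not in CANONICAL_TERMS:
        # Unknown labels are surfaced rather than guessed at, so that a
        # curator can decide where they belong in the vocabulary.
        raise KeyError(f"no shared term for {label!r}")
    return CANONICAL_TERMS[key]


print(canonical_term(" IC-50 "))
```

Refusing to guess at unknown labels is a deliberate choice in this sketch: a silent default would reintroduce the ambiguity the common vocabulary is meant to remove.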

When chemists and biologists upload their data to ABCD, it goes to an existing intermediary, or transactional, database, and the information is copied once a day to the warehouse. It is an active copying process, which means along the way the data is corrected, cleaned up, standardized, and reformatted to resolve differences in how assays are classified and scored, for example. This active copying process is an important element of the warehouse because of inconsistencies in the way some of the original database fields were filled in. “Users had a lot of freedom in how they would classify and annotate their data. As a result, you have a lot of entropy,” says Agrafiotis. The team is also developing tools to classify existing biological protocols, to allow users to navigate the maze of protocols and locate the ones they want. For example, assays would be classified (and subclassified) by whether they pertain to a whole organism, tissue, cell, or individual protein.
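As a rough sketch of what such an active copy step might look like, the code below cleans records from a transactional store before they reach the warehouse. The field names, the synonym table, and the conversion to nanomolar are assumptions made for illustration; only the four assay levels (organism, tissue, cell, protein) come from the description above.

```python
from dataclasses import dataclass

# Hypothetical free-text labels mapped onto the four levels named in the text.
LEVEL_SYNONYMS = {
    "in vivo": "organism",
    "whole organism": "organism",
    "tissue": "tissue",
    "cellular": "cell",
    "cell": "cell",
    "enzyme": "protein",
    "protein": "protein",
}

# Assumed concentration units, as multipliers that convert to nanomolar.
TO_NANOMOLAR = {"nm": 1.0, "um": 1_000.0, "mm": 1_000_000.0}


@dataclass(frozen=True)
class WarehouseRow:
    compound_id: str
    assay_level: str
    value_nm: float


def clean_record(raw):
    """Return a standardized row, or None if the record cannot be resolved."""
    level = LEVEL_SYNONYMS.get(" ".join(raw["assay_type"].lower().split()))
    factor = TO_NANOMOLAR.get(raw["unit"].strip().lower())
    if level is None or factor is None:
        return None
    compound_id = raw["compound_id"].strip().upper()
    return WarehouseRow(compound_id, level, float(raw["value"]) * factor)


def copy_to_warehouse(transactional_rows):
    """Split one batch into cleaned warehouse rows and rejected raw records."""
    cleaned, rejected = [], []
    for raw in transactional_rows:
        row = clean_record(raw)
        if row is None:
            rejected.append(raw)  # kept aside for manual review
        else:
            cleaned.append(row)
    return cleaned, rejected
```

Running `copy_to_warehouse` once per batch mirrors the daily copy: inconsistent labels and units are resolved on the way in, and anything that cannot be resolved is set aside instead of entering the warehouse.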

Warehousing the data is easy compared to constructing the advanced search tools that users will need to make sense of the data and make project decisions, says Agrafiotis. Prototypes of these tools were developed at 3-Dimensional Pharmaceuticals. The heart of the toolkit is a set of cheminformatics procedures for perceiving, analyzing, and manipulating chemical structures. These tools perform a multitude of tasks: they check the integrity of a chemical structure; compute molecular properties and descriptors; search for substructures and similarity; detect and classify chemical types; aid chemical synthesis and combinatorial chemistry; perform structure-activity analysis; and build three-dimensional (3D) models, to say nothing of statistical analysis, data mining, and visualization. These tools are all presented to the user in a graphical interface. “The idea is you have a visual front-end that can plot anything under the sun,” says Agrafiotis. “It’s one thing to provide a table of numbers to the user. It’s a completely different thing to be able to take these numbers and put them in the context of a chemical structure and decide which parts are responsible for [the compound’s activity].”

As of this writing, some users were already beta testing ABCD, and the first major release was scheduled for December 2004. The project will probably take another year or 18 months after that to complete and make user friendly, says Wan. Once the rules of ABCD are worked out, his group plans to introduce electronic notebooks to automatically send raw data to the warehouse.

After ABCD has shown its value, the next step is obvious, says Agrafiotis: making a closer connection between genes and molecules. That will require integrating bio- and cheminformatics more completely by linking structural biology and microarray data, sequence analysis and classification, and biological pathway analysis through chemical structure, he says. “It will probably happen in the next two to three years.” At that point, given all of the company’s informatics changes, incommensurable databases and fragmented knowledge might start to seem positively quaint.