While enabling real-time international collaboration, this new paradigm has also introduced novel challenges. Simon Adar, CEO of Code Ocean, experienced the struggles of cross-geographical R&D collaboration firsthand during his PhD work at Cornell University. While file-sharing systems provided some relief, they fell short when it came to coordinating code, data, software and troubleshooting across different geographies. “It wasn’t enough, because when you have code and data, you also need all the software dependencies for that code to work,” Adar said.
“In the end, collaboration occurred primarily through Word documents, which was disappointing,” Adar said. “This experience fueled my desire to improve collaboration in research projects.”
Adar’s experience led to the creation in 2015 of Code Ocean, a computational research platform designed to streamline and enhance the research process. “This vision became a research project during my postdoc at Cornell, which later evolved into a company. This initiative aligns with the ‘Open Science Library’ from Nature and Code Ocean, where researchers can browse code without having to install it locally, thereby eliminating many of the mentioned challenges in scientific collaboration,” Adar explained.
The company aims to alleviate inefficiencies through a five-point strategy centered on:
- Cloud-based technologies.
- Centralized asset management.
- Standardized analysis workflows.
- A streamlined research management lifecycle.
- Self-serve options for the research community.
One of Code Ocean’s technologies, known as a ‘capsule’, offers a complete, self-contained computational environment that includes everything necessary to reproduce computational research. “We have a user interface, so you don’t need to be a Docker expert for this. Our UI generates its own Docker file,” Adar said. Each capsule provides access to all necessary components along with a timeline of its versions, allowing scientists, computational biologists, discovery IT members and bioinformaticians to keep track of different iterations and changes.
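To make the capsule idea more concrete, here is a minimal sketch of how a UI could turn a structured environment description into a Dockerfile and keep a timeline of versions. It is not Code Ocean's implementation or file format: `CapsuleSpec`, `CapsuleHistory` and the example package pins are hypothetical, and identifying versions by a hash of the rendered Dockerfile is simply one reasonable assumption.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CapsuleSpec:
    """Hypothetical environment description; not Code Ocean's actual format."""

    base_image: str
    apt_packages: tuple = ()
    pip_packages: tuple = ()  # pinned specifiers, e.g. "numpy==1.26.4"

    def to_dockerfile(self) -> str:
        """Render the spec as Dockerfile text, one instruction per line."""
        lines = [f"FROM {self.base_image}"]
        if self.apt_packages:
            lines.append(
                "RUN apt-get update && apt-get install -y --no-install-recommends "
                + " ".join(self.apt_packages)
                + " && rm -rf /var/lib/apt/lists/*"
            )
        if self.pip_packages:
            lines.append("RUN pip install --no-cache-dir " + " ".join(self.pip_packages))
        return "\n".join(lines) + "\n"


@dataclass
class CapsuleHistory:
    """Append-only timeline of environment versions, each identified by a content hash."""

    versions: list = field(default_factory=list)

    def commit(self, spec: CapsuleSpec) -> str:
        """Record spec as a new version unless it matches the latest one; return its ID."""
        digest = hashlib.sha256(spec.to_dockerfile().encode("utf-8")).hexdigest()[:12]
        if not self.versions or self.versions[-1][0] != digest:
            self.versions.append((digest, spec))
        return digest


if __name__ == "__main__":
    history = CapsuleHistory()
    history.commit(CapsuleSpec("python:3.11-slim", pip_packages=("numpy==1.26.4",)))
    history.commit(CapsuleSpec("python:3.11-slim", pip_packages=("numpy==1.26.4", "pandas==2.2.2")))
    for version_id, spec in history.versions:
        print(version_id, spec.base_image, spec.pip_packages)
    print(history.versions[-1][1].to_dockerfile())
```

Hashing the rendered Dockerfile means that re-saving an unchanged environment does not add a new entry, while any change to the base image or a pinned package produces a new, distinguishable version.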
Cloud adoption in biopharma companies: A generational distinction
Adar notes a stark contrast in the digital strategies of older and younger biopharma companies, particularly in their rate of cloud adoption. He estimates that upwards of 90% of companies established in the past decade are integrating cloud-based technologies as a core part of their infrastructure. Conversely, established biopharma companies have traditionally relied on on-premises data centers and legacy systems. “But even established companies are starting to think about the cloud,” Adar said. “It’s much more modern, requires less upfront capital investment and you can scale up and down according to your demands.”
Given the continued adoption of the cloud and its ability to save researchers the time and effort of setting up and maintaining computing environments, a growing number of research tools have emerged to support seamless collaboration and reproducibility. Examples include tools such as Docker, Binder and Google’s Colaboratory, which help researchers tackle different facets of scientific investigation. Yet researchers still often struggle with issues such as software compatibility, especially when running open-source projects on different systems. “For instance, it can be challenging to get open-source projects working for a variety of reasons,” Adar said. “One common issue is software compatibility, as researchers might be using different systems, such as Mac, Linux or Windows and Python versions.”
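The compatibility problems Adar describes often surface only when code is run on a collaborator's machine. A minimal sketch of an up-front check is shown below; the function name `check_environment` and its default thresholds are illustrative assumptions, not requirements of any particular project or tool.

```python
import platform
import sys


def check_environment(min_python=(3, 9), supported_os=("Linux", "Darwin", "Windows")):
    """Return a list of compatibility problems; an empty list means the checks passed.

    The thresholds are illustrative defaults, not requirements of any particular project.
    """
    problems = []
    current = tuple(sys.version_info[:2])
    if current < tuple(min_python):
        problems.append(
            f"Python {current[0]}.{current[1]} is older than the required "
            f"{min_python[0]}.{min_python[1]}"
        )
    system = platform.system()  # "Linux", "Darwin" (macOS) or "Windows"; "" if unknown
    if system not in supported_os:
        problems.append(f"Operating system {system!r} is not one of {supported_os}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```

A check like this only reports mismatches; containerized environments such as the ones discussed here go further by shipping the interpreter and libraries along with the code, so the mismatch does not arise in the first place.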
Cloud-based tools supporting research collaboration and reproducibility
In the modern research landscape, reproducibility of results presents significant challenges. These can range from the complexity of experimental procedures to the difficulty of sharing vast amounts of data and analysis methods. To address these hurdles, cloud-based services such as Code Ocean and the multi-language computational notebook platform Nextjournal have emerged in recent years as options for supporting reproducible research.
Code Ocean, for example, enables the encapsulation of all computational details required for an analysis, from the dataset used to the specific versions of software libraries, in a ‘compute capsule.’ Nextjournal offers the ability to seamlessly interweave code, text and data in a single document.
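The kind of information a compute capsule has to capture, from the dataset used to the specific library versions, can be illustrated with a small manifest builder. This is a hedged sketch rather than Code Ocean's mechanism: `build_manifest` and `sha256_of_file` are hypothetical names, and the manifest records only Python-level details, not the operating-system packages a full environment image would also pin.

```python
import hashlib
import json
import platform
from importlib import metadata


def sha256_of_file(path, chunk_size=1 << 20):
    """Checksum a data file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_paths, packages):
    """Record what an analysis ran on: data checksums plus interpreter and library versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "data": {str(p): sha256_of_file(p) for p in data_paths},
    }


if __name__ == "__main__":
    # Example: describe the current environment for two common scientific libraries.
    manifest = build_manifest(data_paths=[], packages=["numpy", "pandas"])
    print(json.dumps(manifest, indent=2, sort_keys=True))
```

Storing such a manifest next to the results lets a reviewer confirm that a re-run used the same data (matching checksums) and the same library versions before comparing outputs.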
Adar emphasized the importance of distinguishing between data sources and the storage technologies used to store and process them. “It’s essential to have code that can access and process this data correctly within different workflows and pipelines, regardless of the storage method used,” he pointed out.
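One common way to keep analysis code independent of where the data lives is to write it against a small storage interface. The sketch below assumes hypothetical class names (`DataSource`, `LocalDirectory`, `InMemoryStore`) and uses an in-memory dictionary as a stand-in for an object-store bucket; a real S3 backend could implement the same `read` method, for example by calling boto3's `get_object`.

```python
from pathlib import Path
from typing import Dict, Protocol


class DataSource(Protocol):
    """Anything that can return the bytes stored under a key."""

    def read(self, key: str) -> bytes: ...


class LocalDirectory:
    """Reads files from a directory on local or network-mounted disk."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryStore:
    """Stand-in for an object store such as an S3 or Google Cloud Storage bucket."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects

    def read(self, key: str) -> bytes:
        return self.objects[key]


def count_rows(source: DataSource, key: str) -> int:
    """Pipeline step written against the interface, not a specific storage backend."""
    text = source.read(key).decode("utf-8")
    return sum(1 for line in text.splitlines() if line.strip())


if __name__ == "__main__":
    store = InMemoryStore({"samples.csv": b"id,value\n1,0.5\n2,0.7\n"})
    print(count_rows(store, "samples.csv"))  # counts the header line too
```

Because `count_rows` depends only on `read`, moving the data from a local directory to a cloud bucket means swapping the backend object, not rewriting the pipeline step.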
The role of AI and cloud-based services in navigating the data deluge
As the volume of scientific research continues to swell, the growing adoption of AI and cloud technologies can help R&D professionals mine it for insights. In an article on pharma R&D, McKinsey estimates that big-data informed decision making could unlock up to $100 billion in value annually across the U.S. healthcare system.
In this vein, Adar described an initiative Code Ocean is undertaking to help scientists. “For instance, one of our projects with a customer explores how we can assist scientists to become more self-reliant in their research,” he said. “Traditionally, scientists who are skilled in programming can code interfaces to databases and visualize their data, but many lack these coding capabilities.”
In this context, Code Ocean is using AI agents, autonomous software programs that carry out defined tasks. Such agents could automate coding tasks, enabling scientists to focus on their core research questions. As Adar points out, “Though coding scientists are still necessary, our approach offers a more specific and tailored solution to repetitive tasks. By creating different agents for distinct use cases, we can supply scientists with an efficient tool.” This approach suggests a future of scientific research in which AI doesn’t replace scientists but works alongside them, amplifying their capabilities.
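The idea of “different agents for distinct use cases” can be pictured as a registry that routes each request to one narrowly scoped handler. The sketch below shows only that routing layer and is not Code Ocean's design: plain Python functions stand in for the AI components, and the use-case names (`summarize_table`, `column_mean`) are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry mapping each use case to one narrowly scoped handler.
# Plain Python functions stand in for the AI agents described in the text.
AGENTS: Dict[str, Callable[[dict], str]] = {}


def agent(use_case):
    """Decorator that registers a handler for a single, well-defined task."""
    def register(fn):
        AGENTS[use_case] = fn
        return fn
    return register


@agent("summarize_table")
def summarize_table(request):
    rows = request["rows"]
    columns = ", ".join(rows[0]) if rows else "none"
    return f"{len(rows)} row(s); columns: {columns}"


@agent("column_mean")
def column_mean(request):
    column = request["column"]
    values = [row[column] for row in request["rows"]]
    return f"mean {column} = {sum(values) / len(values):.2f}"


def dispatch(use_case, request):
    """Route a request to the agent registered for its use case."""
    if use_case not in AGENTS:
        raise KeyError(f"No agent registered for {use_case!r}")
    return AGENTS[use_case](request)


if __name__ == "__main__":
    rows = [{"sample": "A", "expression": 2.0}, {"sample": "B", "expression": 3.5}]
    print(dispatch("summarize_table", {"rows": rows}))
    print(dispatch("column_mean", {"rows": rows, "column": "expression"}))
```

Keeping each handler small and task-specific makes it easier to test and to swap in a more capable component for one use case without touching the others.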
But successfully wading through the data deluge in this environment requires a clear understanding of the distinctions between data sources and the tools designed to harness them, as Adar points out.
For instance, cloud storage services such as Amazon S3 and Google Cloud Storage provide efficient, scalable storage for a wide variety of workloads. But these storage buckets are only one piece of a larger puzzle. While they serve as the backbone for storing the data that downstream processing depends on, they do not on their own offer a comprehensive data management solution.
Adar also underscored the importance of reproducibility in the current research landscape. “Code Ocean aims to capture all information needed to re-execute an analysis,” he said. “This allows researchers to share fully reproducible analyses with reviewers, collaborators and the public.”