Designing an open science solution for
Designing for open science
Half of the 78 million people who have been infected with HIV since its identification in the 1980s have died, making it one of the most merciless epidemics in history. Today, more than 1 in 200 people is infected, and two million new infections are recorded each year. While medical advances have made HIV easier to manage, researchers agree that an HIV vaccine is the most likely, and perhaps the only, way to stop the AIDS pandemic.
In the face of all this, the Statistical Center for HIV/AIDS Research & Prevention (SCHARP) is on a mission to help HIV and other vaccine researchers around the world collaborate through data. SCHARP partnered with Artefact and LabKey Software to help define the objective, design a solution, and build the DataSpace, a web tool to empower vaccine investigators to explore data across HIV studies, generate new hypotheses, and accelerate the path to discovery.
A platform for an empowered, aware, collaborative community
HIV has many strains, mutates quickly, infects the very cells meant to fight it, and exposes very little of itself to attack. Researchers have conducted hundreds of HIV vaccine studies over the years, each setting out to explore a specific hypothesis about how the virus works or how we might fight it. Hidden within and across these studies are other important insights that were not part of the analysis plan. They remain undiscovered because the data can be incomplete, inaccessible, and difficult to stitch together. Researchers have to wait years before they can read their colleagues' results in published papers. More importantly, the actual data behind those papers is often unavailable, relegated to a huge data graveyard where potential clues to a vaccine stay buried.
In light of this, the Global HIV Vaccine Enterprise, an alliance of leading HIV experts and funders, called for "a dramatic shift in the culture and practice of sharing research data." Their top priority? Creating "databases for sharing trial data globally and an insistence on pursuing diverse hypotheses."
The DataSpace gives researchers information that is easy to access, filter, explore, interpret, and export for further analysis. With the DataSpace, they can identify gaps in current research, review and learn about past work that can help them secure grants, and test new ideas to see whether they merit further exploration.
“[SCHARP DataSpace] is giving me the freedom to play with the data. It’s filling a niche that is totally empty right now.”
A new way
The purpose of the DataSpace is to make data more open and broadly available, changing the way researchers think about and share it. To design a solution that researchers would embrace, we needed to understand how they work. We went through several immersive steps to understand both the science of HIV vaccine research and the culture of the community.
Data organization: From spreadsheets to subjects
Our most critical design choice came down to how to organize the data in the system. Previously, researchers needed to know their specific question ahead of time in order to align the data and combine it validly (if they had access to the data at all). But after talking with researchers, we flipped the model on its head: what if we pre-combined all the data? Doing so takes a lot of upfront work, but it lets users pose any number of questions that lead them in new directions. The DataSpace gives researchers the opportunity to uncover connections they had not anticipated.
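As a rough illustration of what "pre-combining" can mean, the Python sketch below harmonizes each study's rows, whose column names differ, into one shared schema once, up front, and indexes every observation by subject. The `Observation` fields, the `field_map` convention, and the sample studies are all hypothetical; this is not the DataSpace's actual data model or code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One measurement in a hypothetical shared schema."""
    study: str
    subject_id: str
    assay: str
    visit_day: int
    value: float


def harmonize(study, rows, field_map):
    """Map one study's raw rows (study-specific column names) onto Observation."""
    return [
        Observation(
            study=study,
            # Prefix with the study so IDs from different studies cannot collide.
            subject_id=f"{study}:{row[field_map['subject']]}",
            assay=row[field_map["assay"]],
            visit_day=int(row[field_map["day"]]),
            value=float(row[field_map["value"]]),
        )
        for row in rows
    ]


def pre_combine(studies):
    """Harmonize every study once and index all observations by subject."""
    by_subject = defaultdict(list)
    for study, (rows, field_map) in studies.items():
        for obs in harmonize(study, rows, field_map):
            by_subject[obs.subject_id].append(obs)
    return dict(by_subject)


# Two hypothetical studies that name the same fields differently.
studies = {
    "A": (
        [{"ptid": "001", "test": "ELISA", "day": "14", "result": "2.5"}],
        {"subject": "ptid", "assay": "test", "day": "day", "value": "result"},
    ),
    "B": (
        [
            {"pid": "7", "assay_name": "ICS", "visit": 28, "resp": 0.8},
            {"pid": "7", "assay_name": "ELISA", "visit": 28, "resp": 1.9},
        ],
        {"subject": "pid", "assay": "assay_name", "day": "visit", "value": "resp"},
    ),
}

store = pre_combine(studies)
print(sorted(store))  # ['A:001', 'B:7']
```

Because the mapping work happens once at load time, a later question can query `store` directly instead of re-aligning raw spreadsheets for each new hypothesis.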
The original idea for the DataSpace was to support a more open style of online collaboration, with social posting and the ability to annotate conclusions, write comments, and contribute to discussions. But researchers firmly rejected that idea. Their reputations rest on rigorous processes and peer review. Spontaneous comments or misinterpretation of data can cause irreparable career damage. At the same time, they value new collaborations and novel interpretations of their results. So how could we facilitate collaboration and sharing while respecting researchers' valid concerns?
In the DataSpace, users can see who contributed data to a dataset, alongside study details and contact information. This makes it easy for researchers to credit the original data contributors or clarify how they are interpreting the data. Instead of forcing digital collaboration through the DataSpace, our goal is to spark new conversations and collaboration between labs in the ‘real world’ of conferences, email, phone calls, and partnerships.
The power of data visualization
For the DataSpace to be truly useful, we had to design analysis tools that provide value across a wide range of experiments and data sources. Rather than build specialty tools that target results from a single test, we developed a core set of visualizations that can show data from any test. For instance, the plot visualization lets users take one to three variables from different sources and find patterns in post-vaccine immune response across tests, studies, vaccine types, and more. Unlike generic visualization platforms, the DataSpace is easy to jump into, offers many visual analytics features built specifically for vaccine science, and lets users see and interact with data based on multiple relevant criteria.
A new first step for any researcher
The power of science is the ability to build on previous discoveries. Yet some researchers might not be aware of existing work or lack details on how it was performed. To address that need, we created the “Learn About…” section. It serves as an encyclopedia of HIV vaccine studies and immune assays, and a first step before embarking on a new research study.
Make a virtual cohort
Cohorts are groups of subjects with something in common; usually they are in the same study and treatment. But in the DataSpace, users can define a cohort across studies using any subject characteristic or threshold of experimental performance they choose, then save it for later and explore any number of ideas with it.
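A cross-study cohort can be thought of as a saved filter over subjects pooled from many studies. The minimal Python sketch below assumes a hypothetical `Subject` record carrying a single summary readout (`peak_response`); the cohort is simply the set of subject IDs matching a predicate, so it can be stored under a name and combined with later criteria. It illustrates the idea only and is not the DataSpace's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Hypothetical per-subject summary pooled from several studies."""
    subject_id: str
    study: str
    sex: str
    age: int
    peak_response: float  # assumed summary of one assay readout


def define_cohort(subjects, predicate):
    """Return the IDs of subjects, from any study, that satisfy the predicate."""
    return frozenset(s.subject_id for s in subjects if predicate(s))


subjects = [
    Subject("A:001", "A", "F", 24, 2.5),
    Subject("A:002", "A", "M", 41, 0.4),
    Subject("B:007", "B", "F", 45, 1.9),
    Subject("C:010", "C", "F", 52, 0.9),
]

# A cohort that crosses studies A and B: female subjects above a response threshold.
responders = define_cohort(subjects, lambda s: s.sex == "F" and s.peak_response > 1.0)

# Saving a cohort is just keeping its ID set under a name.
saved_cohorts = {"female_responders": responders}

# Later, refine the saved cohort with a new criterion via set intersection.
under_40 = define_cohort(subjects, lambda s: s.age < 40)
print(sorted(saved_cohorts["female_responders"] & under_40))  # ['A:001']
```

Representing a cohort as an immutable set of IDs, rather than a copy of the data, keeps it cheap to save and lets ordinary set operations (intersection, union, difference) express follow-up questions.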
Discover new relationships through a multidimensional view of data
The plot has room for three variables, with special views for comparing groups, comparing experiments, and tracking immune response over time. Although it was designed to reveal interesting ideas about immune response, it is also useful for quickly understanding the characteristics of the available data.
First step to overcoming HIV
The long-term vision for the DataSpace is to become part of the research pipeline in vaccine trials, adding new data from studies as they are completed. Researchers are already using it to answer basic factual questions, perform quick, low-cost tests of an idea, make comparisons across diverse measures, and start to deepen collaboration with one another. The DataSpace is available to hundreds of researchers worldwide who are on the frontline of fighting this devastating disease. Our goal is to continue to grow the DataSpace with rich data from multiple networks, studies and researchers, helping direct research towards promising new hypotheses and becoming a case study for the promise of open science along the way.