Design

Half of the 78 million people who have been infected with HIV since its identification in the 1980s have died, making it one of the most merciless epidemics in history. Today, more than 1 in 200 people is infected, with two million new infections recorded each year. While medical advances have made HIV easier to manage, researchers agree that an HIV vaccine is the most likely, and perhaps the only, way to stop the AIDS pandemic.

In the face of all this, the Statistical Center for HIV/AIDS Research & Prevention (SCHARP) is on a mission to help HIV and other vaccine researchers around the world collaborate through data. SCHARP partnered with Artefact and LabKey Software to help define the objective, design a solution, and build the DataSpace, a web tool to empower vaccine investigators to explore data across HIV studies, generate new hypotheses, and accelerate the path to discovery.

“[SCHARP DataSpace] is giving me the freedom to play with the data. It’s filling a niche that is totally empty right now.”

HIV has many strains, mutates quickly, infects the very cells meant to fight it, and exposes very little of itself to attack. Researchers have conducted hundreds of HIV vaccine studies over the years, each setting out to explore a specific hypothesis about how the virus works or how we might fight it. Hidden within and across these studies are other important insights that were not part of the analysis plan. They remain undiscovered because the data can be incomplete, inaccessible, and difficult to stitch together. Researchers have to wait years before they can access their colleagues’ results in published research papers. More importantly, the actual data behind those papers is often unavailable, relegated to a huge data graveyard where potential clues to a vaccine stay buried.

In light of this, the Global HIV Vaccine Enterprise, a group of top HIV experts and funders, called for “a dramatic shift in the culture and practice of sharing research data.” Their top priority? Creating “databases for sharing trial data globally and an insistence on pursuing diverse hypotheses.”

The DataSpace brings researchers information that is easy to access, filter, explore, interpret, and export for further analysis. By using DataSpace, they can identify gaps in current research, review and learn about past work that can help them secure grants, and test new ideas to see if they are worth further exploration.

A new way of thinking

The purpose of the DataSpace is to make data more open and broadly available, changing the way researchers think about and share it. To design a solution that researchers would embrace, we needed to understand how they work. We went through several key immersive steps to understand the science of HIV vaccine research and the culture of the community.

Our most critical design choice came down to figuring out how to organize the data in the system. Previously, researchers needed to know their specific question ahead of time to align the data and make a valid combination (if they had access to the data at all). But after talking with researchers, we flipped the model on its head: what if we pre-combined all the data? Doing so takes a lot of upfront work but lets users pose any number of questions that lead them to explore new directions. The DataSpace gives researchers the opportunity to uncover connections they had not anticipated.
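To make the idea of pre-combining concrete, here is a minimal sketch in Python. It is not the DataSpace implementation: the study exports, field names, values, and the helpers `precombine` and `mean_by` are all hypothetical, invented only to illustrate the idea. Each study's records are mapped once into a shared schema, after which a question that was never planned in advance becomes a simple filter and group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical exports from two studies that use different field names.
# All identifiers and values are invented for illustration only.
STUDY_A = [
    {"subject": "A-001", "vaccine": "DNA", "assay": "ELISA", "day": 28, "value": 1.0},
    {"subject": "A-002", "vaccine": "DNA", "assay": "ELISA", "day": 28, "value": 2.0},
]
STUDY_B = [
    {"participant_id": "B-001", "product": "Protein", "test": "ELISA",
     "visit_day": 28, "result": 3.0},
    {"participant_id": "B-002", "product": "DNA", "test": "ICS",
     "visit_day": 28, "result": 0.5},
]


def normalize_a(row):
    """Study A already matches the shared schema; only tag the source study."""
    return {"study": "A", **row}


def normalize_b(row):
    """Rename Study B's fields to the shared schema."""
    return {
        "study": "B",
        "subject": row["participant_id"],
        "vaccine": row["product"],
        "assay": row["test"],
        "day": row["visit_day"],
        "value": row["result"],
    }


def precombine():
    """Do the alignment work once, up front, for every study."""
    return [normalize_a(r) for r in STUDY_A] + [normalize_b(r) for r in STUDY_B]


def mean_by(rows, key, **filters):
    """Average `value` grouped by `key`, keeping rows that match all filters."""
    groups = defaultdict(list)
    for row in rows:
        if all(row[field] == wanted for field, wanted in filters.items()):
            groups[row[key]].append(row["value"])
    return {group: mean(values) for group, values in groups.items()}


combined = precombine()
# An ad hoc, cross-study question: mean ELISA response per vaccine type.
elisa_by_vaccine = mean_by(combined, "vaccine", assay="ELISA")
```

The upfront cost sits in the `normalize_*` functions, which must be written per study. Once that is done, changing the question (for example, grouping by `study` instead of `vaccine`) needs no new alignment work.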

For the DataSpace to be truly useful, we had to design analysis tools that provide value across a large range of experiments and data sources. Unlike specialty tools targeting results from only one test, we developed a core set of visualizations that show data from any test. For instance, the plot visualization lets users take one to three variables from different sources and find patterns in post-vaccine immune response across tests, studies, vaccine types, and more. Unlike generic visualization platforms, the DataSpace is easy to jump into, has many unique visual analytics features made just for vaccine science, and empowers users to see and interact with data based on multiple relevant criteria.