Data in the real world

The art of folding “big data” together to cure disease

A big question swirled through the scientific world last year: Was the highly mutated SARS-CoV-2 “Pirola” variant going to be a problem? Could it rip through immune cell defenses to trigger severe cases of COVID-19?

Researchers at La Jolla Institute for Immunology (LJI) worked quickly to solve these puzzles. Doctors in Israel and Denmark had already reported cases of infections with the Pirola variant, and the virus appeared to be spreading to new regions.

Speed was critical. The problem was that no one had yet collected comprehensive data showing how immune cells actually responded to the Pirola variant. The variant was just too new.

That hurdle didn’t stop LJI Professor Alessandro Sette, Dr.Biol.Sci., and LJI Research Assistant Professor Alba Grifoni, Ph.D. The researchers had access to a massive, well-organized collection of SARS-CoV-2 data. This resource, known as the Immune Epitope Database (IEDB), holds key findings on how the immune system’s T cells have combatted previous SARS-CoV-2 variants.

Dr. Grifoni quickly devised a bioinformatics approach to comb through the IEDB and uncover clues to how T cells might respond to Pirola. “SARS-CoV-2 keeps evolving, and it’s hard for experimental researchers to keep up with how fast the virus changes,” says Dr. Grifoni. “We wanted to know—can we design a data-analysis pipeline to essentially predict the effects of new SARS-CoV-2 variants?” Thanks to the IEDB, the researchers were on their way to finding patterns, fast.

Leaders shaping data science

LJI scientists use data science and bioinformatics to make the world a healthier place. Every study requires some data analysis, of course, but LJI scientists have created resources that fold huge datasets together so researchers can easily share findings and launch new projects. These efforts have helped guide vaccine development and fuel public health efforts around the world.

For example, in March 2020, Drs. Grifoni and Sette published the first study suggesting that human T cells could recognize SARS-CoV-2 infection. This prediction was based on coronavirus data from the IEDB, and it gave many people hope that a COVID-19 vaccine was possible. Later, once scientists had analyzed data from actual COVID-19 patients, they found the exact T cell activity Drs. Grifoni and Sette had predicted.

LJI’s role as a world leader in immune system data science began with the IEDB. Dr. Sette and LJI Professor Bjoern Peters, Ph.D., established the IEDB in 2003 with funding from the National Institute of Allergy and Infectious Diseases. At that time, important data were scattered across manuscripts in dozens of different scientific journals. Scientists needed to view these discoveries in one place.

“We were pioneers in making data accessible to the wider scientific community,” says Dr. Peters. “Every lab generates a wealth of data. We make these data more useful by capturing it, not in some kind of lab notebook, but in a database, and making it available to internal or outside users.”

Managing the IEDB is a surprisingly hands-on process, even an art form. IEDB Senior Project Manager Nina Blazeska leads the team of curators who comb through scientific studies for epitope data. “These LJI curators are Ph.D.-level immunologists who extract epitope data from scientific publications,” says Blazeska. “It takes a detailed understanding and a lot of time.”

The IEDB isn’t just a database: It’s a tool. LJI Bioinformatics Core Director Jason Greenbaum, Ph.D., was instrumental in building the IEDB in the mid-2000s. Today he works closely with Blazeska and manages a team of web developers who handle requests from the IEDB’s user base. “We do a lot of problem-solving, and we’re always reviewing user feedback to see which features we should add,” says Dr. Greenbaum. One fascinating new IEDB feature is the 3D structure viewer, which gives scientists a glimpse of the actual molecular structures that immune cells “see” when they encounter pathogens.

A look at the numbers shows the importance of the IEDB within the research community. “Over the course of the IEDB’s life, we have been cited more than 25,000 times,” says Dr. Sette.

“We can also look at the impact of the IEDB in stimulating applications in the pharmacological and biotech industry. Over the last 20 years we have been quoted in 665 patents. Of those, 225 are patents submitted between 2021 and 2022. The impact of the IEDB is accelerating.”

New databases unfold

In 2021, the National Cancer Institute granted Drs. Peters and Sette funding to build a similar epitope database to fuel cancer research. This new database is called the Cancer Epitope Database and Analysis Resource (CEDAR). Scientists can use CEDAR to study—and even predict—how T cells and antibodies target different types of cancer cells. Understanding these responses to cancer is a key step in developing cancer immunotherapies that rely on the immune system to kill cancer cells.

“We’re giving the cancer community what they’re after—a one-stop resource for experimentally validated cancer epitopes,” says Blazeska.

A deep dive into patient health

LJI’s Database of Immune Cell Epigenomics (DICE) addresses a different critical need: to understand exactly how genetic variations regulate gene expression and drive disease risk.

This database, directed by LJI William K. Bowes Distinguished Professor Pandurangan Vijayanand, M.D., Ph.D., launched in 2014 with funding from the National Institutes of Health. Dr. Vijayanand and his colleagues are on a mission to learn everything they possibly can about immune cells from a unique cohort of donors recruited from the San Diego area.

“We started by collecting and freezing blood cells from 91 donors,” says LJI Research Assistant Professor Benjamin Schmiedel, Ph.D., who worked on DICE with Dr. Vijayanand and LJI Research Assistant Professor and Director of Immunogenomics Gregory Seumois, Ph.D. “Over eight months, we accumulated more than 19,000 vials in our nitrogen tanks. Then, over the years, in multiple batches, we isolated about 30 different kinds of immune cells from each of these donors.”

The DICE team then used high-throughput sequencing tools to look at gene expression in immune cells from each individual donor. This told them what kinds of genes are expressed in different cells and which proteins the cells are making. At last, the researchers could see how different immune cells functioned and spot striking differences in cells from donors with different genetic backgrounds and of different sexes.

Today, scientists around the world can search the DICE dataset to figure out how small genetic variations, called polymorphisms, affect how certain immune cells do their jobs. Researchers have used DICE to investigate gene expression and the function of immune cells connected to Alzheimer’s disease, asthma, inflammatory bowel disease, and many other diseases.

In 2021, Dr. Vijayanand, Dr. Schmiedel, and their colleagues turned to DICE to better understand why some people develop more severe cases of COVID-19. Their research revealed polymorphisms that may change how immune cells use important signaling pathways to sense infections and transmit danger signals—which could help explain why some people fail to mount an immune response to control SARS-CoV-2 infection.

“Building DICE was hard work, but now we can easily look at the data to identify interesting genetic associations between immune cell function and disease risk—for any disease of interest,” says Dr. Schmiedel. “And we find new things everywhere we look.”

From data to discovery

As Dr. Grifoni worked to shed light on the Pirola variant, she and Dr. Sette came to an encouraging conclusion. Her bioinformatics approach suggested that T cells could see right through Pirola’s mutations and find their targets. “It appears previous exposure to Omicron—or vaccination with the newer bivalent vaccines—may arm a person with new T cells that can ‘catch up’ and generate responses that can also recognize Pirola or new upcoming variants,” says Dr. Grifoni.

Dr. Grifoni was eager to see if this prediction was supported by data from actual patients. In fact, she and Dr. Sette had established several overseas collaborations with scientists studying whether SARS-CoV-2 infections or vaccinations could prompt T cells to recognize novel SARS-CoV-2 variants.

It didn’t take long for real-world data to start coming in. By January 2024, two research groups (one based in South Africa and one in Sweden) had published strong evidence that T cells induced by previous vaccination and infection could indeed cross-recognize the Pirola variant.

The LJI team had been on the right track. Their prediction, based on a deep understanding of immunology and data science, had been corroborated.

LJI scientists set the standard for how to organize and analyze immune system data, making it possible for researchers such as Dr. Grifoni to ask tough questions about human health. Every day, LJI scientists add valuable findings to these databases—and they continue to take on new projects that strengthen data science and collaboration in immunology.

In 2023, Dr. Peters was named co-director of the Human Immunology Project Consortium (HIPC) Data Coordinating Center. The HIPC project connects experts in immunoprofiling—the effort to capture complex immune system data to better understand a person’s disease risk or predict their reaction to a particular drug treatment.

LJI scientists have transformed disparate data points into something revolutionary. Their work powers immunology—and the world is taking note.

Last fall, IEDB users met virtually for their annual workshop. The occasion marked 20 years since the IEDB had launched, and Dr. Sette reflected on how researchers have come to rely on the IEDB. “We read scientific manuscripts and go online to query findings in the IEDB—almost like Googling,” said Dr. Sette.

Origami pieces by Christine Ott. Images by LJI Creative Producer Matt Ellenbogen.