Step One: Understand SARS-CoV-2

T cells recognize virus-infected cells by binding to epitopes: small protein fragments (peptide antigens), cut from the proteins expressed from the viral genome, that are presented on the cell surface. The first step in identifying the best T cells for this job is to understand which epitopes are selected for presentation. The presentation process is not random, and computer models can predict likely epitopes, but their accuracy varies widely. The next step is understanding which of these epitopes are “immunodominant” (i.e., elicit a strong response within or across individuals). This is a much harder problem. Adaptive Biotechnologies can address both steps by assessing the adaptive immune response and directly linking the response to specific epitopes, effectively letting the immune system tell us which pieces of the virus it is attacking.

Searching the viral genome with MIRA
The MIRA assay (Multiplex Identification of T cell Receptor Antigen Specificity) exposes a population of T cells to a set of targets (short protein sequences or peptides) and identifies the combinations where immune responses occur. This can be done for many targets in parallel. Such peptides are cut from the viral proteins, trimmed to the correct length, and presented by HLA molecules on the surface of most cells in the human body. Adaptive uses a staged funneling strategy. First, we cut the virus into gene segments, transfect the corresponding RNA into antigen-presenting cells, and determine which segments code for peptides that are correctly processed, presented, and capable of eliciting an immune response. We then resolve the exact peptide sequences that might come from these gene segments to determine which specific targets the immune system responds to during infection.

Step one in our SARS-CoV-2 project was to take a significant portion of the expressed genome and break it down into “chunks” that we used as MIRA targets. To cover a wide swath (about 8,000 nucleotides, or about 28% of the total genome), these chunks are much longer than naturally occurring epitopes: about 120 nucleotides each, which leads to peptide presentation along stretches of about 40 amino acids of the virus. By seeing which of these chunks induce T cell responses, and particularly which induce the strongest responses across individuals, we can narrow the search window for further exploration by immunologists and other scientists.
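To make the chunk arithmetic concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the roughly 8,000-nucleotide region is tiled with non-overlapping, fixed-length chunks; how the chunks were actually chosen, and whether they overlap, is not described here. The function name `tile_region` and its parameters are hypothetical.

```python
# Illustrative only: assumes non-overlapping, fixed-length tiling,
# which may not match how the real MIRA chunks were designed.

def tile_region(region_length, chunk_nt=120):
    """Return (start, end) nucleotide offsets covering the region."""
    return [
        (start, min(start + chunk_nt, region_length))
        for start in range(0, region_length, chunk_nt)
    ]

chunks = tile_region(8000)

# Three nucleotides encode one amino acid, so a full 120-nt chunk
# spans 40 codons.
amino_acids_per_chunk = 120 // 3

print(len(chunks), "chunks;", amino_acids_per_chunk, "amino acids per full chunk")
```

Under these assumptions the region splits into 67 chunks (the last one shorter than the rest), and each full-length chunk corresponds to 40 codons, consistent with the 40-amino-acid stretches described above.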

Step two was to begin looking at peptides, the actual pieces of the virus presented to the immune system. These are smaller than the chunks we used in step one, typically 9-15 amino acids long. It would be prohibitive to synthesize all possible peptides, but using chunk results and other inputs, we have prepared hundreds of them that range across the viral genome. Some of these cover regions that other researchers have also been studying, whether based on prior data from related viruses, computational predictions, or both.
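One way to see why synthesizing every possible peptide is prohibitive is to count the candidates within a single chunk. The sketch below slides windows of 9 to 15 amino acids across a 40-residue stretch. The sequence is a made-up placeholder rather than viral sequence, and `enumerate_peptides` is a hypothetical helper, not part of the MIRA pipeline.

```python
# Illustrative only: the sequence below is a placeholder, not SARS-CoV-2.

def enumerate_peptides(protein, min_len=9, max_len=15):
    """List every contiguous window of min_len..max_len residues."""
    peptides = []
    for length in range(min_len, max_len + 1):
        for start in range(len(protein) - length + 1):
            peptides.append(protein[start:start + length])
    return peptides

placeholder_chunk = "ACDEFGHIKLMNPQRSTVWY" * 2  # 40 residues
candidates = enumerate_peptides(placeholder_chunk)

print(len(candidates), "candidate windows in one 40-residue stretch")
```

A single 40-residue stretch already yields 203 candidate windows, which is why chunk results and other inputs are useful for deciding which peptides to prepare.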

Early data and first results
This first data release includes signals seen at both the “chunk” level and the peptide level, and we can already see several “hotspots” of immune response from our first experiments. These include hits not only in heavily studied regions such as the spike protein, but also in less studied areas such as ORF7b. Data from more subjects will help us bring these and other areas into better focus, and we look forward to sharing them over the coming weeks.

Our first release of data includes four data files. For each of the two panels (“chunk” and “peptide”) we’ve included a detailed file listing every TCR found to have significant binding behavior, as well as rollups by target that point to specific hotspots. Linkage back to GenBank is provided to help researchers combine the data with other sets.

There are more than 9,000 TCRs in the dataset. For each of these we list both the identified nucleotide sequence and an annotation that includes the CDR3 region and the V and J genes. Several distinct nucleotide sequences can encode the same annotation.
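The many-to-one relationship between nucleotide sequences and annotations follows from codon degeneracy, and can be sketched in Python. Everything in this example is a toy assumption: the three-codon sequences, the gene labels `V-a`, `V-b` and `J-a`, the tuple layout, and the partial codon table are invented for illustration and do not reflect the format of the released files.

```python
from collections import defaultdict

# Partial codon table, just enough for the toy sequences below.
CODONS = {
    "TGT": "C", "TGC": "C",  # cysteine
    "GCC": "A", "GCT": "A",  # alanine
    "AGC": "S", "AGT": "S",  # serine
}

def translate(nucleotides):
    """Translate an in-frame nucleotide string codon by codon."""
    return "".join(
        CODONS[nucleotides[i:i + 3]] for i in range(0, len(nucleotides), 3)
    )

# Hypothetical rows: (nucleotide sequence, V gene label, J gene label).
rows = [
    ("TGTGCCAGC", "V-a", "J-a"),
    ("TGCGCTAGT", "V-a", "J-a"),  # different nucleotides, same amino acids
    ("TGTGCCAGC", "V-b", "J-a"),  # same amino acids, different V gene
]

# Group nucleotide sequences under their (amino acids, V, J) annotation.
groups = defaultdict(list)
for nucleotides, v_gene, j_gene in rows:
    groups[(translate(nucleotides), v_gene, j_gene)].append(nucleotides)

for annotation, sequences in groups.items():
    print(annotation, sequences)
```

Here the first two rows collapse to one annotation because their codons are synonymous, while the third stays separate because its V gene label differs.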

We have run the MIRA assay with T cells from healthy subjects (to elicit new immune responses to the antigen targets) and with T cells from individuals who had COVID-19 and have since recovered (to map their immune response to specific antigen targets). Together, the two data sources build the picture of potential immune responses. A subject metadata table is provided to associate these details with the experimental results.

Download the June 10, 2020 full dataset here.


What’s next
As more data are generated, we will gain confidence in these signals, particularly how the immune response to different viral antigens varies across people, and we will share these results regularly. We will also begin to connect these identified hotspots of viral recognition to the immune signals seen in exposed patients from the ImmuneCODE™. We can’t wait to share what we find, and hope that this foundational immunology resource enables researchers to translate this knowledge into useful clinical tools to combat this disease.