More Data, and What’s Coming Next


When we expanded our partnership with Microsoft to address the challenge of COVID-19, we committed to providing open access to the resulting data. A couple of weeks ago we started to make good on that promise by releasing pairs of antigens and T-cell receptors (TCRs) for 17 experiments targeting a broad swath of the virus genome. Today we’re expanding that dataset to include more experiments—a lot more. 

The dataset now includes 70 experiments and more than 90,000 TCR-antigen pairs. All of the new data targets the smaller peptide regions we discussed in our last post. As the dataset grows, we’re seeing confirmation of the initial hotspots (e.g., the standout cluster around index 27,800 in ORF7b) and getting a more accurate view of the more nuanced regions.

What’s next? Over the next few weeks, we will continue to make incremental updates to this dataset. Towards the end of July we will release a significant update that includes not just MIRA data but also what we refer to as “repertoires”: full immunoSEQ TCRB assessments for hundreds of people, together with metadata about their exposure to the SARS-CoV-2 virus. This will enable us to move beyond the nature of the virus itself, towards an understanding of the private and shared immune responses that it triggers.

We’ve spent years developing technologies to help understand diseases like COVID-19. Keep checking back as we share the growing results of that work with researchers and innovators around the world.

Download the June 25, 2020 dataset here.