Single-cell and paired chain data: scTCRseq and scBCRseq exploration

Executive Summary

This is a vignette dedicated to provide an overview on how to work with single-cell paired chain data in immunarch

Single-cell support is currently in the development version. In order to access it, you need to install the latest development version of the package by executing the following command:

install.packages("devtools"); devtools::install_github("immunomind/immunarch", ref="dev")

To read paired chain data with immunarch use the repLoad function with .mode = "paired". Currently we support 10X Genomics only.

To create subset immune repertoires with specific barcodes use the select_barcodes function. Output of Seurat::Idents() as a barcode vector works.

To create cluster-specific and patient-specific datasets using barcodes from the output of Seurat::Idents() use the select_clusters function.

Use the data packaged with `immunarch`

Load the package into the R enviroment:

library(immunarch)

For testing purposes we attached a new paired chain dataset to immunarch. Load it by executing the following command:

data(scdata)

## Warning in data(scdata): data set 'scdata' not found

Load the paired chain data

To load your own datasets, use the repLoad function. Currently we implemented paired chain data support for 10X Genomics data only. A working example of loading datasets into R:

file_path <- paste0(system.file(package = "immunarch"), "/extdata/sc/flu.csv.gz")
igdata <- repLoad(file_path, .mode = "paired")

## 
## == Step 1/3: loading repertoire files... ==

## Processing "<initial>" ...

##   -- [1/1] Parsing "/home/runner/work/_temp/Library/immunarch/extdata/sc/flu.csv.gz" -- 10x (filt.contigs)
## 
## == Step 2/3: checking metadata files and merging files... ==
## 
## Processing "<initial>" ...
##   -- Metadata file not found; creating a dummy metadata...
## 
## == Step 3/3: processing paired chain data... ==
## 
## Done!

igdata$meta

## # A tibble: 1 × 1
##   Sample
##   <chr> 
## 1 flu

head(igdata$data[[1]][c(1:7, 16, 17)])

##   Clones Proportion
## 1      3      3e-04
## 2      3      3e-04
## 3      2      2e-04
## 4      2      2e-04
## 5      2      2e-04
## 6      2      2e-04
##                                                                                                               CDR3.nt
## 1                                           TGTGCGAGGCTATGGGGTTGGGGATTACTCTACTGG;TGCACCTCATATGCAGGCAGCAACAATTTGGTATTC
## 2                TGTGCACACACCACCGAACTCTATTGTACTAATGGTGTATGCTATGGGGGCTACTTTGACTACTGG;TGCCAACAGTATAATAGTTATTCGTGGACGTTC
## 3                                        TGTGCGAGAGCTACCTCTTTTTATTACTTTCACTACTGG;TGCACCTCATATACAACCAGGACCACTCTGATATTC
## 4                                        TGTGCGAGAGCTACGTCTTTTTATTACTTTCACCACTGG;TGCACCTCATATACAACCAGGACCACTCTGATATTC
## 5 TGTGCGAGACAAAAGCGAGGGAGTATTACTATGGTTCGGGGAGTTATTATAACACGTCCCTACTTTGACTACTGG;TGCAGCTCATATACAAGCAGCAGCACCCTTTATGTCTTC
## 6                               TGTGCGAGGACTCTGCAACTGGGGATGCTGAGCGCTTTTGATATCTGG;TGCAGCTCATATACAAGCAGCAGCACTTATGTCTTC
##                                   CDR3.aa            V.name            D.name
## 1               CARLWGWGLLYW;CTSYAGSNNLVF IGHV4-59;IGLV2-11     IGHD3-10;None
## 2      CAHTTELYCTNGVCYGGYFDYW;CQQYNSYSWTF   IGHV2-5;IGKV1-5      IGHD2-8;None
## 3              CARATSFYYFHYW;CTSYTTRTTLIF  IGHV3-7;IGLV2-14 IGHD2OR15-2B;None
## 4              CARATSFYYFHHW;CTSYTTRTTLIF  IGHV3-7;IGLV2-14 IGHD2OR15-2B;None
## 5 CARQKRGSITMVRGVIITRPYFDYW;CSSYTSSSTLYVF IGHV4-34;IGLV2-14     IGHD3-10;None
## 6           CARTLQLGMLSAFDIW;CSSYTSSSTYVF IGHV5-51;IGLV2-14     IGHD7-27;None
##        J.name   chain                                                  Barcode
## 1 IGHJ4;IGLJ2 IGH;IGL AGAGCGACACCTTGTC-1;ATTGGTGAGACCTAGG-1;TCTTCGGAGGTGATTA-1
## 2 IGHJ4;IGKJ1 IGH;IGK AGTAGTCAGTGTACTC-1;GGCGACTGTACCGAGA-1;TTGAACGGTCACCTAA-1
## 3 IGHJ4;IGLJ2 IGH;IGL                    AGACGTTGTACACCGC-1;CAAGTTGCACGGCCAT-1
## 4 IGHJ4;IGLJ2 IGH;IGL                    ATAACGCTCGCATGAT-1;GACTAACGTCCAGTGC-1
## 5 IGHJ4;IGLJ1 IGH;IGL                    ACTGAACCAGTATGCT-1;GGGAGATCAGTATGCT-1
## 6 IGHJ3;IGLJ1 IGH;IGL                    GCGACCACACGGTTTA-1;GTCATTTCAAGCGATG-1

Subset by barcodes

To subset the data by barcodes, use the select_barcodes function.

barcodes <- c("AGTAGTCAGTGTACTC-1", "GGCGACTGTACCGAGA-1", "TTGAACGGTCACCTAA-1")

new_df <- select_barcodes(scdata$data[[1]], barcodes)

new_df

Patient-specific datasets

To create a new dataset with cluster-specific immune repertoires, use the select_clusters function:

scdata_pat <- select_clusters(scdata, scdata$bc_patient, "Patient")

names(scdata_pat$data)

scdata_pat$meta

Cluster-specific datasets

To create a new dataset with cluster-specific immune repertoires, use the select_clusters function. You can apply this function after you created patient-specific datasets to get patient-specific cell cluster-specific immune repertoires, e.g., a Memory B Cell repertoire for a specific patient:

scdata_cl <- select_clusters(scdata_pat, scdata$bc_cluster, "Cluster")

names(scdata_cl$data)

scdata_cl$meta

Explore and compute statistics

Most functions will work out-of-the-box with paired chain data.

p1 <- repOverlap(scdata_cl$data) %>% vis()
p2 <- repDiversity(scdata_cl$data) %>% vis()

target <- c("CARAGYLRGFDYW;CQQYGSSPLTF", "CARATSFYYFHHW;CTSYTTRTTLIF", "CARDLSRGDYFPYFSYHMNVW;CQSDDTANHVIF", "CARGFDTNAFDIW;CTAWDDSLSGVVF", "CTREDYW;CMQTIQLRTF")
p3 <- trackClonotypes(scdata_cl$data, target, .col = "aa") %>% vis()

(p1 + p2) / p3

Several functions may work incorrectly with paired chain data in this release of immunarch. Let us know via GitHub Issues!

ImmunoMind – improving design of T-cell therapies using multi-omics and AI. Research and biopharma partnerships, more details: immunomind.io

support@immunomind.io

Executive Summary

Use the data packaged with `immunarch`

Load the paired chain data

Subset by barcodes

Patient-specific datasets

Cluster-specific datasets

Explore and compute statistics

Single-cell and paired chain data: scTCRseq and scBCRseq exploration

ImmunoMind – improving design of T-cell therapies using multi-omics and AI. Research and biopharma partnerships, more details: immunomind.io

support@immunomind.io

Executive Summary

Use the data packaged with immunarch

Load the paired chain data

Subset by barcodes

Patient-specific datasets

Cluster-specific datasets

Explore and compute statistics

Use the data packaged with `immunarch`