Single-cell and paired chain data: scTCRseq and scBCRseq exploration
ImmunoMind – improving design of T-cell therapies using multi-omics and AI. Research and biopharma partnerships, more details: immunomind.io
support@immunomind.io
Source: vignettes/web_only_v0/v21_singlecell.Rmd
Executive Summary
This vignette gives an overview of how to work with single-cell paired chain data in immunarch.
Single-cell support is currently available only in the development version. To access it, install the latest development version of the package by executing the following command:
install.packages("devtools"); devtools::install_github("immunomind/immunarch", ref="dev")
To read paired chain data with immunarch, use the repLoad function with .mode = "paired". Currently, only 10X Genomics data is supported.
To create subset immune repertoires with specific barcodes, use the select_barcodes function. The output of Seurat::Idents() works as a barcode vector.
To create cluster-specific and patient-specific datasets using barcodes from the output of Seurat::Idents(), use the select_clusters function.
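The summary above mentions the output of Seurat::Idents() as a source of barcodes. As a minimal sketch of what such input looks like: Idents() returns a factor of cluster labels whose names are cell barcodes. The toy factor below is built by hand to mimic that shape, so the snippet runs without Seurat or immunarch. The cluster labels and the barcode-to-cluster assignments are made up for illustration.

```r
# Toy stand-in for the output of Seurat::Idents(): a factor of cluster
# labels whose names are cell barcodes. Labels and assignments are
# hypothetical, chosen only for illustration.
idents <- factor(
  c("Memory B", "Naive B", "Memory B", "Plasma"),
  levels = c("Memory B", "Naive B", "Plasma")
)
names(idents) <- c(
  "AGTAGTCAGTGTACTC-1", "GGCGACTGTACCGAGA-1",
  "TTGAACGGTCACCTAA-1", "AGAGCGACACCTTGTC-1"
)

# Barcodes of a single cluster, as a plain character vector.
memory_b_barcodes <- names(idents)[idents == "Memory B"]

# All barcodes grouped by cluster: a named list of character vectors.
barcodes_by_cluster <- split(names(idents), idents)

print(memory_b_barcodes)
print(lengths(barcodes_by_cluster))
```

A character vector like memory_b_barcodes is the kind of barcode vector that barcode-based subsetting expects, and the named factor itself carries the barcode-to-group mapping that cluster-based splitting relies on.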
Use the data packaged with immunarch
Load the package into the R environment:
library(immunarch)
For testing purposes, we attached a new paired chain dataset to immunarch. Load it by executing the following command:
data(scdata)
## Warning in data(scdata): data set 'scdata' not found
Load the paired chain data
To load your own datasets, use the repLoad function. Currently, paired chain data support is implemented for 10X Genomics data only. A working example of loading a dataset into R:
file_path <- paste0(system.file(package = "immunarch"), "/extdata/sc/flu.csv.gz")
igdata <- repLoad(file_path, .mode = "paired")
##
## == Step 1/3: loading repertoire files... ==
## Processing "<initial>" ...
## -- [1/1] Parsing "/home/runner/work/_temp/Library/immunarch/extdata/sc/flu.csv.gz" -- 10x (filt.contigs)
##
## == Step 2/3: checking metadata files and merging files... ==
##
## Processing "<initial>" ...
## -- Metadata file not found; creating a dummy metadata...
##
## == Step 3/3: processing paired chain data... ==
##
## Done!
igdata$meta
## # A tibble: 1 × 1
## Sample
## <chr>
## 1 flu
## Clones Proportion
## 1 3 3e-04
## 2 3 3e-04
## 3 2 2e-04
## 4 2 2e-04
## 5 2 2e-04
## 6 2 2e-04
## CDR3.nt
## 1 TGTGCGAGGCTATGGGGTTGGGGATTACTCTACTGG;TGCACCTCATATGCAGGCAGCAACAATTTGGTATTC
## 2 TGTGCACACACCACCGAACTCTATTGTACTAATGGTGTATGCTATGGGGGCTACTTTGACTACTGG;TGCCAACAGTATAATAGTTATTCGTGGACGTTC
## 3 TGTGCGAGAGCTACCTCTTTTTATTACTTTCACTACTGG;TGCACCTCATATACAACCAGGACCACTCTGATATTC
## 4 TGTGCGAGAGCTACGTCTTTTTATTACTTTCACCACTGG;TGCACCTCATATACAACCAGGACCACTCTGATATTC
## 5 TGTGCGAGACAAAAGCGAGGGAGTATTACTATGGTTCGGGGAGTTATTATAACACGTCCCTACTTTGACTACTGG;TGCAGCTCATATACAAGCAGCAGCACCCTTTATGTCTTC
## 6 TGTGCGAGGACTCTGCAACTGGGGATGCTGAGCGCTTTTGATATCTGG;TGCAGCTCATATACAAGCAGCAGCACTTATGTCTTC
## CDR3.aa V.name D.name
## 1 CARLWGWGLLYW;CTSYAGSNNLVF IGHV4-59;IGLV2-11 IGHD3-10;None
## 2 CAHTTELYCTNGVCYGGYFDYW;CQQYNSYSWTF IGHV2-5;IGKV1-5 IGHD2-8;None
## 3 CARATSFYYFHYW;CTSYTTRTTLIF IGHV3-7;IGLV2-14 IGHD2OR15-2B;None
## 4 CARATSFYYFHHW;CTSYTTRTTLIF IGHV3-7;IGLV2-14 IGHD2OR15-2B;None
## 5 CARQKRGSITMVRGVIITRPYFDYW;CSSYTSSSTLYVF IGHV4-34;IGLV2-14 IGHD3-10;None
## 6 CARTLQLGMLSAFDIW;CSSYTSSSTYVF IGHV5-51;IGLV2-14 IGHD7-27;None
## J.name chain Barcode
## 1 IGHJ4;IGLJ2 IGH;IGL AGAGCGACACCTTGTC-1;ATTGGTGAGACCTAGG-1;TCTTCGGAGGTGATTA-1
## 2 IGHJ4;IGKJ1 IGH;IGK AGTAGTCAGTGTACTC-1;GGCGACTGTACCGAGA-1;TTGAACGGTCACCTAA-1
## 3 IGHJ4;IGLJ2 IGH;IGL AGACGTTGTACACCGC-1;CAAGTTGCACGGCCAT-1
## 4 IGHJ4;IGLJ2 IGH;IGL ATAACGCTCGCATGAT-1;GACTAACGTCCAGTGC-1
## 5 IGHJ4;IGLJ1 IGH;IGL ACTGAACCAGTATGCT-1;GGGAGATCAGTATGCT-1
## 6 IGHJ3;IGLJ1 IGH;IGL GCGACCACACGGTTTA-1;GTCATTTCAAGCGATG-1
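In the output above, each paired-chain field stores both chains in one string joined by a semicolon (for example, IGH;IGL in the chain column). The following base-R sketch, which is not part of immunarch, splits such fields into per-chain columns. The two-row data frame is copied by hand from the printed output, and the new column names (chain.1, CDR3.aa.1, and so on) are hypothetical.

```r
# Two rows copied by hand from the printed paired-chain output.
paired <- data.frame(
  CDR3.aa = c("CARLWGWGLLYW;CTSYAGSNNLVF", "CAHTTELYCTNGVCYGGYFDYW;CQQYNSYSWTF"),
  chain   = c("IGH;IGL", "IGH;IGK"),
  stringsAsFactors = FALSE
)

# Split one ";"-joined column into a two-column character matrix.
# Assumes exactly two chains per row, as in the output shown.
split_pair <- function(x) {
  parts <- strsplit(x, ";", fixed = TRUE)
  stopifnot(all(lengths(parts) == 2))
  do.call(rbind, parts)
}

cdr3  <- split_pair(paired$CDR3.aa)
chain <- split_pair(paired$chain)

# Hypothetical column names for the per-chain view.
per_chain <- data.frame(
  chain.1   = chain[, 1],
  CDR3.aa.1 = cdr3[, 1],
  chain.2   = chain[, 2],
  CDR3.aa.2 = cdr3[, 2],
  stringsAsFactors = FALSE
)
print(per_chain)
```

The same helper applies to the other semicolon-joined columns, such as V.name and J.name. Note that the light chain has the placeholder None in D.name.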
Subset by barcodes
To subset the data by barcodes, use the select_barcodes
function.
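As a conceptual sketch of what subsetting by barcodes involves (this is not the implementation of select_barcodes): the Barcode column joins the barcodes of all cells in a clonotype with semicolons, so a base-R version can split that column, keep only the requested barcodes, and drop clonotypes left with no cells. The helper name subset_by_barcodes and the choice to recompute Clones and Proportion from the remaining cells are assumptions made for this illustration.

```r
# Toy clonotype table; barcodes copied by hand from the printed output.
clonotypes <- data.frame(
  Clones  = c(3, 2),
  CDR3.aa = c("CAHTTELYCTNGVCYGGYFDYW;CQQYNSYSWTF", "CARATSFYYFHYW;CTSYTTRTTLIF"),
  Barcode = c(
    "AGTAGTCAGTGTACTC-1;GGCGACTGTACCGAGA-1;TTGAACGGTCACCTAA-1",
    "AGACGTTGTACACCGC-1;CAAGTTGCACGGCCAT-1"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only cells whose barcode is in `barcodes`.
subset_by_barcodes <- function(df, barcodes) {
  cells <- strsplit(df$Barcode, ";", fixed = TRUE)
  kept  <- lapply(cells, intersect, barcodes)
  n     <- lengths(kept)
  out   <- df[n > 0, , drop = FALSE]
  # Assumption: clone counts are recomputed from the surviving cells.
  out$Clones     <- n[n > 0]
  out$Proportion <- out$Clones / sum(out$Clones)
  out$Barcode    <- vapply(kept[n > 0], paste, character(1), collapse = ";")
  rownames(out) <- NULL
  out
}

wanted <- c("AGTAGTCAGTGTACTC-1", "TTGAACGGTCACCTAA-1", "CAAGTTGCACGGCCAT-1")
print(subset_by_barcodes(clonotypes, wanted))
```

The corresponding immunarch call follows.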
barcodes <- c("AGTAGTCAGTGTACTC-1", "GGCGACTGTACCGAGA-1", "TTGAACGGTCACCTAA-1")
new_df <- select_barcodes(scdata$data[[1]], barcodes)
new_df
Patient-specific datasets
To create a new dataset with patient-specific immune repertoires, use
the select_clusters function:
scdata_pat <- select_clusters(scdata, scdata$bc_patient, "Patient")
names(scdata_pat$data)
scdata_pat$meta
Cluster-specific datasets
To create a new dataset with cluster-specific immune repertoires, use
the select_clusters function. You can apply this function
after creating patient-specific datasets to get immune repertoires that
are both patient-specific and cluster-specific, e.g., a Memory B Cell
repertoire for a specific patient:
scdata_cl <- select_clusters(scdata_pat, scdata$bc_cluster, "Cluster")
names(scdata_cl$data)
scdata_cl$meta
Explore and compute statistics
Most functions will work out-of-the-box with paired chain data.
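One way to see why many statistics carry over: a paired clonotype is identified by a single string such as CARATSFYYFHHW;CTSYTTRTTLIF, so a method that compares clonotype keys can treat it like a single-chain sequence. The base-R sketch below counts shared paired clonotypes between two toy repertoires. It only illustrates the idea and is not the code behind repOverlap. The repertoires rep_a and rep_b are assembled by hand from sequences that appear in this vignette.

```r
# Two toy repertoires keyed by paired CDR3 amino acid strings
# ("heavy;light"). Contents are assembled by hand for illustration.
rep_a <- c("CARATSFYYFHHW;CTSYTTRTTLIF",
           "CARLWGWGLLYW;CTSYAGSNNLVF",
           "CTREDYW;CMQTIQLRTF")
rep_b <- c("CARATSFYYFHHW;CTSYTTRTTLIF",
           "CARATSFYYFHYW;CTSYTTRTTLIF",
           "CTREDYW;CMQTIQLRTF")

# Paired clonotypes present in both repertoires.
shared   <- intersect(rep_a, rep_b)
n_shared <- length(shared)

# Jaccard index: shared clonotypes over all distinct clonotypes.
jaccard <- n_shared / length(union(rep_a, rep_b))

print(shared)
print(c(shared = n_shared, jaccard = jaccard))
```

Exact string matching treats CARATSFYYFHYW;CTSYTTRTTLIF and CARATSFYYFHHW;CTSYTTRTTLIF as different clonotypes, even though their heavy chains differ by a single residue and their light chains are identical.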
p1 <- repOverlap(scdata_cl$data) %>% vis()
p2 <- repDiversity(scdata_cl$data) %>% vis()
target <- c("CARAGYLRGFDYW;CQQYGSSPLTF", "CARATSFYYFHHW;CTSYTTRTTLIF", "CARDLSRGDYFPYFSYHMNVW;CQSDDTANHVIF", "CARGFDTNAFDIW;CTAWDDSLSGVVF", "CTREDYW;CMQTIQLRTF")
p3 <- trackClonotypes(scdata_cl$data, target, .col = "aa") %>% vis()
(p1 + p2) / p3
Several functions may work incorrectly with paired chain data in this
release of immunarch. Let us know via GitHub Issues!