vignettes/web_only/v21_singlecell.Rmd
v21_singlecell.Rmd
This is a vignette dedicated to provide an overview on how to work with single-cell paired chain data in
immunarch
Single-cell support is currently in the development version. In order to access it, you need to install the latest development version of the package by executing the following command:
install.packages("devtools"); devtools::install_github("immunomind/immunarch", ref="dev")
To read paired chain data with
immunarch
use therepLoad
function with.mode = "paired"
. Currently we support 10X Genomics only.
To create subset immune repertoires with specific barcodes use the
select_barcodes
function. Output ofSeurat::Idents()
as a barcode vector works.
To create cluster-specific and patient-specific datasets using barcodes from the output of
Seurat::Idents()
use theselect_clusters
function.
immunarch
Load the package into the R enviroment:
For testing purposes we attached a new paired chain dataset to immunarch
. Load it by executing the following command:
data(scdata)
To load your own datasets, use the repLoad
function. Currently we implemented paired chain data support for 10X Genomics data only. A working example of loading datasets into R:
file_path <- paste0(system.file(package = "immunarch"), "/extdata/sc/flu.csv.gz")
igdata <- repLoad(file_path, .mode = "paired")
##
## == Step 1/3: loading repertoire files... ==
## Processing "<initial>" ...
## -- [1/1] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/sc/flu.csv.gz" -- 10x (filt.contigs)
##
## == Step 2/3: checking metadata files and merging files... ==
##
## Processing "<initial>" ...
## -- Metadata file not found; creating a dummy metadata...
##
## == Step 3/3: processing paired chain data... ==
##
## Done!
igdata$meta
## # A tibble: 1 × 1
## Sample
## <chr>
## 1 flu
## Clones Proportion
## 1 3 3e-04
## 2 3 3e-04
## 3 2 2e-04
## 4 2 2e-04
## 5 2 2e-04
## 6 2 2e-04
## CDR3.nt
## 1 TGTGCGAGGCTATGGGGTTGGGGATTACTCTACTGG;TGCACCTCATATGCAGGCAGCAACAATTTGGTATTC
## 2 TGTGCACACACCACCGAACTCTATTGTACTAATGGTGTATGCTATGGGGGCTACTTTGACTACTGG;TGCCAACAGTATAATAGTTATTCGTGGACGTTC
## 3 TGTGCGAGAGCTACCTCTTTTTATTACTTTCACTACTGG;TGCACCTCATATACAACCAGGACCACTCTGATATTC
## 4 TGTGCGAGAGCTACGTCTTTTTATTACTTTCACCACTGG;TGCACCTCATATACAACCAGGACCACTCTGATATTC
## 5 TGTGCGAGACAAAAGCGAGGGAGTATTACTATGGTTCGGGGAGTTATTATAACACGTCCCTACTTTGACTACTGG;TGCAGCTCATATACAAGCAGCAGCACCCTTTATGTCTTC
## 6 TGTGCGAGGACTCTGCAACTGGGGATGCTGAGCGCTTTTGATATCTGG;TGCAGCTCATATACAAGCAGCAGCACTTATGTCTTC
## CDR3.aa V.name D.name
## 1 CARLWGWGLLYW;CTSYAGSNNLVF IGHV4-59;IGLV2-11 IGHD3-10;None
## 2 CAHTTELYCTNGVCYGGYFDYW;CQQYNSYSWTF IGHV2-5;IGKV1-5 IGHD2-8;None
## 3 CARATSFYYFHYW;CTSYTTRTTLIF IGHV3-7;IGLV2-14 IGHD2OR15-2B;None
## 4 CARATSFYYFHHW;CTSYTTRTTLIF IGHV3-7;IGLV2-14 IGHD2OR15-2B;None
## 5 CARQKRGSITMVRGVIITRPYFDYW;CSSYTSSSTLYVF IGHV4-34;IGLV2-14 IGHD3-10;None
## 6 CARTLQLGMLSAFDIW;CSSYTSSSTYVF IGHV5-51;IGLV2-14 IGHD7-27;None
## J.name chain Barcode
## 1 IGHJ4;IGLJ2 IGH;IGL AGAGCGACACCTTGTC-1;ATTGGTGAGACCTAGG-1;TCTTCGGAGGTGATTA-1
## 2 IGHJ4;IGKJ1 IGH;IGK AGTAGTCAGTGTACTC-1;GGCGACTGTACCGAGA-1;TTGAACGGTCACCTAA-1
## 3 IGHJ4;IGLJ2 IGH;IGL AGACGTTGTACACCGC-1;CAAGTTGCACGGCCAT-1
## 4 IGHJ4;IGLJ2 IGH;IGL ATAACGCTCGCATGAT-1;GACTAACGTCCAGTGC-1
## 5 IGHJ4;IGLJ1 IGH;IGL ACTGAACCAGTATGCT-1;GGGAGATCAGTATGCT-1
## 6 IGHJ3;IGLJ1 IGH;IGL GCGACCACACGGTTTA-1;GTCATTTCAAGCGATG-1
To subset the data by barcodes, use the select_barcodes
function.
barcodes <- c("AGTAGTCAGTGTACTC-1", "GGCGACTGTACCGAGA-1", "TTGAACGGTCACCTAA-1")
new_df <- select_barcodes(scdata$data[[1]], barcodes)
new_df
## CDR3.nt
## 1 TGTGCACACACCACCGAACTCTATTGTACTAATGGTGTATGCTATGGGGGCTACTTTGACTACTGG;TGCCAACAGTATAATAGTTATTCGTGGACGTTC
## CDR3.aa V.name D.name J.name V.end
## 1 CAHTTELYCTNGVCYGGYFDYW;CQQYNSYSWTF IGHV2-5;IGKV1-5 NA IGHJ4;IGKJ1 NA
## D.start D.end J.start VJ.ins VD.ins DJ.ins Sequence chain raw_clonotype_id
## 1 NA NA NA NA NA NA NA IGH;IGK 14
## ContigID Clones Barcode
## 1 NA 3 AGTAGTCAGTGTACTC-1;GGCGACTGTACCGAGA-1;TTGAACGGTCACCTAA-1
## Proportion
## 1 1
To create a new dataset with cluster-specific immune repertoires, use the select_clusters
function:
scdata_pat <- select_clusters(scdata, scdata$bc_patient, "Patient")
names(scdata_pat$data)
## [1] "flu_PatientA" "flu_PatientB" "flu_PatientC"
scdata_pat$meta
## # A tibble: 3 × 3
## Sample Patient.source Patient
## <chr> <chr> <chr>
## 1 flu_PatientA flu PatientA
## 2 flu_PatientB flu PatientB
## 3 flu_PatientC flu PatientC
To create a new dataset with cluster-specific immune repertoires, use the select_clusters
function. You can apply this function after you created patient-specific datasets to get patient-specific cell cluster-specific immune repertoires, e.g., a Memory B Cell repertoire for a specific patient:
scdata_cl <- select_clusters(scdata_pat, scdata$bc_cluster, "Cluster")
names(scdata_cl$data)
## [1] "flu_PatientA_Activ" "flu_PatientA_Memory" "flu_PatientA_Naive"
## [4] "flu_PatientB_Activ" "flu_PatientB_Memory" "flu_PatientB_Naive"
## [7] "flu_PatientC_Activ" "flu_PatientC_Memory" "flu_PatientC_Naive"
scdata_cl$meta
## # A tibble: 9 × 5
## Sample Patient.source Patient Cluster.source Cluster
## <chr> <chr> <chr> <chr> <chr>
## 1 flu_PatientA_Activ flu PatientA flu_PatientA Activ
## 2 flu_PatientA_Memory flu PatientA flu_PatientA Memory
## 3 flu_PatientA_Naive flu PatientA flu_PatientA Naive
## 4 flu_PatientB_Activ flu PatientB flu_PatientB Activ
## 5 flu_PatientB_Memory flu PatientB flu_PatientB Memory
## 6 flu_PatientB_Naive flu PatientB flu_PatientB Naive
## 7 flu_PatientC_Activ flu PatientC flu_PatientC Activ
## 8 flu_PatientC_Memory flu PatientC flu_PatientC Memory
## 9 flu_PatientC_Naive flu PatientC flu_PatientC Naive
Most functions will work out-of-the-box with paired chain data.
p1 <- repOverlap(scdata_cl$data) %>% vis()
p2 <- repDiversity(scdata_cl$data) %>% vis()
target <- c("CARAGYLRGFDYW;CQQYGSSPLTF", "CARATSFYYFHHW;CTSYTTRTTLIF", "CARDLSRGDYFPYFSYHMNVW;CQSDDTANHVIF", "CARGFDTNAFDIW;CTAWDDSLSGVVF", "CTREDYW;CMQTIQLRTF")
p3 <- trackClonotypes(scdata_cl$data, target, .col = "aa") %>% vis()
## Warning in melt.data.table(.data): id.vars and measure.vars are internally
## guessed when both are 'NULL'. All non-numeric/integer/logical type columns are
## considered id.vars, which in this case are columns [CDR3.aa, ...]. Consider
## providing at least one of 'id' or 'measure' vars in future.
(p1 + p2) / p3
Several functions may work incorrectly with paired chain data in this release of immunarch
. Let us know via GitHub Issues!