Executive Summary

This is a vignette dedicated to provide an overview on how to work with single-cell paired chain data in immunarch

Single-cell support is currently in the development version. In order to access it, you need to install the latest development version of the package by executing the following command:

install.packages("devtools"); devtools::install_github("immunomind/immunarch", ref="dev")

To read paired chain data with immunarch use the repLoad function with .mode = "paired". Currently we support 10X Genomics only.

To create subset immune repertoires with specific barcodes use the select_barcodes function. Output of Seurat::Idents() as a barcode vector works.

To create cluster-specific and patient-specific datasets using barcodes from the output of Seurat::Idents() use the select_clusters function.

Use the data packaged with immunarch

Load the package into the R enviroment:

For testing purposes we attached a new paired chain dataset to immunarch. Load it by executing the following command:

data(scdata)

Load the paired chain data

To load your own datasets, use the repLoad function. Currently we implemented paired chain data support for 10X Genomics data only. A working example of loading datasets into R:

file_path <- paste0(system.file(package = "immunarch"), "/extdata/sc/flu.csv.gz")
igdata <- repLoad(file_path, .mode = "paired")
## 
## == Step 1/3: loading repertoire files... ==
## Processing "<initial>" ...
##   -- [1/1] Parsing "/tmp/Rtmpa6GuX7/temp_libpath31cb083a64dc82/immunarch/extdata/sc/flu.csv.gz" -- 10x (filt.contigs)
## 
## == Step 2/3: checking metadata files and merging files... ==
## 
## Processing "<initial>" ...
##   -- Metadata file not found; creating a dummy metadata...
## 
## == Step 3/3: processing paired chain data... ==
## 
## Done!
igdata$meta
## # A tibble: 1 × 1
##   Sample
##   <chr> 
## 1 flu
head(igdata$data[[1]][c(1:7, 16, 17)])
##   Clones Proportion
## 1      3      3e-04
## 2      3      3e-04
## 3      2      2e-04
## 4      2      2e-04
## 5      2      2e-04
## 6      2      2e-04
##                                                                                                               CDR3.nt
## 1                                           TGTGCGAGGCTATGGGGTTGGGGATTACTCTACTGG;TGCACCTCATATGCAGGCAGCAACAATTTGGTATTC
## 2                TGTGCACACACCACCGAACTCTATTGTACTAATGGTGTATGCTATGGGGGCTACTTTGACTACTGG;TGCCAACAGTATAATAGTTATTCGTGGACGTTC
## 3                                        TGTGCGAGAGCTACCTCTTTTTATTACTTTCACTACTGG;TGCACCTCATATACAACCAGGACCACTCTGATATTC
## 4                                        TGTGCGAGAGCTACGTCTTTTTATTACTTTCACCACTGG;TGCACCTCATATACAACCAGGACCACTCTGATATTC
## 5 TGTGCGAGACAAAAGCGAGGGAGTATTACTATGGTTCGGGGAGTTATTATAACACGTCCCTACTTTGACTACTGG;TGCAGCTCATATACAAGCAGCAGCACCCTTTATGTCTTC
## 6                               TGTGCGAGGACTCTGCAACTGGGGATGCTGAGCGCTTTTGATATCTGG;TGCAGCTCATATACAAGCAGCAGCACTTATGTCTTC
##                                   CDR3.aa            V.name            D.name
## 1               CARLWGWGLLYW;CTSYAGSNNLVF IGHV4-59;IGLV2-11     IGHD3-10;None
## 2      CAHTTELYCTNGVCYGGYFDYW;CQQYNSYSWTF   IGHV2-5;IGKV1-5      IGHD2-8;None
## 3              CARATSFYYFHYW;CTSYTTRTTLIF  IGHV3-7;IGLV2-14 IGHD2OR15-2B;None
## 4              CARATSFYYFHHW;CTSYTTRTTLIF  IGHV3-7;IGLV2-14 IGHD2OR15-2B;None
## 5 CARQKRGSITMVRGVIITRPYFDYW;CSSYTSSSTLYVF IGHV4-34;IGLV2-14     IGHD3-10;None
## 6           CARTLQLGMLSAFDIW;CSSYTSSSTYVF IGHV5-51;IGLV2-14     IGHD7-27;None
##        J.name   chain                                                  Barcode
## 1 IGHJ4;IGLJ2 IGH;IGL AGAGCGACACCTTGTC-1;ATTGGTGAGACCTAGG-1;TCTTCGGAGGTGATTA-1
## 2 IGHJ4;IGKJ1 IGH;IGK AGTAGTCAGTGTACTC-1;GGCGACTGTACCGAGA-1;TTGAACGGTCACCTAA-1
## 3 IGHJ4;IGLJ2 IGH;IGL                    AGACGTTGTACACCGC-1;CAAGTTGCACGGCCAT-1
## 4 IGHJ4;IGLJ2 IGH;IGL                    ATAACGCTCGCATGAT-1;GACTAACGTCCAGTGC-1
## 5 IGHJ4;IGLJ1 IGH;IGL                    ACTGAACCAGTATGCT-1;GGGAGATCAGTATGCT-1
## 6 IGHJ3;IGLJ1 IGH;IGL                    GCGACCACACGGTTTA-1;GTCATTTCAAGCGATG-1

Subset by barcodes

To subset the data by barcodes, use the select_barcodes function.

barcodes <- c("AGTAGTCAGTGTACTC-1", "GGCGACTGTACCGAGA-1", "TTGAACGGTCACCTAA-1")

new_df <- select_barcodes(scdata$data[[1]], barcodes)

new_df
##                                                                                                CDR3.nt
## 1 TGTGCACACACCACCGAACTCTATTGTACTAATGGTGTATGCTATGGGGGCTACTTTGACTACTGG;TGCCAACAGTATAATAGTTATTCGTGGACGTTC
##                              CDR3.aa          V.name D.name      J.name V.end
## 1 CAHTTELYCTNGVCYGGYFDYW;CQQYNSYSWTF IGHV2-5;IGKV1-5     NA IGHJ4;IGKJ1    NA
##   D.start D.end J.start VJ.ins VD.ins DJ.ins Sequence   chain raw_clonotype_id
## 1      NA    NA      NA     NA     NA     NA       NA IGH;IGK               14
##   ContigID Clones                                                  Barcode
## 1       NA      3 AGTAGTCAGTGTACTC-1;GGCGACTGTACCGAGA-1;TTGAACGGTCACCTAA-1
##   Proportion
## 1          1

Patient-specific datasets

To create a new dataset with cluster-specific immune repertoires, use the select_clusters function:

scdata_pat <- select_clusters(scdata, scdata$bc_patient, "Patient")

names(scdata_pat$data)
## [1] "flu_PatientA" "flu_PatientB" "flu_PatientC"
scdata_pat$meta
## # A tibble: 3 × 3
##   Sample       Patient.source Patient 
##   <chr>        <chr>          <chr>   
## 1 flu_PatientA flu            PatientA
## 2 flu_PatientB flu            PatientB
## 3 flu_PatientC flu            PatientC

Cluster-specific datasets

To create a new dataset with cluster-specific immune repertoires, use the select_clusters function. You can apply this function after you created patient-specific datasets to get patient-specific cell cluster-specific immune repertoires, e.g., a Memory B Cell repertoire for a specific patient:

scdata_cl <- select_clusters(scdata_pat, scdata$bc_cluster, "Cluster")

names(scdata_cl$data)
## [1] "flu_PatientA_Activ"  "flu_PatientA_Memory" "flu_PatientA_Naive" 
## [4] "flu_PatientB_Activ"  "flu_PatientB_Memory" "flu_PatientB_Naive" 
## [7] "flu_PatientC_Activ"  "flu_PatientC_Memory" "flu_PatientC_Naive"
scdata_cl$meta
## # A tibble: 9 × 5
##   Sample              Patient.source Patient  Cluster.source Cluster
##   <chr>               <chr>          <chr>    <chr>          <chr>  
## 1 flu_PatientA_Activ  flu            PatientA flu_PatientA   Activ  
## 2 flu_PatientA_Memory flu            PatientA flu_PatientA   Memory 
## 3 flu_PatientA_Naive  flu            PatientA flu_PatientA   Naive  
## 4 flu_PatientB_Activ  flu            PatientB flu_PatientB   Activ  
## 5 flu_PatientB_Memory flu            PatientB flu_PatientB   Memory 
## 6 flu_PatientB_Naive  flu            PatientB flu_PatientB   Naive  
## 7 flu_PatientC_Activ  flu            PatientC flu_PatientC   Activ  
## 8 flu_PatientC_Memory flu            PatientC flu_PatientC   Memory 
## 9 flu_PatientC_Naive  flu            PatientC flu_PatientC   Naive

Explore and compute statistics

Most functions will work out-of-the-box with paired chain data.

p1 <- repOverlap(scdata_cl$data) %>% vis()
p2 <- repDiversity(scdata_cl$data) %>% vis()

target <- c("CARAGYLRGFDYW;CQQYGSSPLTF", "CARATSFYYFHHW;CTSYTTRTTLIF", "CARDLSRGDYFPYFSYHMNVW;CQSDDTANHVIF", "CARGFDTNAFDIW;CTAWDDSLSGVVF", "CTREDYW;CMQTIQLRTF")
p3 <- trackClonotypes(scdata_cl$data, target, .col = "aa") %>% vis()
## Warning in melt.data.table(.data): id.vars and measure.vars are internally
## guessed when both are 'NULL'. All non-numeric/integer/logical type columns are
## considered id.vars, which in this case are columns [CDR3.aa, ...]. Consider
## providing at least one of 'id' or 'measure' vars in future.
(p1 + p2) / p3
## Warning: `spread_()` was deprecated in tidyr 1.2.0.
##  Please use `spread()` instead.
##  The deprecated feature was likely used in the ggalluvial package.
##   Please report the issue at
##   <https://github.com/corybrunson/ggalluvial/issues>.
## Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
##  The deprecated feature was likely used in the dplyr package.
##   Please report the issue at <https://github.com/tidyverse/dplyr/issues>.

Several functions may work incorrectly with paired chain data in this release of immunarch. Let us know via GitHub Issues!