vignettes/web_only/load_10x.Rmd
10x Genomics has various pipelines for single cell and spatial views of biological systems, including single cell immune profiling. The 10x Genomics Chromium Single Cell Immune Profiling Solution enables simultaneous analysis of the following:
Their end-to-end pipeline also includes the Cell Ranger software, which provides the following pipelines for immune profiling analysis:
cellranger mkfastq
demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files. It is a wrapper around Illumina’s bcl2fastq, with additional useful features that are specific to 10x libraries and a simplified sample sheet format.
cellranger vdj
takes FASTQ files from cellranger mkfastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.
cellranger count
takes FASTQ files for 5’ Gene Expression and/or Feature Barcode (cell surface protein or antigen) libraries and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. The cellranger count pipeline outputs a .cloupe file which can be loaded into Loupe Browser for interactive visualization, clustering, and differential expression analysis.
Follow the instructions here to use Cell Ranger on your data. Note that Cell Ranger currently runs only on Linux operating systems. The cellranger count and cellranger vdj pipelines will be particularly useful.
You can find sample data from full runs to download here. In this tutorial, we use this single cell mouse data. If you are using immunarch, you only need to download the .csv files.
Processing the data produces many output files. You should use the filtered contig annotations CSV file because it contains barcode information.
.
├── vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv <- This contains the count data we want!
├── vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv
├── vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv
├── vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv
├── vdj_v1_mm_c57bl6_pbmc_t_matrix.h5
├── vdj_v1_mm_c57bl6_pbmc_t_bam.bam.bai
├── vdj_v1_mm_c57bl6_pbmc_t_molecule_info.h5
├── vdj_v1_mm_c57bl6_pbmc_t_raw_feature_bc_matrix.tar.gz
├── vdj_v1_mm_c57bl6_pbmc_t_analysis.tar.gz
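If you are unsure which CSV files in an output folder carry barcodes, one option is to read only the header of each file. The sketch below is a minimal base R illustration. has_barcode_column is a hypothetical helper (it is not part of immunarch), and it assumes the barcode column is named barcode, as in Cell Ranger's contig annotation files. The two toy files written to a temporary folder only stand in for real output.

```r
# Hypothetical helper, not part of immunarch: returns TRUE if the CSV header
# contains a column named "barcode" (the column name is an assumption).
has_barcode_column <- function(path) {
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  "barcode" %in% header
}

# Toy stand-ins for Cell Ranger output, written to a temporary folder.
out_dir <- file.path(tempdir(), "cellranger_demo")
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(barcode = "AAAC-1", chain = "TRA", cdr3 = "CAAS"),
          file.path(out_dir, "demo_filtered_contig_annotations.csv"),
          row.names = FALSE)
write.csv(data.frame(clonotype_id = "clonotype1", frequency = 3),
          file.path(out_dir, "demo_clonotypes.csv"),
          row.names = FALSE)

# Check every CSV file in the folder and label the results by file name.
csv_files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
barcode_check <- vapply(csv_files, has_barcode_column, logical(1))
names(barcode_check) <- basename(csv_files)
print(barcode_check)
```

To try this on your own data, point out_dir at your Cell Ranger output folder and skip the two write.csv calls.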
Run the code below in your R environment to load the data into immunarch's format. You can run it on the entire folder with the Cell Ranger output files. repLoad will ignore the file formats that are unsupported.
# 1.1) Load the package into R:
> library(immunarch)
# 1.2) Replace with the path to your processed 10x data or to the filtered contig annotations file
> file_path = "~/path/to/your/cellranger/data/"
# 1.3) Load 10x data with repLoad
> immdata_10x <- repLoad(file_path)
== Step 1/3: loading repertoire files... ==
Processing "/filepath/C57BL_mice_igenrichment" ...
-- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv" -- 10x (filt.contigs)
[!] Removed 2917 clonotypes with no nucleotide and amino acid CDR3 sequence.
-- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv" -- unsupported format, skipping
-- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv" -- 10x (consensus)
-- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv" -- 10x (filt.contigs)
[!] Removed 1198 clonotypes with no nucleotide and amino acid CDR3 sequence.
== Step 2/3: checking metadata files and merging... ==
Processing "<initial>" ...
-- Metadata file not found; creating a dummy metadata...
== Step 3/3: splitting data by barcodes and chains... ==
Done!
Now let’s take a look at the data! Your output should look something like the following.
> immdata_10x
$data$vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRA
# A tibble: 710 x 17
Clones Proportion CDR3.nt CDR3.aa V.name D.name J.name V.end D.start D.end J.start VJ.ins VD.ins DJ.ins chain ClonotypeID ConsensusID
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <chr> <chr> <chr>
1 55 0.00414 TGTGCTATGGC… CAMATGG… TRAV13… None TRAJ56 NA NA NA NA NA NA NA TRA clonotype306 clonotype30…
2 55 0.00414 TGTGCAGCTAG… CAASGNT… TRAV7-4 None TRAJ27 NA NA NA NA NA NA NA TRA clonotype338 clonotype33…
3 53 0.00399 TGTGCAGCAAG… CAARDSG… TRAV14… None TRAJ11 NA NA NA NA NA NA NA TRA clonotype617 clonotype61…
4 45 0.00339 TGCGCAGTCAG… CAVSNNT… TRAV3-3 None TRAJ27 NA NA NA NA NA NA NA TRA clonotype435 clonotype43…
5 43 0.00324 TGTGCAGTCAG… CAVSNMG… TRAV7D… None TRAJ9 NA NA NA NA NA NA NA TRA clonotype401 clonotype40…
6 42 0.00316 TGTGCAGCAAG… CAASPNY… TRAV14… None TRAJ21 NA NA NA NA NA NA NA TRA clonotype5 clonotype5_…
7 37 0.00279 TGTGCAGTGAG… CAVSSGG… TRAV7D… None TRAJ6 NA NA NA NA NA NA NA TRA clonotype453 clonotype45…
8 35 0.00264 TGTGCAGCAAG… CAASATS… TRAV14… None TRAJ22 NA NA NA NA NA NA NA TRA clonotype809 clonotype80…
9 32 0.00241 TGTGCAGCAAG… CAASPNY… TRAV14… None TRAJ21 NA NA NA NA NA NA NA TRA clonotype150 clonotype15…
10 32 0.00241 TGTGCTCTGGG… CALGDEA… TRAV6-… None TRAJ30 NA NA NA NA NA NA NA TRA clonotype393 clonotype39…
# … with 700 more rows
$meta
Sample Chain Source
1 vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_Multi Multi vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
2 vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_TRA TRA vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
3 vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_TRB TRB vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
5 vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRA TRA vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations
6 vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRB TRB vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations
Congrats! Now your data is ready for exploration. Follow the steps here to learn more about how to explore your dataset.
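As a first look, you might list the loaded samples and pick out those for a single chain. The sketch below uses only base R. It assumes the loaded object is a list with a data element (one data frame per sample) and a meta data frame with Sample and Chain columns, as in the printout above. toy_immdata is a hypothetical stand-in so that the snippet runs without the real files; replace it with immdata_10x for your own data.

```r
# Hypothetical stand-in with the same shape as the object printed above:
# a list with `data` (one data frame per sample) and `meta`.
toy_immdata <- list(
  data = list(
    demo_TRA = data.frame(Clones = c(55, 42),
                          CDR3.aa = c("CAMA", "CAAS")),
    demo_TRB = data.frame(Clones = c(30, 12, 7),
                          CDR3.aa = c("CASS", "CASR", "CAWS"))
  ),
  meta = data.frame(
    Sample = c("demo_TRA", "demo_TRB"),
    Chain = c("TRA", "TRB"),
    stringsAsFactors = FALSE
  )
)

# Number of clonotypes (rows) in each sample.
clonotypes_per_sample <- vapply(toy_immdata$data, nrow, integer(1))
print(clonotypes_per_sample)

# Keep only the samples annotated as alpha chain in the metadata.
tra_samples <- toy_immdata$meta$Sample[toy_immdata$meta$Chain == "TRA"]
tra_data <- toy_immdata$data[tra_samples]
print(names(tra_data))
```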
Another important note is that some of the contig files lack a column for barcodes, the unique identifiers of individual cells.
These files can be useful for analysis of single-chain data (only alpha or only beta TCRs), but to analyze paired-chain data and use the full power of single-cell technologies, you should load the file with barcodes into immunarch.
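To see why barcodes matter for pairing, the sketch below groups contigs by barcode and keeps the cells that have both a TRA and a TRB contig. It is a base R illustration on toy data and does not show how immunarch pairs chains internally. The barcode and chain column names are assumed to match Cell Ranger's contig annotation files, and the barcodes and sequences are made up.

```r
# Toy contig table; real data would come from the filtered contig CSV file.
contigs <- data.frame(
  barcode = c("AAAC-1", "AAAC-1", "CCGT-1", "GGTA-1", "GGTA-1"),
  chain   = c("TRA", "TRB", "TRB", "TRA", "TRB"),
  cdr3    = c("CAAS", "CASS", "CASR", "CAVS", "CAWS"),
  stringsAsFactors = FALSE
)

# Count contigs per barcode (rows) and chain (columns).
chain_counts <- table(contigs$barcode, contigs$chain)

# Treat a cell as paired when it has at least one TRA and one TRB contig.
paired_barcodes <- rownames(chain_counts)[
  chain_counts[, "TRA"] > 0 & chain_counts[, "TRB"] > 0
]
print(paired_barcodes)
```

Without the barcode column, the TRA and TRB rows could not be linked back to the same cell, which is why files lacking barcodes only support single-chain analysis.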
Do you have single-cell data? Immunarch can now parse both single-chain and paired-chain single-cell data. Try out these features here.