Loading 10x Genomics Data

Intro to 10x Genomics

10x Genomics has various pipelines for single cell and spatial views of biological systems, including single cell immune profiling. The 10x Genomics Chromium Single Cell Immune Profiling Solution enables simultaneous analysis of the following:

V(D)J transcripts and clonotypes for T and B cells.
5’ gene expression.
Cell surface proteins/antigen specificity (feature barcodes) at single-cell resolution for the same set of cells.

The end-to-end workflow also includes the Cell Ranger software, which provides the following pipelines for immune profiling analysis:

cellranger mkfastq demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files. It is a wrapper around Illumina’s bcl2fastq, with additional useful features that are specific to 10x libraries and a simplified sample sheet format.

cellranger vdj takes FASTQ files from cellranger mkfastq for V(D)J libraries and performs sequence assembly and paired clonotype calling. It uses the Chromium cellular barcodes and UMIs to assemble V(D)J transcripts per cell. Clonotypes and CDR3 sequences are output as a .vloupe file which can be loaded into Loupe V(D)J Browser.

cellranger count takes FASTQ files for 5’ Gene Expression and/or Feature Barcode (cell surface protein or antigen) libraries and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. The cellranger count pipeline outputs a .cloupe file which can be loaded into Loupe Browser for interactive visualization, clustering, and differential expression analysis.

Cell Ranger Data Pipeline

Follow the instructions here to use Cell Ranger on your data. Note that Cell Ranger currently runs only on Linux. The cellranger count and cellranger vdj pipelines will be particularly useful.

Sample data from full runs is available for download here. This tutorial uses this single-cell mouse dataset. If you only plan to use immunarch, downloading the .csv files is enough.

Prepare 10x Data

After processing, Cell Ranger produces many output files. Use the filtered contig annotations CSV file, because it contains barcode information.

.
├── vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv <-- This contains the count data we want!
├── vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv
├── vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv
├── vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv
├── vdj_v1_mm_c57bl6_pbmc_t_matrix.h5
├── vdj_v1_mm_c57bl6_pbmc_t_bam.bam.bai
├── vdj_v1_mm_c57bl6_pbmc_t_molecule_info.h5
├── vdj_v1_mm_c57bl6_pbmc_t_raw_feature_bc_matrix.tar.gz
└── vdj_v1_mm_c57bl6_pbmc_t_analysis.tar.gz
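To see which CSV files in an output folder carry barcode information, you can read just the header of each file. The sketch below uses base R only and writes two tiny stand-in files to a temporary folder; the file names, sequences, and the choice of columns are illustrative assumptions rather than real Cell Ranger output, so check the result against your own files.

```r
# Toy stand-ins for Cell Ranger output files (hypothetical, heavily shortened).
out_dir <- file.path(tempdir(), "cellranger_out_demo")
dir.create(out_dir, showWarnings = FALSE)

write.csv(
  data.frame(barcode = c("AAACCTGAGACCACGA-1", "AAACCTGAGACCACGA-1"),
             chain = c("TRA", "TRB"),
             cdr3 = c("CAASGNTGKLTF", "CASSLGQNTLYF"),
             umis = c(4, 9)),
  file.path(out_dir, "demo_filtered_contig_annotations.csv"),
  row.names = FALSE
)
# This toy file is deliberately written without a barcode column.
write.csv(
  data.frame(clonotype_id = "clonotype1",
             consensus_id = "clonotype1_consensus_1",
             chain = "TRA",
             cdr3 = "CAASGNTGKLTF"),
  file.path(out_dir, "demo_consensus_annotations.csv"),
  row.names = FALSE
)

# Read only the first row of each CSV and test for a "barcode" column.
csv_files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
has_barcode <- vapply(
  csv_files,
  function(f) "barcode" %in% names(read.csv(f, nrows = 1)),
  logical(1)
)
names(has_barcode) <- basename(csv_files)
print(has_barcode)
```

To run the same check on real data, replace out_dir with the path to your own output folder and skip the two write.csv calls.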

Load into Immunarch

Run the code below in your R environment to load the data into Immunarch’s format. You can point repLoad at the entire folder of Cell Ranger output files; it skips files in unsupported formats.

# 1.1) Load the package into R:
> library(immunarch)

# 1.2) Replace with the path to your processed 10x data folder or to a single contig annotations file
> file_path <- "~/path/to/your/cellranger/data/"

# 1.3) Load 10x data with repLoad
> immdata_10x <- repLoad(file_path)

== Step 1/3: loading repertoire files... ==

Processing "/filepath/C57BL_mice_igenrichment" ...
  -- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv" -- 10x (filt.contigs)
  [!] Removed 2917 clonotypes with no nucleotide and amino acid CDR3 sequence.
  -- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv" -- unsupported format, skipping
  -- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv" -- 10x (consensus)
  -- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv" -- 10x (filt.contigs)
  [!] Removed 1198 clonotypes with no nucleotide and amino acid CDR3 sequence.

== Step 2/3: checking metadata files and merging... ==

Processing "<initial>" ...
  -- Metadata file not found; creating a dummy metadata...

== Step 3/3: splitting data by barcodes and chains... ==

Done!
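The log above shows that loading a whole folder also parses the all-contig and consensus files. If you only want the filtered contigs, one option is to pass that single file to repLoad, which accepts a file path as well as a folder. This is a minimal sketch: the folder path is the same placeholder used above, and the name immdata_filtered is chosen here for illustration. The load only runs if immunarch is installed and the file exists.

```r
# Hypothetical location of the Cell Ranger output; replace with your own path.
data_dir <- "~/path/to/your/cellranger/data"
contig_file <- file.path(
  data_dir, "vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv"
)

# Only attempt the load when both the package and the file are available,
# so the snippet is safe to paste into any session.
if (requireNamespace("immunarch", quietly = TRUE) && file.exists(contig_file)) {
  immdata_filtered <- immunarch::repLoad(contig_file)
  print(names(immdata_filtered$data))
} else {
  message("immunarch or the contig file is missing; nothing loaded.")
}
```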

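Now let’s take a look at the data! Your output should look something like the example below. Sample names are derived from the input file names, so yours will differ; the example shown here comes from a splenocytes dataset.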

> immdata_10x
$data$vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRA
# A tibble: 710 x 17
   Clones Proportion CDR3.nt      CDR3.aa  V.name  D.name J.name V.end D.start D.end J.start VJ.ins VD.ins DJ.ins chain ClonotypeID  ConsensusID
    <dbl>      <dbl> <chr>        <chr>    <chr>   <chr>  <chr>  <int>   <int> <int>   <int>  <int>  <int>  <int> <chr> <chr>        <chr>
 1     55    0.00414 TGTGCTATGGC… CAMATGG… TRAV13… None   TRAJ56    NA      NA    NA      NA     NA     NA     NA TRA   clonotype306 clonotype30…
 2     55    0.00414 TGTGCAGCTAG… CAASGNT… TRAV7-4 None   TRAJ27    NA      NA    NA      NA     NA     NA     NA TRA   clonotype338 clonotype33…
 3     53    0.00399 TGTGCAGCAAG… CAARDSG… TRAV14… None   TRAJ11    NA      NA    NA      NA     NA     NA     NA TRA   clonotype617 clonotype61…
 4     45    0.00339 TGCGCAGTCAG… CAVSNNT… TRAV3-3 None   TRAJ27    NA      NA    NA      NA     NA     NA     NA TRA   clonotype435 clonotype43…
 5     43    0.00324 TGTGCAGTCAG… CAVSNMG… TRAV7D… None   TRAJ9     NA      NA    NA      NA     NA     NA     NA TRA   clonotype401 clonotype40…
 6     42    0.00316 TGTGCAGCAAG… CAASPNY… TRAV14… None   TRAJ21    NA      NA    NA      NA     NA     NA     NA TRA   clonotype5   clonotype5_…
 7     37    0.00279 TGTGCAGTGAG… CAVSSGG… TRAV7D… None   TRAJ6     NA      NA    NA      NA     NA     NA     NA TRA   clonotype453 clonotype45…
 8     35    0.00264 TGTGCAGCAAG… CAASATS… TRAV14… None   TRAJ22    NA      NA    NA      NA     NA     NA     NA TRA   clonotype809 clonotype80…
 9     32    0.00241 TGTGCAGCAAG… CAASPNY… TRAV14… None   TRAJ21    NA      NA    NA      NA     NA     NA     NA TRA   clonotype150 clonotype15…
10     32    0.00241 TGTGCTCTGGG… CALGDEA… TRAV6-… None   TRAJ30    NA      NA    NA      NA     NA     NA     NA TRA   clonotype393 clonotype39…
# … with 700 more rows

$meta
                                                       Sample Chain                                                Source
1 vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_Multi Multi vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
2   vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_TRA   TRA vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
3   vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_TRB   TRB vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
5    vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRA   TRA  vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations
6    vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRB   TRB  vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations
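The printed object is a list with two parts: data, which holds one table per sample and chain, and meta, which holds one row per sample. The sketch below builds a small mock object with that layout in base R so that the access pattern can be tried without any real data. The sample name demo_TRA, the clone counts, and the CDR3 strings are made-up placeholders, and only a few of the real columns are included.

```r
# Mock object with the same top-level layout as the printed output:
# a list with "data" (one table per sample) and "meta" (one row per sample).
mock_immdata <- list(
  data = list(
    demo_TRA = data.frame(
      Clones = c(55, 30, 15),
      CDR3.aa = c("CAASGNTGKLTF", "CAVSNNTGKLTF", "CALGDEAGKLTF"),
      chain = "TRA"
    )
  ),
  meta = data.frame(Sample = "demo_TRA", Chain = "TRA", Source = "demo")
)

# List the loaded samples and pull one repertoire out by name.
print(names(mock_immdata$data))
tra <- mock_immdata$data[["demo_TRA"]]

# Proportion is each clonotype's share of all clones in the sample.
tra$Proportion <- tra$Clones / sum(tra$Clones)
print(tra[order(-tra$Clones), c("Clones", "Proportion", "CDR3.aa")])
```

The same accessors, names(immdata_10x$data) and immdata_10x$meta, apply to the object returned by repLoad.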

Congrats! Now your data is ready for exploration. Follow the steps here to learn more about how to explore your dataset.

Note on barcodes

Some of the annotation files lack a barcode column. A barcode is a unique identifier of a cell.

Files without barcodes can still be useful for single-chain analysis (only alpha or only beta TCRs). To analyze paired-chain data and take full advantage of single-cell technology, load the file that contains barcodes into Immunarch.
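To make the role of barcodes concrete, the base R sketch below groups contigs by barcode and keeps the cells that have both an alpha and a beta chain. It is only an illustration on a made-up table (the barcodes and CDR3 strings are placeholders) and is not meant to describe how Immunarch pairs chains internally.

```r
# Toy contig table (hypothetical values) with one row per contig.
contigs <- data.frame(
  barcode = c("CELL_A", "CELL_A", "CELL_B", "CELL_C", "CELL_C"),
  chain   = c("TRA", "TRB", "TRB", "TRA", "TRB"),
  cdr3    = c("CAASGNTGKLTF", "CASSLGQNTLYF", "CASSPDRGYEQYF",
              "CAVSNNTGKLTF", "CASSQDWGNTLYF"),
  stringsAsFactors = FALSE
)

# Collect the chains observed for each barcode (cell).
chains_by_cell <- split(contigs$chain, contigs$barcode)

# A cell counts as paired here if it has at least one TRA and one TRB contig.
is_paired <- vapply(
  chains_by_cell,
  function(ch) all(c("TRA", "TRB") %in% ch),
  logical(1)
)
print(is_paired)

# Put the alpha and beta CDR3s of each paired cell side by side.
paired <- merge(
  contigs[contigs$chain == "TRA", c("barcode", "cdr3")],
  contigs[contigs$chain == "TRB", c("barcode", "cdr3")],
  by = "barcode", suffixes = c(".TRA", ".TRB")
)
print(paired)
```

Without the barcode column there is no key to join on, which is why files lacking it only support single-chain analysis.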

Paired-chain data

Do you have single-cell data? Immunarch can now parse both single-chain and paired-chain single-cell data. Try out these features here.
