Load immune repertoire files into the R workspace

The repLoad function loads repertoire files into the R workspace in the immunarch format, ready for immediate analysis. repLoad automatically detects the format of your files, so all you need to do is provide the path to them.

See "Details" for more information on supported formats. See "Examples" to dive right in.

repLoad(.path, .mode = "paired", .coding = TRUE, ...)

Arguments

.path

A character string specifying the path to the input data. Input data can be one of the following:

- a single repertoire file. In this case repLoad returns an R data.frame;

- a vector of paths to repertoire files. repLoad returns the same result as for a folder without a metadata file, described in the next item;

- a path to a folder with repertoire files and, if available, a metadata file named "metadata.txt". If the metadata file is present, repLoad returns a list with two elements, "data" and "meta": "data" is a list of repertoire data frames, and "meta" is a data frame with the metadata. If "metadata.txt" is not present, repLoad creates dummy metadata containing the sample names and returns a list with the same two elements. If the input data stores multiple chains or cell types in the same file (as in 10X Genomics repertoire files, for example), such files are split into separate data frames, each containing a single chain and cell type. The metadata then has additional columns specifying the cell and chain types of the resulting samples.
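The vector form of .path can be built with base R before calling repLoad. The sketch below is only an illustration: the helper list_repertoire_files and the empty placeholder files are hypothetical, and the repLoad call is left as a comment because placeholder files cannot be parsed.

```r
# Hypothetical helper (not part of immunarch): collect the files in a folder,
# leaving out "metadata.txt", so the result can be passed as a vector of paths.
list_repertoire_files <- function(folder) {
  files <- list.files(folder, full.names = TRUE)
  files[basename(files) != "metadata.txt"]
}

# Build a throwaway folder with empty placeholder files for the demonstration.
demo_dir <- file.path(tempdir(), "repload_path_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir, c("immunoseq1.txt", "immunoseq2.txt", "metadata.txt")
)))

paths <- list_repertoire_files(demo_dir)
print(basename(paths))

# With real repertoire files, the vector would then be passed on:
# immdata <- immunarch::repLoad(paths)
```

Passing an explicit vector like this is useful when a folder also contains files that should not be loaded.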

.mode

Either "single" for single chain data or "paired" for paired chain data.

Currently "single" works for every format, and "paired" works only for 10X Genomics data.

By default, 10X Genomics data will be loaded as paired chain data, and other files will be loaded as single chain data.

.coding

A logical value. Set TRUE to get coding-only clonotypes (the default). Set FALSE to get all clonotypes.

...

Extra arguments for parsing functions

Value

A list with two named elements:

- "data" is a list of input samples;

- "meta" is a data frame with sample metadata.
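To illustrate how the two elements fit together, here is a minimal sketch that uses a hand-made mock object with the same shape as the value described above. It is not real repLoad output; the sample names, column names and values are made up for illustration.

```r
# Mock object shaped like the described return value (NOT real repLoad output).
immdata_mock <- list(
  data = list(
    immunoseq_1 = data.frame(
      CDR3.aa = c("CASSLGTDTQYF", "CASSPGQGNTEAFF"),
      Clones = c(10, 3),
      stringsAsFactors = FALSE
    ),
    immunoseq_2 = data.frame(
      CDR3.aa = "CASSLAGGYEQYF",
      Clones = 7,
      stringsAsFactors = FALSE
    )
  ),
  meta = data.frame(
    Sample = c("immunoseq_1", "immunoseq_2"),
    Status = c("C", "A"),
    stringsAsFactors = FALSE
  )
)

# Sample names in "data" correspond to the "Sample" column of "meta".
print(names(immdata_mock$data))

# Use the metadata to pick the samples with Status "C".
status_c <- immdata_mock$meta$Sample[immdata_mock$meta$Status == "C"]
subset_data <- immdata_mock$data[status_c]
print(names(subset_data))
```

Because the metadata rows are keyed by sample name, any metadata column can be used in the same way to select a subset of repertoires.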

Details

The metadata has to be a tab-delimited file with the first column named "Sample". It can have any number of additional columns with arbitrary names. The first column should contain the base names, without extensions, of the files in your folder. Example:

Sample	Sex	Age	Status
immunoseq_1	M	1	C
immunoseq_2	M	2	C
immunoseq_3	F	3	A
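A metadata file in this layout can be written with base R. The following is a minimal sketch under assumed file names (immunoseq_1.txt and so on) and made-up column values; it derives the "Sample" column from the file base names and writes a tab-delimited "metadata.txt" to a temporary folder.

```r
# Assumed repertoire file names, for illustration only.
repertoire_files <- c("immunoseq_1.txt", "immunoseq_2.txt", "immunoseq_3.txt")

# The "Sample" column holds the base names of the files without extensions.
sample_names <- tools::file_path_sans_ext(basename(repertoire_files))

metadata <- data.frame(
  Sample = sample_names,
  Age = c(1, 2, 3),
  Status = c("C", "C", "A"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file without quotes or row names.
meta_dir <- file.path(tempdir(), "repload_meta_demo")
dir.create(meta_dir, showWarnings = FALSE)
meta_path <- file.path(meta_dir, "metadata.txt")
write.table(metadata, meta_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to check the layout.
meta_back <- read.delim(meta_path, stringsAsFactors = FALSE)
print(meta_back)
```

Reading the file back is a quick way to confirm that the first column is named "Sample" and that the sample names match the repertoire files.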

Currently, immunarch supports the following formats:

- "immunoseq" - ImmunoSEQ of any version. http://www.adaptivebiotech.com/immunoseq

- "mitcr" - MiTCR. https://github.com/milaboratory/mitcr

- "mixcr" - MiXCR (the "all" files) of any version. https://github.com/milaboratory/mixcr

- "migec" - MiGEC. http://migec.readthedocs.io/en/latest/

- "migmap" - For parsing IgBLAST results postprocessed with MigMap. https://github.com/mikessh/migmap

- "tcr" - tcR, our previous package. https://imminfo.github.io/tcr/

- "vdjtools" - VDJtools of any version. http://vdjtools-doc.readthedocs.io/en/latest/

- "imgt" - IMGT HighV-QUEST. http://www.imgt.org/HighV-QUEST/

- "airr" - adaptive immune receptor repertoire (AIRR) data format. http://docs.airr-community.org/en/latest/datarep/overview.html

- "10x" - 10XGenomics clonotype annotations tables. https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/output/annotation

- "archer" - ArcherDX clonotype tables. https://archerdx.com/

Examples

# To load the data from a single file (note that you don't need to specify the data format):
file_path <- paste0(system.file(package = "immunarch"), "/extdata/io/Sample1.tsv.gz")
immdata <- repLoad(file_path)
#> 
#> == Step 1/3: loading repertoire files... ==
#> Processing "<initial>" ...
#>   -- [1/1] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io/Sample1.tsv.gz" -- 
#> immunarch
#> 
#> == Step 2/3: checking metadata files and merging files... ==
#> Processing "<initial>" ...
#>   -- Metadata file not found; creating a dummy metadata...
#> 
#> == Step 3/3: processing paired chain data... ==
#> Done!

# Suppose you have the following structure in your folder:
# >_ ls
# immunoseq1.txt
# immunoseq2.txt
# immunoseq3.txt
# metadata.txt

# To load the whole folder with every file in it, type:
file_path <- paste0(system.file(package = "immunarch"), "/extdata/io/")
immdata <- repLoad(file_path)
#> 
#> == Step 1/3: loading repertoire files... ==
#> Processing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io/" ...
#>   -- [1/5] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io//Sample1.tsv.gz" -- 
#> immunarch
#>   -- [2/5] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io//Sample2.tsv.gz" -- 
#> immunarch
#>   -- [3/5] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io//Sample3.tsv.gz" -- 
#> immunarch
#>   -- [4/5] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io//Sample4.tsv.gz" -- 
#> immunarch
#>   -- [5/5] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io//metadata.txt" -- 
#> metadata
#> 
#> == Step 2/3: checking metadata files and merging files... ==
#> Processing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io/" ...
#>   -- Everything is OK!
#> 
#> == Step 3/3: processing paired chain data... ==
#> Done!
print(names(immdata))
#> [1] "data" "meta"

# We recommend creating a metadata file named "metadata.txt" in the folder.

# In that case, when you load your data you will see:
# > immdata <- repLoad("path/to/your/folder/")
# > names(immdata)
# [1] "data" "meta"

# If you do not have "metadata.txt", you will see the same output,
# but your metadata will be almost empty:
# > immdata <- repLoad("path/to/your/folder/")
# > names(immdata)
# [1] "data" "meta"
