The repLoad function loads repertoire files into the R workspace in the immunarch format, so you can use them for analysis right away. repLoad automatically detects the format of your files, so all you need to do is provide the path to them.
See "Details" for more information on supported formats, and "Examples" to dive right in.
repLoad(.path, .mode = "paired", .coding = TRUE, ...)
.path: A character string or vector specifying the path to the input data. The input can be one of the following:
- a single repertoire file. In this case, repLoad returns an R data.frame;
- a vector of paths to repertoire files. repLoad then behaves as it does for a folder without a metadata file (see the next item);
- a path to a folder with repertoire files and, if available, a metadata file "metadata.txt".
If the metadata file is present, repLoad returns a list with two elements, "data" and "meta".
"data" is another list containing the repertoire R data frames, and "meta" is a data frame with the metadata.
If the metadata file "metadata.txt" is not present, repLoad creates dummy metadata containing
the sample names and returns a list with the same two elements, "data" and "meta".
If the input data has multiple chains or cell types stored in the same file
(for example, 10x Genomics repertoire files), each such file is split into separate
R data frames, each containing a single chain type and a single cell type. The metadata will have additional columns specifying
the cell and chain types of the resulting samples.
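As a sketch of the "vector of paths" input form, the base R snippet below collects every file in a folder while leaving out "metadata.txt". The folder name and the empty placeholder files are hypothetical and exist only so the snippet runs without real data; the commented last line shows where the vector would be passed to repLoad, assuming immunarch is installed.

```r
# Hypothetical demo folder with empty placeholder files; real repertoire
# files would live here instead.
demo_dir <- file.path(tempdir(), "repload_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("immunoseq1.txt", "immunoseq2.txt",
                                            "immunoseq3.txt", "metadata.txt"))))

# Full paths to every file in the folder...
all_files <- list.files(demo_dir, full.names = TRUE)

# ...minus the metadata file, leaving only the repertoire files.
sample_paths <- all_files[basename(all_files) != "metadata.txt"]
print(basename(sample_paths))

# With immunarch installed, the vector could then be loaded directly:
# immdata <- immunarch::repLoad(sample_paths)
```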
.mode: Either "single" for single-chain data or "paired" for paired-chain data.
Currently "single" works for every format, while "paired" works only for 10x Genomics data.
By default, 10x Genomics data is loaded as paired-chain data, and all other files as single-chain data.
.coding: A logical value. Set TRUE (the default) to keep only coding clonotypes; set FALSE to keep all clonotypes.
...: Extra arguments passed to the parsing functions.
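The exact rule for what counts as a "coding" clonotype is implemented inside immunarch. As an illustration only, the sketch below assumes that non-coding clonotypes are those whose amino acid CDR3 sequence (column CDR3.aa) contains a stop codon "*" or a frameshift marker "~", and filters a small hypothetical table accordingly. It approximates the idea behind .coding = TRUE and is not immunarch's own filtering code.

```r
# Hypothetical toy repertoire in a simplified immunarch-like layout;
# real data would come from repLoad().
toy <- data.frame(
  Clones     = c(10, 5, 3, 2),
  Proportion = c(0.5, 0.25, 0.15, 0.1),
  CDR3.aa    = c("CASSLGQETQYF", "CASS*GQETQYF", "CAS~LGQETQYF", "CASSPGTGELFF"),
  stringsAsFactors = FALSE
)

# Assumption: "*" marks a stop codon and "~" a frameshift in CDR3.aa.
# Inside a bracket expression both characters are matched literally.
is_noncoding <- grepl("[*~]", toy$CDR3.aa)

coding_only    <- toy[!is_noncoding, ]  # roughly what .coding = TRUE keeps
noncoding_only <- toy[is_noncoding, ]   # dropped when .coding = TRUE

print(coding_only$CDR3.aa)
```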
A list with two named elements:
- "data" is a list of input samples;
- "meta" is a data frame with sample metadata.
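The returned list can be explored with ordinary base R. The sketch below builds a small hypothetical stand-in for a repLoad result (sample names, columns and values are invented) and counts clonotypes per sample. It assumes, as the metadata description suggests, that the names of the "data" list match the "Sample" column of "meta".

```r
# Hypothetical stand-in for a repLoad() result: "data" is a named list of
# per-sample data frames, "meta" is a data frame of sample metadata.
immdata <- list(
  data = list(
    immunoseq_1 = data.frame(Clones  = c(10, 5),
                             CDR3.aa = c("CASSLGQETQYF", "CASSPGTGELFF")),
    immunoseq_2 = data.frame(Clones  = c(7, 3, 1),
                             CDR3.aa = c("CASSIRSSYEQYF", "CASSLGQETQYF",
                                         "CASRPGQGNTIYF"))
  ),
  meta = data.frame(Sample = c("immunoseq_1", "immunoseq_2"),
                    Age    = c(1, 2))
)

# Number of clonotypes (rows) in each sample, named by sample.
clonotype_counts <- sapply(immdata$data, nrow)
print(clonotype_counts)

# Samples listed in "meta" that have no matching element in "data".
print(setdiff(immdata$meta$Sample, names(immdata$data)))
```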
The metadata file must be tab-delimited, with the first column named "Sample". It can have any number of additional columns with arbitrary names. The first column should contain the base names of the files in your folder, without extensions. Example:
| Sample      | Sex   | Age | Status |
|-------------|-------|-----|--------|
| immunoseq_1 | M     | 1   | C      |
| immunoseq_2 | M     | 2   | C      |
| immunoseq_3 | FALSE | 3   | A      |
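A metadata file in this layout can be written from R with write.table. The sketch below derives the "Sample" column from hypothetical file names with tools::file_path_sans_ext, takes the Age and Status values from the table above, and writes a tab-delimited "metadata.txt" into a temporary folder. For compressed files such as "Sample1.tsv.gz", file_path_sans_ext(x, compression = TRUE) would strip both extensions; whether that matches immunarch's own sample naming in every case is an assumption.

```r
# Hypothetical repertoire file names; "Sample" must hold their base names
# without extensions.
repertoire_files <- c("immunoseq_1.txt", "immunoseq_2.txt", "immunoseq_3.txt")

# The first column must be named "Sample"; further columns are free-form.
meta <- data.frame(
  Sample = tools::file_path_sans_ext(basename(repertoire_files)),
  Age    = c(1, 2, 3),
  Status = c("C", "C", "A"),
  stringsAsFactors = FALSE
)

# Write it as a tab-delimited "metadata.txt" in a temporary folder.
meta_dir <- file.path(tempdir(), "repload_meta_demo")
dir.create(meta_dir, showWarnings = FALSE)
meta_path <- file.path(meta_dir, "metadata.txt")
write.table(meta, meta_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the header and the sample names.
check <- read.delim(meta_path, stringsAsFactors = FALSE)
print(check)
```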
Currently, immunarch supports the following formats:
- "immunoseq" - ImmunoSEQ of any version. http://www.adaptivebiotech.com/immunoseq
- "mitcr" - MiTCR. https://github.com/milaboratory/mitcr
- "mixcr" - MiXCR (the "all" files) of any version. https://github.com/milaboratory/mixcr
- "migec" - MiGEC. http://migec.readthedocs.io/en/latest/
- "migmap" - For parsing IgBLAST results postprocessed with MigMap. https://github.com/mikessh/migmap
- "tcr" - tcR, our previous package. https://imminfo.github.io/tcr/
- "vdjtools" - VDJtools of any version. http://vdjtools-doc.readthedocs.io/en/latest/
- "imgt" - IMGT HighV-QUEST. http://www.imgt.org/HighV-QUEST/
- "airr" - adaptive immune receptor repertoire (AIRR) data format. http://docs.airr-community.org/en/latest/datarep/overview.html
- "10x" - 10x Genomics clonotype annotation tables. https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/output/annotation
- "archer" - ArcherDX clonotype tables. https://archerdx.com/
See also: immunr_data_format for the immunarch data format; repSave for saving files; repOverlap, geneUsage and repDiversity for basic statistics on immune repertoires.
# To load the data from a single file (note that you don't need to specify the data format):
file_path <- paste0(system.file(package = "immunarch"), "/extdata/io/Sample1.tsv.gz")
immdata <- repLoad(file_path)
#>
#> == Step 1/3: loading repertoire files... ==
#> Processing "<initial>" ...
#> -- [1/1] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io/Sample1.tsv.gz" --
#> immunarch
#>
#> == Step 2/3: checking metadata files and merging files... ==
#> Processing "<initial>" ...
#> -- Metadata file not found; creating a dummy metadata...
#>
#> == Step 3/3: processing paired chain data... ==
#> Done!
# Suppose you have the following structure in your folder:
# >_ ls
# immunoseq1.txt
# immunoseq2.txt
# immunoseq3.txt
# metadata.txt
# To load every file in the folder, type:
file_path <- paste0(system.file(package = "immunarch"), "/extdata/io/")
immdata <- repLoad(file_path)
#>
#> == Step 1/3: loading repertoire files... ==
#> Processing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io/" ...
#> -- [1/5] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io//Sample1.tsv.gz" --
#> immunarch
#> -- [2/5] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io//Sample2.tsv.gz" --
#> immunarch
#> -- [3/5] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io//Sample3.tsv.gz" --
#> immunarch
#> -- [4/5] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io//Sample4.tsv.gz" --
#> immunarch
#> -- [5/5] Parsing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io//metadata.txt" --
#> metadata
#>
#> == Step 2/3: checking metadata files and merging files... ==
#> Processing "/tmp/RtmpYauDlO/temp_libpath125bc1594e6916/immunarch/extdata/io/" ...
#> -- Everything is OK!
#>
#> == Step 3/3: processing paired chain data... ==
#> Done!
print(names(immdata))
#> [1] "data" "meta"
# We recommend creating a metadata file named "metadata.txt" in the folder.
# In that case, when you load your data you will see:
# > immdata <- repLoad("path/to/your/folder/")
# > names(immdata)
# [1] "data" "meta"
# If you do not have "metadata.txt", you will see the same output,
# but your metadata will be almost empty:
# > immdata <- repLoad("path/to/your/folder/")
# > names(immdata)
# [1] "data" "meta"