Load clonotype databases such as VDJDB and McPAS into the R workspace
Source: R/v0_annotation.R
dbLoad.Rd
The function automatically detects the database format and loads it into R. Additionally, the function provides a general query interface to databases that allows filtering by species, chain types (i.e., locus) and pathology (i.e., antigen species).
Currently we support three popular databases:
VDJDB - https://github.com/antigenomics/vdjdb-db
McPAS-TCR - https://friedmanlab.weizmann.ac.il/McPAS-TCR/
TBAdb from PIRD - https://db.cngb.org/pird/
Arguments
- .path
Character. A path to the database file, e.g., "/Users/researcher/Downloads/McPAS-TCR.csv".
- .db
Character. A database type: either "vdjdb", "vdjdb-search", "mcpas" or "tbadb".
"vdjdb" for VDJDB; "vdjdb-search" for search table obtained from the web interface of VDJDB; "mcpas" for McPAS-TCR; "tbadb" for PIRD TBAdb.
- .species
Character. A string or a vector of strings specifying which species to keep in the loaded database, e.g., "HomoSapiens". Pass NA (the default) to load all available species.
- .chain
Character. A string or a vector of strings specifying which chains to keep in the loaded database, e.g., "TRB". Pass NA (the default) to load all available chains.
- .pathology
Character. A string or a vector of strings specifying which diseases, viruses, bacteria or other conditions to keep in the loaded database, e.g., "CMV". Pass NA (the default) to load all available conditions.
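The three filter arguments can be pictured with a small base-R sketch. This is only an illustration of the semantics described above, not immunarch's implementation: the toy table, its made-up CDR3 strings, and the helpers `keep_rows()` and `filter_db()` are all hypothetical. It also assumes that the filters are combined with a logical AND and that a vector matches any of its values, which the argument descriptions suggest but do not state explicitly.

```r
# Toy table with the three helper columns (Species, Chain, Pathology).
# All rows, including the CDR3 strings, are made up for illustration.
toy_db <- data.frame(
  cdr3      = c("CASSAAA", "CASSBBB", "CAVCCC", "CASSDDD"),
  Species   = c("HomoSapiens", "HomoSapiens", "HomoSapiens", "MusMusculus"),
  Chain     = c("TRB", "TRB", "TRA", "TRB"),
  Pathology = c("CMV", "EBV", "CMV", "LCMV"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for rows whose value is in `values`.
# NA means "no filter on this column" (assumed to mirror the NA default).
keep_rows <- function(column, values) {
  if (all(is.na(values))) {
    return(rep(TRUE, length(column)))
  }
  column %in% values
}

# Hypothetical stand-in for the query step; filters are assumed to be ANDed.
filter_db <- function(db, species = NA, chain = NA, pathology = NA) {
  keep <- keep_rows(db$Species, species) &
    keep_rows(db$Chain, chain) &
    keep_rows(db$Pathology, pathology)
  db[keep, , drop = FALSE]
}

# A single string per filter
human_trb <- filter_db(toy_db, species = "HomoSapiens", chain = "TRB")

# A vector of strings keeps rows matching any of the values
human_viral <- filter_db(toy_db, species = "HomoSapiens",
                         pathology = c("CMV", "EBV"))

# All filters left at NA: nothing is dropped
everything <- filter_db(toy_db)
```

Under these assumptions, leaving an argument at NA skips that filter entirely, so the species, chain and pathology filters can be applied in any combination.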
Examples
# Example file path
file_path <- system.file("extdata/db/vdjdb.example.txt", package = "immunarch")
# Load the database with human-only TRB-only receptors for all known antigens
db <- dbLoad(file_path, "vdjdb", "HomoSapiens", "TRB")
db
#> # A tibble: 10 × 19
#> gene cdr3 species antigen.epitope antigen.gene antigen.species complex.id
#> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 TRB CASSQD… HomoSa… RLRAEAQVK EBNA3A EBV 19268
#> 2 TRB CSASIL… HomoSa… KLGGALQAK IE1 CMV 8584
#> 3 TRB CASSYF… HomoSa… KLGGALQAK IE1 CMV 3445
#> 4 TRB CASSAF… HomoSa… NLVPMVATV pp65 CMV 0
#> 5 TRB CASSLW… HomoSa… KLGGALQAK IE1 CMV 19396
#> 6 TRB CASSLT… HomoSa… NLVPMVATV pp65 CMV 0
#> 7 TRB CASTAK… HomoSa… KLGGALQAK IE1 CMV 10972
#> 8 TRB CASSGA… HomoSa… KLGGALQAK IE1 CMV 6231
#> 9 TRB CASSLI… HomoSa… KLGGALQAK IE1 CMV 12587
#> 10 TRB CATSSS… HomoSa… KLGGALQAK IE1 CMV 13267
#> # ℹ 12 more variables: v.segm <chr>, j.segm <chr>, v.end <dbl>, j.start <dbl>,
#> # mhc.a <chr>, mhc.b <chr>, mhc.class <chr>, reference.id <chr>,
#> # vdjdb.score <dbl>, Species <chr>, Chain <chr>, Pathology <chr>