Tracks the temporal dynamics of clonotypes in repertoires. For example, tracking across multiple time points after vaccination.
Note: duplicated clonotypes are merged and their counts are summed up.
Usage
trackClonotypes(.data, .which = list(1, 15), .col = "aa", .norm = TRUE)Arguments
- .data
The data to process. It can be a data.frame, a data.table::data.table, or a list of these objects.
Every object must have columns in the immunarch compatible format. immunarch_data_format
Competent users may provide advanced data representations: DBI database connections, or a list of these objects. They are supported with the same limitations as basic objects.
Note: each connection must represent a separate repertoire.
- .which
An argument that regulates which clonotypes to choose for tracking. There are three options for this argument:
passes a list with two elements
list(X, Y), whereXis the name or the index of a target repertoire from ".data", andYis the number of the most abundant clonotypes to take fromX.passes a character vector of sequences to take from all data frames;
passes a data frame (data table, database) with one or more columns - first for sequences, and other for gene segments (if applicable).
See the "Examples" below with examples for each option.
- .col
A character vector of length 1. Specifies an identifier for a column, from which the function chooses clonotype sequences. Specify "nt" for nucleotide sequences, "aa" for amino acid sequences, "aa+v" for amino acid sequences and Variable genes, "nt+j" for nucleotide sequences with Joining genes, or any combination of the above. Used only if ".which" has option 1) or option 2).
- .norm
Logical. If TRUE then uses Proportion instead of the number of Clones per clonotype to store in the function output.
Examples
# Load an example data that comes with immunarch
data(immdata)
# Make the data smaller in order to speed up the examples
immdata$data <- immdata$data[c(1, 2, 3, 7, 8, 9)]
immdata$meta <- immdata$meta[c(1, 2, 3, 7, 8, 9), ]
# Option 1
# Choose the first 10 amino acid clonotype sequences
# from the first repertoire to track
tc <- trackClonotypes(immdata$data, list(1, 10), .col = "aa")
# Choose the first 20 nucleotide clonotype sequences
# and their V genes from the "MS1" repertoire to track
tc <- trackClonotypes(immdata$data, list("MS1", 20), .col = "nt+v")
# Option 2
# Choose clonotypes with amino acid sequences "CASRGLITDTQYF" or "CSASRGSPNEQYF"
tc <- trackClonotypes(immdata$data, c("CASRGLITDTQYF", "CSASRGSPNEQYF"), .col = "aa")
# Option 3
# Choose the first 10 clonotypes from the first repertoire
# with amino acid sequences and V segments
target <- immdata$data[[1]] %>%
select(CDR3.aa, V.name) %>%
head(10)
tc <- trackClonotypes(immdata$data, target)
# Visualise the output regardless of the chosen option
# Therea are three way to visualise it, regulated by the .plot argument
vis(tc, .plot = "smooth")
#> Warning: id.vars and measure.vars are internally guessed when both are 'NULL'. All non-numeric/integer/logical type columns are considered id.vars, which in this case are columns [CDR3.aa, V.name, ...]. Consider providing at least one of 'id' or 'measure' vars in future.
vis(tc, .plot = "area")
#> Warning: id.vars and measure.vars are internally guessed when both are 'NULL'. All non-numeric/integer/logical type columns are considered id.vars, which in this case are columns [CDR3.aa, V.name, ...]. Consider providing at least one of 'id' or 'measure' vars in future.
vis(tc, .plot = "line")
#> Warning: id.vars and measure.vars are internally guessed when both are 'NULL'. All non-numeric/integer/logical type columns are considered id.vars, which in this case are columns [CDR3.aa, V.name, ...]. Consider providing at least one of 'id' or 'measure' vars in future.
# Visualising timepoints
# First, we create an additional column in the metadata with randomly choosen timepoints:
immdata$meta$Timepoint <- sample(1:length(immdata$data))
immdata$meta
#> # A tibble: 6 × 7
#> Sample ID Sex Age Status Lane Timepoint
#> <chr> <chr> <chr> <dbl> <chr> <chr> <int>
#> 1 A2-i129 C1 M 11 C A 1
#> 2 A2-i131 C2 M 9 C A 3
#> 3 A2-i133 C4 M 16 C A 6
#> 4 MS1 MS1 M 12 MS C 2
#> 5 MS2 MS2 M 30 MS C 5
#> 6 MS3 MS3 M 8 MS C 4
# Next, we create a vector with samples in the right order,
# according to the "Timepoint" column (from smallest to greatest):
sample_order <- order(immdata$meta$Timepoint)
# Sanity check: timepoints are following the right order:
immdata$meta$Timepoint[sample_order]
#> [1] 1 2 3 4 5 6
# Samples, sorted by the timepoints:
immdata$meta$Sample[sample_order]
#> [1] "A2-i129" "MS1" "A2-i131" "MS3" "MS2" "A2-i133"
# And finally, we visualise the data:
vis(tc, .order = sample_order)
#> Warning: id.vars and measure.vars are internally guessed when both are 'NULL'. All non-numeric/integer/logical type columns are considered id.vars, which in this case are columns [CDR3.aa, V.name, ...]. Consider providing at least one of 'id' or 'measure' vars in future.