Create a repertoire of public clonotypes

pubRep(
  .data,
  .col = "aa+v",
  .quant = c("count", "prop"),
  .coding = TRUE,
  .min.samples = 1,
  .max.samples = NA,
  .verbose = TRUE
)

Arguments

.data

The data to be processed. Can be data.frame, data.table, or a list of these objects.

Every object must have columns in the immunarch-compatible format; see immunarch_data_format.

Advanced users may provide other data representations: DBI database connections, Apache Spark DataFrames created with copy_to, or a list of these objects. They are supported with the same limitations as basic objects.

Note: each connection must represent a separate repertoire.

.col

A string that specifies the column(s) to be processed. Pass one or more of the following strings, separated by the plus sign: "nt" for nucleotide sequences, "aa" for amino acid sequences, "v" for V gene segments, "j" for J gene segments. E.g., pass "aa+v" to build the public repertoire on CDR3 amino acid sequences paired with V gene segments; in this case a unique clonotype is a pair of a CDR3 amino acid sequence and a V gene segment.
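
To make the .col syntax concrete, here is a minimal base R sketch of how a string such as "aa+v" could be split into the columns that define a clonotype. The helper parse_col and the mapping to the column names CDR3.nt, CDR3.aa, V.name and J.name are assumptions for illustration only; this is not immunarch's internal code.

```r
# Assumed mapping from .col keys to immunarch-style column names
col_map <- c(nt = "CDR3.nt", aa = "CDR3.aa", v = "V.name", j = "J.name")

# Hypothetical helper (not part of immunarch): split a .col string on "+"
# and return the column names that jointly identify a clonotype
parse_col <- function(.col) {
  keys <- strsplit(.col, "+", fixed = TRUE)[[1]]
  unknown <- setdiff(keys, names(col_map))
  if (length(unknown) > 0) {
    stop("Unknown key(s): ", paste(unknown, collapse = ", "))
  }
  unname(col_map[keys])
}

parse_col("aa+v")
#> [1] "CDR3.aa" "V.name"
```

With "aa+v", two rows count as the same clonotype only if both the CDR3 amino acid sequence and the V gene segment match.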

.quant

A string that specifies the quantity reported for each sample. Set "count" to see public clonotype sharing as the number of clones, or set "prop" to see proportions.
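
The difference between the two settings can be illustrated with toy numbers. This sketch assumes that a proportion is the clone count of a clonotype divided by the total clone count of its sample; the vector names are made up for the example.

```r
# Toy clone counts for three clonotypes in a single sample
clones <- c(CASSLGTF = 10, CASSPGQF = 5, CASSQETF = 5)

# Assumed definition: proportion = clone count / total clones in the sample
prop <- clones / sum(clones)

prop
#> CASSLGTF CASSPGQF CASSQETF 
#>     0.50     0.25     0.25 
```

Counts are easier to read within one sample, while proportions are usually the safer choice when samples differ in sequencing depth.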

.coding

Logical. If TRUE, filters out non-coding sequences before processing.

.min.samples

Integer. The minimum number of samples in which a clonotype must occur to be included in the public repertoire table.

.max.samples

Integer. The maximum number of samples in which a clonotype may occur to be included in the public repertoire table. Set to NA (the default) to use the total number of samples.
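
The effect of the two sample thresholds can be sketched in base R on a toy incidence vector. The helper keep_public is hypothetical, and treating both bounds as inclusive is an assumption rather than something this page states.

```r
# Toy incidence: number of samples in which each clonotype occurs
incidence <- c(CASSLGTF = 1, CASSPGQF = 2, CASSQETF = 3)
n_samples <- 3

# Hypothetical helper mirroring .min.samples / .max.samples
# (bounds assumed to be inclusive)
keep_public <- function(incidence, min_samples = 1, max_samples = NA,
                        n_samples) {
  # NA is read as "no upper limit", i.e. the total number of samples
  if (is.na(max_samples)) max_samples <- n_samples
  incidence[incidence >= min_samples & incidence <= max_samples]
}

kept <- keep_public(incidence, min_samples = 2, n_samples = n_samples)
names(kept)
#> [1] "CASSPGQF" "CASSQETF"
```

Raising the minimum to 2 drops clonotypes found in a single sample, which are private rather than shared.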

.verbose

Logical. If TRUE, prints progress messages.

Value

Data table with columns for:

- Clonotypes (e.g., CDR3 sequence, or two columns for CDR3 sequence and V gene)

- Incidence of clonotypes

- Per-sample proportions or counts
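
The shape of such a table can be approximated with base R on two toy repertoires. This is only a sketch of the layout described above, not the package's implementation; the column names CDR3.aa, V.name, Clones and Samples, the sample names S1 and S2, and the use of NA for absent clonotypes are assumptions.

```r
# Two toy repertoires (column names assumed to follow the immunarch format)
s1 <- data.frame(CDR3.aa = c("CASSLGTF", "CASSPGQF"),
                 V.name  = c("TRBV5-1", "TRBV7-2"),
                 Clones  = c(10, 5),
                 stringsAsFactors = FALSE)
s2 <- data.frame(CDR3.aa = c("CASSPGQF", "CASSQETF"),
                 V.name  = c("TRBV7-2", "TRBV12-3"),
                 Clones  = c(3, 8),
                 stringsAsFactors = FALSE)

# Clonotype key corresponding to .col = "aa+v"
key <- c("CDR3.aa", "V.name")

# Rename the count column to the sample name, then do a full outer join
names(s1)[names(s1) == "Clones"] <- "S1"
names(s2)[names(s2) == "Clones"] <- "S2"
pub <- merge(s1, s2, by = key, all = TRUE)

# Incidence: in how many samples each clonotype was observed
pub$Samples <- rowSums(!is.na(pub[, c("S1", "S2")]))

# Order columns as: clonotype key, incidence, per-sample counts
pub <- pub[, c(key, "Samples", "S1", "S2")]
pub
```

Only the clonotype CASSPGQF with TRBV7-2 appears in both toy samples, so it is the only row with an incidence of 2.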

Examples

# Subset the data to make the example faster to run
immdata$data <- lapply(immdata$data, head, 2000)
pr <- pubRep(immdata$data, .verbose = FALSE)
vis(pr, "clonotypes", 1, 2)
#> `geom_smooth()` using formula = 'y ~ x'