Create a repertoire of public clonotypes

pubRep(
  .data,
  .col = "aa+v",
  .quant = c("count", "prop"),
  .coding = TRUE,
  .min.samples = 1,
  .max.samples = NA,
  .verbose = TRUE
)

## Arguments

- .data: The data to be processed. Can be a data.frame, a data.table, or a list of these objects. Every object must have columns in the immunarch-compatible format (see immunarch_data_format). Advanced users may provide other data representations: DBI database connections, an Apache Spark DataFrame from copy_to, or a list of these objects. They are supported with the same limitations as basic objects. Note: each connection must represent a separate repertoire.

- .col: A string that specifies the column(s) to be processed. Pass one or more of the following strings, separated by the plus sign: "nt" for nucleotide sequences, "aa" for amino acid sequences, "v" for V gene segments, "j" for J gene segments. E.g., pass "aa+v" to compute overlaps on CDR3 amino acid sequences paired with V gene segments; in this case a unique clonotype is a pair of a CDR3 amino acid sequence and a V gene segment.

- .quant: A string that specifies the column to be processed. Pass "count" to see public clonotype sharing as numbers of clones, or "prop" to see proportions.

- .coding: Logical. If TRUE, preprocess the data to filter out non-coding sequences.

- .min.samples: Integer. The minimum number of samples in which a clonotype must occur to be included in the public repertoire table.

- .max.samples: Integer. The maximum number of samples in which a clonotype may occur to be included in the public repertoire table. Pass NA (the default) to use the total number of samples.

- .verbose: Logical. If TRUE, output the progress.
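As an illustration of how these arguments combine, the following sketch builds a public repertoire on CDR3 amino acid sequences only, reports proportions instead of counts, and keeps clonotypes found in at least two samples. It uses only the arguments listed in the signature above and the immdata example dataset shipped with immunarch; the object name pr_shared and the requireNamespace guard are choices made for this example.

```r
# Hedged usage sketch: runs only if immunarch is installed, otherwise it is skipped.
if (requireNamespace("immunarch", quietly = TRUE)) {
  library(immunarch)
  data(immdata) # example dataset bundled with immunarch

  # Keep the example small, as in the package example
  small <- lapply(immdata$data, head, 2000)

  # pr_shared is a name chosen for this example
  pr_shared <- pubRep(
    small,
    .col = "aa",       # clonotype = CDR3 amino acid sequence only
    .quant = "prop",   # per-sample proportions rather than clone counts
    .coding = TRUE,    # drop non-coding sequences first
    .min.samples = 2,  # require presence in at least two samples
    .verbose = FALSE
  )

  print(head(pr_shared))
}
```

Raising .min.samples narrows the table to more widely shared clonotypes, while .max.samples caps how widely shared a clonotype may be.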

## Value

Data table with columns for:

- Clonotypes (e.g., CDR3 sequence, or two columns for CDR3 sequence and V gene)

- Incidence of clonotypes

- Per-sample proportions or counts
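To make the shape of this table concrete, here is a minimal base R sketch that assembles a comparable table from three toy repertoires. It is not immunarch's implementation: the data are invented, the column name Samples for the incidence is a choice made for this example, and absent clonotypes are recorded as 0, which may differ from how pubRep marks them.

```r
# Toy repertoires (hypothetical data for illustration only)
samples <- list(
  A = data.frame(CDR3.aa = c("CASSL", "CASSP", "CASSQ"), Clones = c(10, 5, 1)),
  B = data.frame(CDR3.aa = c("CASSL", "CASSQ"), Clones = c(3, 7)),
  C = data.frame(CDR3.aa = c("CASSL", "CASSR"), Clones = c(2, 4))
)

# Stack the samples into one long table with a Sample label
long <- do.call(rbind, lapply(names(samples), function(nm) {
  data.frame(Sample = nm, samples[[nm]])
}))

# One row per clonotype, one column per sample (0 where the clonotype is absent)
counts <- xtabs(Clones ~ CDR3.aa + Sample, data = long)

# Clonotype column, incidence (number of samples containing it), per-sample counts
pub <- data.frame(
  CDR3.aa = rownames(counts),
  Samples = rowSums(counts > 0),
  as.data.frame.matrix(counts),
  row.names = NULL
)

# Analogue of .min.samples = 2: keep clonotypes seen in at least two samples
pub_shared <- pub[pub$Samples >= 2, ]
print(pub_shared)
```

Dividing each sample column by its column sum would give the proportion variant of the same table.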

## Examples

# Subset the data to make the example faster to run
immdata$data <- lapply(immdata$data, head, 2000)
pr <- pubRep(immdata$data, .verbose = FALSE)
vis(pr, "clonotypes", 1, 2)
#> geom_smooth() using formula 'y ~ x'