A utility function to analyse immune receptor gene usage (IGHD, IGHJ, IGHV, IGIJ, IGKJ, IGKV, IGLJ, IGLV, TRAJ, TRAV, TRBD, etc.) and related statistics. For gene details, run gene_stats().

geneUsage(.data, .gene = c("hs.trbv", "HomoSapiens.TRBJ", "macmul.IGHV"),
.quant = c(NA, "count"), .ambig = c("exc", "inc", "wei", "maj"),
.type = c("segment", "allele", "family"), .norm = F)

Arguments

.data

The data to be processed. Can be a data.frame, a data.table, or a list of these objects.

Every object must have columns in the immunarch-compatible format; see immunarch_data_format.

Advanced users may provide other data representations: DBI database connections, Apache Spark DataFrames created with copy_to, or a list of these objects. They are supported with the same limitations as the basic objects.

Note: each connection must represent a separate repertoire.

.gene

A character vector of length one that names both the species and the gene to analyse. If you provide a longer vector, only the first element is used. Valid ".gene" values include "hs.trbv", "HomoSapiens.TRBJ" and "macmul.IGHV". For details, run gene_stats().

.quant

Select the column with data to evaluate. Pass NA if you want to compute gene statistics at the clonotype level without re-weighting. Pass "count" to use the "Clones" column to weight genes by abundance of their corresponding clonotypes.
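The difference between the two ".quant" settings can be illustrated in base R without calling geneUsage itself. This is a minimal sketch on a made-up data frame: the toy values are hypothetical, and only the "V.name" and "Clones" column names are assumed to follow the immunarch data format.

```r
# Toy repertoire in an immunarch-like layout (hypothetical data).
toy <- data.frame(
  V.name = c("TRBV7-2", "TRBV7-2", "TRBV5-1"),
  Clones = c(10, 5, 85),
  stringsAsFactors = FALSE
)

# Analogue of .quant = NA: every clonotype (row) counts once.
per_clonotype <- table(toy$V.name)

# Analogue of .quant = "count": each clonotype contributes its "Clones" value.
per_clone <- tapply(toy$Clones, toy$V.name, sum)

print(per_clonotype)
print(per_clone)
```

In this toy case TRBV7-2 dominates at the clonotype level (two rows out of three), while TRBV5-1 dominates once clonotypes are weighted by abundance.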

.ambig

How to handle ambiguous data, that is, clonotypes with no single clear gene match. Set it to "exc" to exclude such clonotypes, "inc" to include them for every supplied gene, "wei" to include them with weights, or "maj" to pick only the first gene from the set.
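The four strategies can be sketched in base R as follows. This is an illustration of the idea rather than the package's implementation: the ", " separator between candidate genes and the even split of weights under "wei" are assumptions of this sketch, and the gene names are hypothetical.

```r
# Hypothetical assignments; the second clonotype has two candidate genes.
v_names <- c("TRBV7-2", "TRBV7-2, TRBV7-3", "TRBV5-1")
candidates <- strsplit(v_names, ", ", fixed = TRUE)
n_cand <- lengths(candidates)

# "exc": drop clonotypes with more than one candidate gene.
exc <- table(unlist(candidates[n_cand == 1]))

# "inc": count the clonotype once for every candidate gene.
inc <- table(unlist(candidates))

# "wei": split one unit of weight evenly across the candidates (assumed scheme).
wei <- tapply(rep(1 / n_cand, n_cand), unlist(candidates), sum)

# "maj": keep only the first candidate gene.
maj <- table(vapply(candidates, `[`, character(1), 1))
```

Note that "inc" counts more gene occurrences than there are clonotypes, whereas "wei" keeps the total equal to the number of clonotypes.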

.type

Set the type of data to evaluate: "segment", "allele", or "family".
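The three levels relate to each other as progressively coarser versions of the same gene name. The base R sketch below assumes IMGT-style names of the form "family-member*allele"; the names are hypothetical and the regular expressions are an illustration, not the package's own parsing.

```r
# Hypothetical allele-level names in IMGT-style notation.
alleles <- c("TRBV7-2*01", "TRBV7-3*02", "TRBV5-1*01")

# Segment: drop the allele suffix after "*".
segments <- sub("\\*.*$", "", alleles)

# Family: additionally drop the member number after "-".
families <- sub("-.*$", "", segments)

print(segments)
print(families)
```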

.norm

If TRUE, return proportions of genes. If FALSE, return counts of genes.
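The relationship between the two outputs can be sketched in base R. The counts are hypothetical, and dividing by the total is assumed here as the normalisation.

```r
# Hypothetical gene counts, as returned with .norm = FALSE.
counts <- c("TRBV5-1" = 85, "TRBV7-2" = 15)

# Analogue of .norm = TRUE: divide each count by the total so values sum to 1.
proportions <- counts / sum(counts)

print(proportions)
```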