A utility function to analyse immune receptor gene usage
(IGHD, IGHJ, IGHV, IGIJ, IGKJ, IGKV, IGLJ, IGLV, TRAJ, TRAV, TRBD, etc.)
and gene statistics. For gene details, run gene_stats().
The data to be processed. Can be data.frame, data.table, or a list of these objects.
Every object must have columns in the immunarch-compatible format; see immunarch_data_format.
Advanced users may provide other data representations: DBI database connections, Apache Spark DataFrames from copy_to, or a list of these objects. They are supported with the same limitations as basic objects.
Note: each connection must represent a separate repertoire.
A character vector of length one naming the gene to analyse and the species it belongs to.
If you provide a longer vector, only the first element is used.
The string must contain both the species and the gene; for example, valid ".gene" arguments
are "hs.trbv", "HomoSapiens.TRBJ" and "macmul.IGHV". For details, run gene_stats().
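To make the "species.gene" convention concrete, here is a minimal base R sketch of how such a string could be split. It only illustrates the format described above and is not immunarch's own parsing code; the variable names are hypothetical.

```r
# Hypothetical input: a vector with more than one element.
gene_arg <- c("HomoSapiens.TRBJ", "macmul.IGHV")

# Only the first element is considered, as described above.
first <- gene_arg[1]

# Split on the literal dot into a species part and a gene part.
parts <- strsplit(first, ".", fixed = TRUE)[[1]]
species <- parts[1]
gene <- toupper(parts[2])
```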
Selects the column with data to evaluate. Pass NA if you want to compute gene statistics at the clonotype level without re-weighting. Pass "count" to use the "Clones" column to weight genes by abundance of their corresponding clonotypes.
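The difference between the two modes can be sketched in base R on a toy table. This is an illustration of the idea only, not the package's implementation; the column names "Clones" and "V.name" are assumed to follow the immunarch data format, and the values are made up.

```r
# Toy repertoire with three clonotypes (made-up values).
toy <- data.frame(
  Clones = c(10, 5, 1),
  V.name = c("TRBV1", "TRBV2", "TRBV1"),
  stringsAsFactors = FALSE
)

# Clonotype level (analogous to passing NA): every row counts once.
by_clonotype <- table(toy$V.name)

# Abundance weighted (analogous to passing "count"): sum Clones per gene.
by_count <- tapply(toy$Clones, toy$V.name, sum)
```

Here TRBV1 appears in two clonotypes but accounts for 11 clones, so the two modes can rank genes differently.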
An option to handle ambiguous gene assignments, e.g., "TRAV1,TRAV2".
- Pass "inc" to include ambiguous assignments, so "TRAV1,TRAV2" is counted as a separate gene segment.
- Pass "exc" to exclude all ambiguous gene assignments, so "TRAV1,TRAV2" is excluded from the resultant gene table.
- Pass "maj" to pick only the first gene from the set.
We recommend passing "inc", which is the default.
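The three strategies for ambiguous assignments can be sketched in base R as follows. This is one reading of the descriptions above, applied to made-up gene strings, and not the package's internal code.

```r
# Made-up gene assignments; one of them is ambiguous.
genes <- c("TRAV1", "TRAV1,TRAV2", "TRAV2", "TRAV1")

# An assignment is treated as ambiguous if it lists several genes.
is_ambiguous <- grepl(",", genes, fixed = TRUE)

# "inc": the ambiguous string is kept as its own category.
inc <- table(genes)

# "exc": ambiguous assignments are dropped.
exc <- table(genes[!is_ambiguous])

# "maj": only the first gene of the set is kept.
maj <- table(sub(",.*$", "", genes))
```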
Set the type of data to evaluate: "segment", "allele", or "family".
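As an illustration of the three levels, the sketch below collapses IMGT-style allele names to segments and families in base R. It assumes names of the form "family-member*allele"; the example names are made up, and the package may derive these levels differently.

```r
# Made-up allele-level names in IMGT style.
allele <- c("TRBV7-2*01", "TRBV7-3*01", "TRBV5-1*02")

# Segment level: drop the allele suffix after "*".
segment <- sub("\\*.*$", "", allele)

# Family level: drop everything after the first "-".
family <- sub("-.*$", "", segment)
```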
If TRUE, return gene proportions; if FALSE, return gene counts.
A data frame with rows corresponding to gene segments and columns corresponding to the input samples.
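The shape of such a table, and the difference between counts and proportions, can be sketched in base R with two made-up samples. The column name "Gene" and the sample names are hypothetical, and per-sample division by the column total is an assumption about how proportions are computed.

```r
# Two made-up samples, each a vector of gene segment names.
samples <- list(
  A = c("TRBV1", "TRBV2", "TRBV1"),
  B = c("TRBV2", "TRBV3")
)

all_genes <- sort(unique(unlist(samples)))

# Count each gene per sample; genes absent from a sample get 0.
counts <- sapply(samples, function(g) {
  as.vector(table(factor(g, levels = all_genes)))
})

# Rows are gene segments, columns are samples.
usage <- data.frame(Gene = all_genes, counts, stringsAsFactors = FALSE)

# Proportions: divide each sample column by its own total.
usage_norm <- usage
usage_norm[names(samples)] <- lapply(usage[names(samples)], function(x) x / sum(x))
```

With proportions, each sample column sums to 1, which makes samples of different sizes comparable.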