Diversity - estimating the heterogeneity of immune repertoires
Source: R/v1_diversity_airr.R
airr_diversity.Rd
A family of functions that quantify receptor diversity per repertoire. Each metric is a characteristic of the whole repertoire.
airr_diversity_dxx - coverage diversity: minimal number of top receptors needed to reach perc% of clonal space (by proportion). Great for spotting dominance/overexpansion and for quick, interpretable dashboards (e.g., D50 = number of top receptors needed to cover half of the repertoire).
airr_diversity_chao1 - the Chao1 estimator is a nonparametric asymptotic estimator of species richness (number of species in a population). It is one of the most widely used methods for estimating immune repertoire diversity.
airr_diversity_shannon - Shannon entropy (base 2) per repertoire, computed from proportion. Ideal when you want a single evenness-aware diversity score; pair with Pielou/Hill for samples with very different richness.
airr_diversity_pielou - Pielou's evenness H / log2(S) with richness S. Best when you need a size-normalized evenness score that's comparable across repertoires with different receptor counts.
airr_diversity_index - convenience alias for the Hill number with q = 1 (exp(Shannon) using natural log). A solid default single metric that's relatively robust to rare-count noise and easy to compare across samples.
airr_diversity_hill - Hill numbers ("true diversity") for orders q in {0, 1, 2, ...}: q = 0 is richness, q = 1 is exp(Shannon), and q > 1 emphasizes abundant receptors. Perfect when you want a diversity profile that tunes sensitivity to rare vs. abundant clonotypes.
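The relationships among the Shannon, Pielou, and Hill metrics described above can be sketched in base R on a toy count vector. This is a minimal illustration under stated assumptions, not the package's implementation: the counts are made up, and hill_number is a hypothetical helper that does not exist in the package.

```r
# Toy clone counts for one repertoire (illustrative values, not package data).
counts <- c(50, 30, 10, 5, 5)
p <- counts / sum(counts)            # clonal proportions

# Shannon entropy in bits (base 2), skipping zero proportions.
shannon_bits <- -sum(p[p > 0] * log2(p[p > 0]))

# Pielou's evenness: entropy divided by its maximum, log2(S).
S <- sum(p > 0)                      # richness
pielou <- if (S > 1) shannon_bits / log2(S) else NA_real_

# Hypothetical helper: Hill number of order q.
# q = 1 is handled as the limit, exp(Shannon) with natural log.
hill_number <- function(p, q) {
  p <- p[p > 0]
  if (isTRUE(all.equal(q, 1))) {
    exp(-sum(p * log(p)))
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

# Diversity profile for q = 0, 1, 2.
hill_profile <- sapply(0:2, function(q) hill_number(p, q))
```

Because exp(H) with natural log equals 2^H with base-2 log, 2^shannon_bits matches the q = 1 entry of the profile. Hill numbers are non-increasing in q, so the profile drops faster for repertoires dominated by a few clones.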
Usage
airr_diversity_dxx(
idata,
perc = 50,
autojoin = getOption("immundata.autojoin", TRUE),
format = c("long", "wide")
)
airr_diversity_chao1(
idata,
autojoin = getOption("immundata.autojoin", TRUE),
format = c("long", "wide")
)
airr_diversity_shannon(
idata,
autojoin = getOption("immundata.autojoin", TRUE),
format = c("long", "wide")
)
airr_diversity_pielou(
idata,
autojoin = getOption("immundata.autojoin", TRUE),
format = c("long", "wide")
)
airr_diversity_index(
idata,
autojoin = getOption("immundata.autojoin", TRUE),
format = c("long", "wide")
)
airr_diversity_hill(
idata,
q = 0:5,
autojoin = getOption("immundata.autojoin", TRUE),
format = c("long", "wide")
)
Arguments
- idata
An ImmunData object.
- perc
A number or numeric vector in (0, 100] (default 50), e.g. 50 for D50, 20 for D20.
- autojoin
Logical. If TRUE, join repertoire metadata by the schema repertoire id. Change the default behaviour by calling options(immundata.autojoin = FALSE).
- format
String. One of "long" ("long" tibble with imd_repertoire_id, facet columns, and value; useful for visualizations) or "wide" (wide/unmelted table of features, with each row corresponding to a specific repertoire / pair of repertoires; useful for Machine Learning).
- q
A scalar or vector of non-negative orders. Defaults to 0:5.
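The difference between the "long" and "wide" layouts can be sketched with base R alone. This is only an illustration of the two shapes: the q column, the numeric values, and the resulting wide column names are assumptions made for the example, and the package's actual output columns may differ.

```r
# Hypothetical long-format result: one row per repertoire and order q.
# All values below are made up for illustration.
long <- data.frame(
  imd_repertoire_id = rep(c(1L, 2L), each = 3),
  q = rep(0:2, times = 2),
  value = c(5.0, 3.4, 2.8, 4.0, 3.9, 3.8)
)

# Base-R pivot: one row per repertoire, one column per order q.
wide <- reshape(
  long,
  idvar = "imd_repertoire_id",
  timevar = "q",
  direction = "wide"
)

names(wide)
```

The long layout maps directly onto plotting aesthetics (one observation per row), while the wide layout gives one feature vector per repertoire, which is the shape most modelling functions expect.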
Value
airr_diversity_dxx
A tibble with:
- imd_repertoire_id
- perc
- dxx - minimal count of top receptors to reach perc%
plus repertoire metadata from idata$repertoires
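What the dxx column measures can be sketched in base R. This is a minimal sketch, not the package's code: dxx_sketch is a hypothetical helper, the counts are made up, and treating the threshold as inclusive (cumulative proportion >= perc%) is an assumption based on the wording "to reach perc%".

```r
# Hypothetical helper: smallest number of top clones whose cumulative
# proportion reaches perc% of the repertoire.
dxx_sketch <- function(counts, perc = 50) {
  p <- sort(counts / sum(counts), decreasing = TRUE)
  cum <- cumsum(p)
  # Small tolerance guards against floating-point error at exact thresholds.
  sapply(perc, function(x) which(cum >= x / 100 - 1e-12)[1])
}

# Toy clone counts (illustrative values, not package data).
counts <- c(50, 30, 10, 5, 5)
dxx_sketch(counts, perc = c(20, 50, 80))
# -> 1 1 2
```

Here the largest clone alone covers 50% of the repertoire, so D20 and D50 are both 1, and the top two clones together reach 80%.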
airr_diversity_chao1
A tibble with:
- imd_repertoire_id
- Estimator - number of species
- SD - standard deviation for the estimator value
- Conf.95.lo - lower bound of the 95% confidence interval (0.025)
- Conf.95.hi - upper bound of the 95% confidence interval (0.975)
plus repertoire metadata from idata$repertoires
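The Estimator column can be illustrated with the classic Chao1 point estimate, which adds a correction based on singletons (f1) and doubletons (f2) to the observed richness. This is a sketch of the textbook formula only: chao1_sketch is a hypothetical helper, the counts are made up, and the exact variant the package uses, along with its SD and confidence-interval calculations, is not shown here.

```r
# Hypothetical helper: classic Chao1 point estimate from integer clone counts.
chao1_sketch <- function(counts) {
  counts <- counts[counts > 0]
  s_obs <- length(counts)          # observed richness
  f1 <- sum(counts == 1)           # singletons
  f2 <- sum(counts == 2)           # doubletons
  if (f2 > 0) {
    s_obs + f1^2 / (2 * f2)
  } else {
    # Bias-corrected form, used when there are no doubletons.
    s_obs + f1 * (f1 - 1) / 2
  }
}

# Toy clone counts (illustrative values, not package data).
counts <- c(10, 4, 2, 2, 1, 1, 1)
chao1_sketch(counts)
# -> 9.25
```

With 7 observed clones, 3 singletons, and 2 doubletons, the estimate is 7 + 9/4 = 9.25, meaning roughly two clones are estimated to be unseen.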
airr_diversity_pielou
A tibble with:
- imd_repertoire_id
- shannon
- n_receptors
- pielou - evenness in [0, 1] (NA if S <= 1)
Examples
# Limit the number of threads used by the underlying DB for this session.
# Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers).
db_exec("SET threads TO 1")
# Load data
# \dontrun{
immdata <- get_test_idata() |> agg_repertoires("Therapy")
#> Rows: 2 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (4): File, Therapy, Response, Prefix
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> ℹ Found 2/2 repertoire files from the metadata on the disk
#> ✔ Metadata parsed successfully
#>
#> ── Reading repertoire data
#> 1. /home/runner/work/_temp/Library/immundata/extdata/tsv/sample_0_1k.tsv
#> 2. /home/runner/work/_temp/Library/immundata/extdata/tsv/sample_1k_2k.tsv
#> ℹ Checking if all files are of the same type
#> ✔ All files have the same extension
#>
#> ── Renaming the columns and schemas
#> ✔ Renaming is finished
#>
#> ── Preprocessing the data
#> 1. exclude_columns
#> 2. filter_nonproductive
#> ✔ Preprocessing plan is ready
#>
#> ── Aggregating the data to receptors
#> ℹ No locus information found
#> ℹ Processing data as immune repertoire tables - no counts, no barcodes, no chain pairing possible
#> ✔ Execution plan for receptor data aggregation and annotation is ready
#>
#> ── Joining the metadata table with the dataset using 'filename' column
#> ✔ Joining plan is ready
#>
#> ── Postprocessing the data
#> 1. prefix_barcodes
#> ✔ Postprocessing plan is ready
#>
#> ── Saving the newly created ImmunData to disk
#> ℹ Writing the receptor annotation data to [/tmp/RtmpPHpsgz/file1eef9ddd01a/annotations.parquet]
#> ℹ Writing the metadata to [/tmp/RtmpPHpsgz/file1eef9ddd01a/metadata.json]
#> ✔ ImmunData files saved to [/tmp/RtmpPHpsgz/file1eef9ddd01a]
#> ℹ Reading ImmunData files from [/tmp/RtmpPHpsgz/file1eef9ddd01a]
#> ✔ Loaded ImmunData with the receptor schema: [c("cdr3_aa", "v_call") and list()]
#> ℹ Reading ImmunData files from [/tmp/RtmpPHpsgz/file1eef9ddd01a]
#>
#> ── Summary
#> ℹ Time elapsed: 2.39 secs
#> ✔ Loaded ImmunData with the receptor schema: [c("cdr3_aa", "v_call") and NULL]
#> ✔ Loaded ImmunData with [1902] chains
# }
#
# airr_diversity_dxx
#
# \dontrun{
d50 <- airr_diversity_dxx(immdata, perc = 50)
d_multi <- airr_diversity_dxx(immdata, perc = c(20, 50, 80))
# }
#
# airr_diversity_chao1
#
# \dontrun{
chao <- airr_diversity_chao1(immdata)
# }
#
# airr_diversity_shannon
#
# \dontrun{
sh <- airr_diversity_shannon(immdata)
# }
#
# airr_diversity_pielou
#
# \dontrun{
pj <- airr_diversity_pielou(immdata)
# }
#
# airr_diversity_index
#
# \dontrun{
idx <- airr_diversity_index(immdata)
# }
#
# airr_diversity_hill
#
# \dontrun{
hill <- airr_diversity_hill(immdata, q = c(0, 1, 2))
# }