
[Experimental]

A family of functions that quantify receptor diversity. Each metric is computed per repertoire and characterizes the repertoire as a whole.

Available functions

Supported methods are the following.

airr_diversity_dxx - coverage diversity: minimal number of top receptors needed to reach perc% of clonal space (by proportion). Great for spotting dominance/overexpansion and for quick, interpretable dashboards (e.g., D50 = receptors to cover half of the repertoire).

airr_diversity_chao1 - the Chao1 estimator, a nonparametric asymptotic estimator of species richness (the number of species in a population). One of the most widely used methods for estimating immune repertoire diversity.

airr_diversity_shannon - Shannon entropy (base 2) per repertoire, computed from receptor proportions. Ideal when you want a single evenness-aware diversity score; pair with Pielou/Hill for samples with very different richness.

airr_diversity_pielou - Pielou's evenness H / log2(S) with richness S. Best when you need a size-normalized evenness score that's comparable across repertoires with different receptor counts.

airr_diversity_index - convenience alias for Hill number with q = 1 (exp(Shannon) using natural log). A solid default single metric that's relatively robust to rare-count noise and easy to compare across samples.

airr_diversity_hill - Hill numbers ("true diversity") for orders q in {0, 1, 2, ...}: q=0 richness, q=1 exp(Shannon), q>1 emphasizes abundant receptors. Perfect when you want a diversity profile that tunes sensitivity to rare vs. abundant clonotypes.

Usage

airr_diversity_dxx(
  idata,
  perc = 50,
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

airr_diversity_chao1(
  idata,
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

airr_diversity_shannon(
  idata,
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

airr_diversity_pielou(
  idata,
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

airr_diversity_index(
  idata,
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

airr_diversity_hill(
  idata,
  q = 0:5,
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

Arguments

idata

An ImmunData object.

perc

A number or numeric vector in (0, 100] (default 50), e.g. 50 for D50, 20 for D20.

autojoin

Logical. If TRUE, join repertoire metadata by the schema repertoire id. Change the default behaviour by calling options(immundata.autojoin = FALSE).

format

String. One of "long" (a long-format tibble with imd_repertoire_id, facet columns, and value; useful for visualizations) or "wide" (a wide/unmelted table of features, with each row corresponding to a specific repertoire / pair of repertoires; useful for machine learning).

q

A scalar or vector of non-negative orders. Defaults to 0:5.
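
To illustrate the difference between the two format values, here is a minimal base R sketch that pivots a long table into a wide one. The numbers are made up for illustration, and the column names produced by reshape() are not necessarily the names the package uses in its own "wide" output.

```r
# Toy long-format table: one row per (repertoire, q) pair. Values are invented.
long <- data.frame(
  imd_repertoire_id = c(1L, 1L, 1L, 2L, 2L, 2L),
  q = c(0, 1, 2, 0, 1, 2),
  hill_number = c(4.0, 3.4, 2.9, 6.0, 5.2, 4.4)
)

# Wide format: one row per repertoire, one column per value of q.
wide <- reshape(
  long,
  idvar = "imd_repertoire_id",
  timevar = "q",
  direction = "wide"
)

print(wide)
```

The long table is convenient for plotting (one value per row), while the wide table has one feature vector per repertoire.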

Value

airr_diversity_dxx

A tibble with:

  • imd_repertoire_id

  • perc

  • dxx - minimal count of top receptors to reach perc%

  • plus repertoire metadata from idata$repertoires
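
As an illustration of what the dxx column means, the following base R sketch computes the coverage diversity from a plain vector of receptor counts. dxx_sketch is a hypothetical helper written for this example, not a function of the package, and the package may handle ties or rounding differently.

```r
# Hypothetical helper: minimal number of top receptors whose cumulative
# abundance reaches perc% of the repertoire.
dxx_sketch <- function(counts, perc = 50) {
  stopifnot(all(counts > 0), all(perc > 0), all(perc <= 100))
  cum <- cumsum(sort(counts, decreasing = TRUE))
  total <- sum(counts)
  # For each requested percentage, take the first rank that reaches the target.
  # The small tolerance guards against floating-point error at exact boundaries.
  vapply(
    perc,
    function(p) which(cum >= p / 100 * total - 1e-9)[1],
    integer(1)
  )
}

counts <- c(40, 30, 15, 10, 5)   # toy repertoire with 5 receptors
d50 <- dxx_sketch(counts, 50)
d_multi <- dxx_sketch(counts, c(20, 50, 80))
```

A small value relative to the total number of receptors indicates that a few expanded receptors dominate the repertoire.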

airr_diversity_chao1

A tibble with:

  • imd_repertoire_id

  • Estimator - number of species

  • SD - standard deviation for the estimator value

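  • Conf.95.lo - lower bound of the 95% confidence interval (0.025 quantile)

  • Conf.95.hi - upper bound of the 95% confidence interval (0.975 quantile)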

  • plus repertoire metadata from idata$repertoires
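
For intuition about the Estimator column, here is a minimal sketch of the classic Chao1 point estimate, S_obs + f1^2 / (2 * f2), where f1 and f2 are the numbers of receptors seen exactly once and twice. chao1_sketch is a hypothetical helper; the fallback used when f2 = 0 is the common bias-corrected form, and the exact variant implemented by the package, as well as its SD and confidence interval calculations, are not reproduced here.

```r
# Hypothetical helper: classic Chao1 richness point estimate from counts.
chao1_sketch <- function(counts) {
  s_obs <- sum(counts > 0)   # observed richness
  f1 <- sum(counts == 1)     # singletons
  f2 <- sum(counts == 2)     # doubletons
  if (f2 > 0) {
    s_obs + f1^2 / (2 * f2)
  } else {
    # Bias-corrected form, used here to avoid division by zero.
    s_obs + f1 * (f1 - 1) / 2
  }
}

counts <- c(10, 5, 3, 2, 2, 1, 1, 1, 1)  # 9 observed receptors, f1 = 4, f2 = 2
chao_est <- chao1_sketch(counts)         # 9 + 16 / 4
```

The estimate exceeds the observed richness whenever singletons are present, reflecting receptors that were likely missed by sampling.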

airr_diversity_shannon

A tibble with:

  • imd_repertoire_id

  • shannon - entropy in bits
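
The shannon value can be reproduced by hand from receptor counts. The sketch below assumes a plain count vector per repertoire; shannon_bits is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: Shannon entropy in bits (log base 2) from counts.
shannon_bits <- function(counts) {
  p <- counts[counts > 0] / sum(counts)  # receptor proportions
  -sum(p * log2(p))
}

h_even <- shannon_bits(c(1, 1, 1, 1))    # four equally abundant receptors
h_skewed <- shannon_bits(c(4, 2, 1, 1))  # one dominant receptor
```

Four equally abundant receptors give the maximum of log2(4) = 2 bits; the skewed repertoire scores lower (1.75 bits) despite having the same richness.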

airr_diversity_pielou

A tibble with:

  • imd_repertoire_id

  • shannon

  • n_receptors

  • pielou - evenness in [0, 1] (NA if S <= 1)
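
The relation between the three columns can be sketched in base R as follows. pielou_sketch is a hypothetical helper that follows the definition H / log2(S) given above and returns NA when S <= 1.

```r
# Hypothetical helper: Pielou's evenness from counts.
pielou_sketch <- function(counts) {
  counts <- counts[counts > 0]
  s <- length(counts)                 # richness S (n_receptors)
  if (s <= 1) return(NA_real_)        # evenness is undefined for S <= 1
  p <- counts / sum(counts)
  h <- -sum(p * log2(p))              # Shannon entropy in bits
  h / log2(s)
}

j_even <- pielou_sketch(c(1, 1, 1, 1))
j_skewed <- pielou_sketch(c(4, 2, 1, 1))
j_single <- pielou_sketch(c(7))
```

A perfectly even repertoire scores 1, and the score drops toward 0 as a few receptors take over.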

airr_diversity_index

A tibble with:

  • imd_repertoire_id

  • q - Hill order (always 1)

  • hill_number

  • plus repertoire metadata from idata$repertoires

airr_diversity_hill

A tibble with:

  • imd_repertoire_id

  • q - Hill order

  • hill_number - true diversity of order q

  • plus repertoire metadata from idata$repertoires
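
To make the role of q concrete, here is a minimal sketch of the standard Hill number formula, (sum(p^q))^(1 / (1 - q)), with the q = 1 case defined as the limit exp(Shannon) using the natural log. hill_sketch is a hypothetical helper, not the package implementation.

```r
# Hypothetical helper: Hill numbers of one or more orders q from counts.
hill_sketch <- function(counts, q = 0:5) {
  p <- counts[counts > 0] / sum(counts)
  vapply(
    q,
    function(qi) {
      if (abs(qi - 1) < 1e-8) {
        exp(-sum(p * log(p)))            # limit as q -> 1: exp(Shannon)
      } else {
        sum(p^qi)^(1 / (1 - qi))
      }
    },
    numeric(1)
  )
}

hill <- hill_sketch(c(4, 2, 1, 1), q = c(0, 1, 2))
```

For this toy repertoire, q = 0 returns the richness (4), q = 1 returns 2^1.75 (about 3.36), and q = 2 returns 32/11 (about 2.91), so the value decreases as q puts more weight on abundant receptors.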

Examples

# Limit the number of threads used by the underlying DB for this session.
# Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers).
db_exec("SET threads TO 1")
# Load data
# \dontrun{
immdata <- get_test_idata() |> agg_repertoires("Therapy")
#> Rows: 2 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (4): File, Therapy, Response, Prefix
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#>  Found 2/2 repertoire files from the metadata on the disk
#>  Metadata parsed successfully
#> 
#> ── Reading repertoire data 
#>   1. /home/runner/work/_temp/Library/immundata/extdata/tsv/sample_0_1k.tsv
#>   2. /home/runner/work/_temp/Library/immundata/extdata/tsv/sample_1k_2k.tsv
#>  Checking if all files are of the same type
#>  All files have the same extension
#> 
#> ── Renaming the columns and schemas 
#>  Renaming is finished
#> 
#> ── Preprocessing the data 
#>   1. exclude_columns
#>   2. filter_nonproductive
#>  Preprocessing plan is ready
#> 
#> ── Aggregating the data to receptors 
#>  No locus information found
#>  Processing data as immune repertoire tables - no counts, no barcodes, no chain pairing possible
#>  Execution plan for receptor data aggregation and annotation is ready
#> 
#> ── Joining the metadata table with the dataset using 'filename' column 
#>  Joining plan is ready
#> 
#> ── Postprocessing the data 
#>   1. prefix_barcodes
#>  Postprocessing plan is ready
#> 
#> ── Saving the newly created ImmunData to disk 
#>  Writing the receptor annotation data to [/tmp/RtmpPHpsgz/file1eef9ddd01a/annotations.parquet]
#>  Writing the metadata to [/tmp/RtmpPHpsgz/file1eef9ddd01a/metadata.json]
#>  ImmunData files saved to [/tmp/RtmpPHpsgz/file1eef9ddd01a]
#>  Reading ImmunData files from [/tmp/RtmpPHpsgz/file1eef9ddd01a]
#>  Loaded ImmunData with the receptor schema: [c("cdr3_aa", "v_call") and list()]
#>  Reading ImmunData files from [/tmp/RtmpPHpsgz/file1eef9ddd01a]
#> 
#> ── Summary 
#>  Time elapsed: 2.39 secs
#>  Loaded ImmunData with the receptor schema: [c("cdr3_aa", "v_call") and NULL]
#>  Loaded ImmunData with [1902] chains
# }

#
# airr_diversity_dxx
#
# \dontrun{
d50 <- airr_diversity_dxx(immdata, perc = 50)
d_multi <- airr_diversity_dxx(immdata, perc = c(20, 50, 80))
# }

#
# airr_diversity_chao1
#
# \dontrun{
chao <- airr_diversity_chao1(immdata)
# }

#
# airr_diversity_shannon
#
# \dontrun{
sh <- airr_diversity_shannon(immdata)
# }

#
# airr_diversity_pielou
#
# \dontrun{
pj <- airr_diversity_pielou(immdata)
# }

#
# airr_diversity_index
#
# \dontrun{
idx <- airr_diversity_index(immdata)
# }

#
# airr_diversity_hill
#
# \dontrun{
hill <- airr_diversity_hill(immdata, q = c(0, 1, 2))
# }