Public indices - pairwise repertoire overlap — airr

A family of functions to quantify public (shared) receptors between repertoires.

Available functions

Supported methods are the following.

airr_public_intersection - number of shared receptors between each pair of repertoires (intersection size). Handy for quick overlap heatmaps, QC of replicate similarity, or spotting donor-shared "public" clonotypes.

airr_public_jaccard - Jaccard similarity of receptor sets between repertoires (\(|A \cap B| / |A \cup B|\)). Best when comparing cohorts of different sizes, because it gives a scale-invariant overlap score.
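
To make the two definitions concrete, here is a minimal base R sketch that computes the intersection size and the Jaccard similarity for two toy repertoires. It does not call the package: the receptor strings and variable names are made up for illustration, and the functions above may differ in detail (for example, in how a receptor is defined by the schema).

```r
# Two toy repertoires, each a vector of receptor identifiers
# (hypothetical CDR3 amino-acid strings, not real data).
rep_a <- c("CASSLGTDTQYF", "CASSPGQGYEQYF", "CASSLAPGATNEKLFF", "CASSLGTDTQYF")
rep_b <- c("CASSLGTDTQYF", "CASSQDRGNTEAFF", "CASSPGQGYEQYF")

# Work with unique receptors so duplicates do not inflate the counts.
set_a <- unique(rep_a)
set_b <- unique(rep_b)

n_shared <- length(intersect(set_a, set_b))  # size of the intersection
n_union  <- length(union(set_a, set_b))      # size of the union
jaccard  <- n_shared / n_union               # Jaccard similarity

n_shared
jaccard
```

With these inputs, two of the four distinct receptors are shared, so the intersection size is 2 and the Jaccard similarity is 0.5.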

Usage

airr_public_intersection(
  idata,
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

airr_public_jaccard(
  idata,
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)

Arguments

idata: An ImmunData object.
autojoin: Logical. If TRUE, join repertoire metadata by the schema repertoire id. Change the default behaviour by calling options(immundata.autojoin = FALSE).
format: String. Either "long" (a long tibble with imd_repertoire_id, facet columns, and value; useful for visualizations) or "wide" (a wide/unmelted table of features in which each row corresponds to a specific repertoire or pair of repertoires; useful for machine learning).
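
The difference between the two layouts can be illustrated in base R, without the package. In this sketch the matrix values and the column names `repertoire_a` and `repertoire_b` are assumptions chosen for illustration; the actual columns returned for format = "long" may be named differently.

```r
# Toy "wide" result: a symmetric matrix of shared-receptor counts
# for three hypothetical repertoires (values are made up).
ids <- c("rep_1", "rep_2", "rep_3")
wide <- matrix(
  c(10, 4, 2,
     4, 8, 3,
     2, 3, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(ids, ids)
)

# "Long" layout: one row per ordered pair of repertoires.
# as.vector() walks the matrix column by column, so the row label
# varies fastest and the column label is repeated in blocks.
# Column names here are illustrative, not the package's actual schema.
long <- data.frame(
  repertoire_a = rep(rownames(wide), times = ncol(wide)),
  repertoire_b = rep(colnames(wide), each = nrow(wide)),
  value = as.vector(wide),
  stringsAsFactors = FALSE
)

long
```

The long table is convenient for plotting libraries that expect one observation per row, while the wide matrix maps directly onto a heatmap or a feature table.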

Value

`airr_public_intersection`

A symmetric numeric matrix where rows/columns are repertoire_id and each cell is the count of shared unique receptors. The diagonal contains per-repertoire richness (total unique receptors). Row/column names are repertoire IDs.

`airr_public_jaccard`

A symmetric numeric matrix where rows/columns are repertoire_id and each cell is the Jaccard similarity in [0, 1]. The diagonal is 1. Row/column names are repertoire IDs.
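
The properties described above (symmetry, richness on the diagonal of the intersection matrix, ones on the diagonal of the Jaccard matrix) can be checked on a toy example. This is a base R sketch under stated assumptions, not the package's implementation: the list `reps`, the receptor labels, and the names `toy_int` and `toy_jac` are all hypothetical.

```r
# Toy input: a named list of receptor vectors, one per repertoire
# (identifiers are hypothetical).
reps <- list(
  rep_1 = c("r1", "r2", "r3", "r4"),
  rep_2 = c("r3", "r4", "r5"),
  rep_3 = c("r6", "r1")
)
sets <- lapply(reps, unique)

# Pairwise intersection sizes; the diagonal is each repertoire's richness.
toy_int <- sapply(sets, function(a) {
  sapply(sets, function(b) length(intersect(a, b)))
})

# Pairwise Jaccard similarity: intersection size over union size.
toy_jac <- sapply(sets, function(a) {
  sapply(sets, function(b) length(intersect(a, b)) / length(union(a, b)))
})

isSymmetric(toy_int)  # both matrices are symmetric
diag(toy_int)         # unique receptors per repertoire
diag(toy_jac)         # a set compared with itself gives 1
```

Here rep_2 and rep_3 share no receptors, so their cell is 0 in both matrices, while rep_1 and rep_2 share two of five distinct receptors, giving a Jaccard similarity of 0.4.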

Examples

# Limit the number of threads used by the underlying DB for this session.
# Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers).
db_exec("SET threads TO 1")
# Load data
# \dontrun{
immdata <- get_test_idata() |> agg_repertoires("Therapy")
#> Rows: 2 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (4): File, Therapy, Response, Prefix
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> ℹ Found 2/2 repertoire files from the metadata on the disk
#> ✔ Metadata parsed successfully
#> 
#> ── Reading repertoire data 
#>   1. /home/runner/work/_temp/Library/immundata/extdata/tsv/sample_0_1k.tsv
#>   2. /home/runner/work/_temp/Library/immundata/extdata/tsv/sample_1k_2k.tsv
#> ℹ Checking if all files are of the same type
#> ✔ All files have the same extension
#> 
#> ── Renaming the columns and schemas 
#> ✔ Renaming is finished
#> 
#> ── Preprocessing the data 
#>   1. exclude_columns
#>   2. filter_nonproductive
#> ✔ Preprocessing plan is ready
#> 
#> ── Aggregating the data to receptors 
#> ℹ No locus information found
#> ℹ Processing data as immune repertoire tables - no counts, no barcodes, no chain pairing possible
#> ✔ Execution plan for receptor data aggregation and annotation is ready
#> 
#> ── Joining the metadata table with the dataset using 'filename' column 
#> ✔ Joining plan is ready
#> 
#> ── Postprocessing the data 
#>   1. prefix_barcodes
#> ✔ Postprocessing plan is ready
#> 
#> ── Saving the newly created ImmunData to disk 
#> ℹ Writing the receptor annotation data to [/tmp/RtmpPHpsgz/file1eef2022fada/annotations.parquet]
#> ℹ Writing the metadata to [/tmp/RtmpPHpsgz/file1eef2022fada/metadata.json]
#> ✔ ImmunData files saved to [/tmp/RtmpPHpsgz/file1eef2022fada]
#> ℹ Reading ImmunData files from [/tmp/RtmpPHpsgz/file1eef2022fada]
#> ✔ Loaded ImmunData with the receptor schema: [c("cdr3_aa", "v_call") and list()]
#> ℹ Reading ImmunData files from [/tmp/RtmpPHpsgz/file1eef2022fada]
#> 
#> ── Summary 
#> ℹ Time elapsed: 2.32 secs
#> ✔ Loaded ImmunData with the receptor schema: [c("cdr3_aa", "v_call") and NULL]
#> ✔ Loaded ImmunData with [1902] chains
# }

#
# airr_public_intersection
#
# \dontrun{
m_pub <- airr_public_intersection(immdata)
# }

#
# airr_public_jaccard
#
# \dontrun{
m_jac <- airr_public_jaccard(immdata)
# }
