Clonality - receptor overabundance statistics for immune repertoires
Source: R/v1_clonality_airr.R
airr_clonality.Rd
A family of functions that quantify receptor overabundance per repertoire. They help to decipher the structure of a repertoire and to partition it into groups of receptors by abundance.
airr_clonality_line
- build ranked abundance lines: for each repertoire, take the top limit receptors by count and attach repertoire metadata. Useful for per-repertoire rank-abundance plots.
airr_clonality_rank
- aggregate clonal space by rank bins. Receptors are ordered by proportion within each repertoire; each receptor is assigned to the smallest threshold in bins that contains its rank.
airr_clonality_prop
- aggregate clonal space by proportion bins. Each receptor is assigned to a named bin according to its proportion (e.g., Hyperexpanded >= 1e-2, Large >= 1e-3, ...). Thresholds are matched in descending order; unmatched receptors fall into "Ultra-rare".
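The rank-binning rule described for airr_clonality_rank can be illustrated in base R on a toy vector of counts. This is only a sketch of the stated rule, not the package's implementation: the counts, the thresholds, and all variable names are made up, and leaving ranks beyond the largest threshold as NA is an assumption.

```r
# Toy counts for one repertoire (made-up data for illustration only).
counts <- c(5000, 3000, 1200, 500, 200, 50, 25, 15, 9, 1)
props <- counts / sum(counts)

# Hypothetical rank thresholds, analogous to the bins argument.
rank_bins <- c(2, 5, 10)

# Rank 1 is the most abundant receptor.
ranks <- rank(-props, ties.method = "first")

# Assign each rank to the smallest threshold that is >= the rank.
# Assumption: ranks beyond the largest threshold are left as NA here.
rank_bin <- vapply(ranks, function(r) {
  hit <- rank_bins[rank_bins >= r]
  if (length(hit) == 0) NA_real_ else min(hit)
}, numeric(1))

# Clonal space per bin: summed proportion of the receptors in each bin.
rank_space <- tapply(props, rank_bin, sum)
```

Here rank_space holds one value per threshold, i.e. the share of the repertoire occupied by the receptors whose rank falls under that threshold but above the previous one.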
Usage
airr_clonality_line(
  idata,
  limit = 1e+05,
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)
airr_clonality_rank(
  idata,
  bins = c(10, 30, 100, 300, 1000, 10000, 1e+05),
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)
airr_clonality_prop(
  idata,
  bins = c(Hyperexpanded = 0.01, Large = 0.001, Medium = 1e-04, Small = 1e-05,
    Rare = 1e-06),
  autojoin = getOption("immundata.autojoin", TRUE),
  format = c("long", "wide")
)
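The autojoin default in the signatures above is read with getOption() when the function is called, so it can be changed for a whole session through options(). A minimal base R sketch of that mechanism follows; it uses only the option name shown in Usage, and the variable names are illustrative.

```r
# With the option unset, getOption() falls back to the supplied default, TRUE.
autojoin_default <- getOption("immundata.autojoin", TRUE)

# Switch the session-wide default off; options() returns the previous value.
old <- options(immundata.autojoin = FALSE)
autojoin_now <- getOption("immundata.autojoin", TRUE)

# Restore the previous setting when done.
options(old)
```

Passing autojoin explicitly in a call overrides the option for that call only.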
Arguments
- idata
An ImmunData object.
- limit
Positive integer >= 10: maximum number of top receptors to keep per repertoire (default 100000).
- autojoin
Logical. If TRUE, join repertoire metadata by the schema repertoire id. Change the default behaviour by calling options(immundata.autojoin = FALSE).
- format
String. One of "long" ("long" tibble with imd_repertoire_id, facet columns, and value; useful for visualizations) or "wide" (wide/unmelted table of features, with each row corresponding to a specific repertoire / pair of repertoires; useful for Machine Learning).
- bins
Thresholds for the bins. For airr_clonality_prop, a named numeric vector (e.g., c(Hyperexpanded = 1e-2, Large = 1e-3, ...)); names become bin labels and must be non-empty, and the vector is internally sorted in descending order. For airr_clonality_rank, a numeric vector of rank thresholds, as in the Usage default.
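The proportion-binning rule for a named bins vector can be sketched in base R as follows. This illustrates the documented rule (thresholds matched in descending order, unmatched receptors labelled "Ultra-rare") on made-up counts; it is not the package's implementation, and the variable names are hypothetical.

```r
# Toy counts for one repertoire (made-up data for illustration only).
counts <- c(5000, 3000, 1200, 500, 200, 50, 25, 15, 9, 1)
props <- counts / sum(counts)

# Named thresholds; names become bin labels. Sort in descending order first.
prop_bins <- sort(c(Large = 1e-3, Hyperexpanded = 1e-2), decreasing = TRUE)

# Take the first (largest) threshold the proportion reaches;
# receptors below every threshold fall into "Ultra-rare".
prop_bin <- vapply(props, function(p) {
  hit <- names(prop_bins)[p >= prop_bins]
  if (length(hit) == 0) "Ultra-rare" else hit[1]
}, character(1))

# Clonal space per bin: summed proportion of the receptors in each bin.
prop_space <- tapply(props, prop_bin, sum)
```

Because the thresholds are sorted before matching, the order in which the names are supplied does not affect the assignment.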
Value
airr_clonality_line
A tibble with columns:
repertoire_id - repertoire identifier
index - rank within repertoire (1 = most abundant)
count - receptor count used for ranking
plus any repertoire metadata columns carried from idata$repertoires
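To make the shape of this table concrete, the sketch below builds a comparable rank-abundance table in base R from a toy data frame. It is an illustration under stated assumptions, not the package's code: the data are made up, a plain data.frame stands in for the tibble, no metadata columns are joined, and limit is set to 2 only to keep the toy output small (the real argument must be >= 10).

```r
# Toy receptor counts for two repertoires (made-up data).
toy <- data.frame(
  repertoire_id = c("A", "A", "A", "B", "B", "B", "B"),
  count = c(3, 10, 1, 7, 2, 9, 4)
)
limit <- 2  # toy value; the documented argument requires limit >= 10

# Per repertoire: sort by count (descending), keep the top rows, number them.
parts <- lapply(split(toy, toy$repertoire_id), function(d) {
  d <- head(d[order(-d$count), , drop = FALSE], limit)
  data.frame(
    repertoire_id = d$repertoire_id,
    index = seq_len(nrow(d)),
    count = d$count
  )
})
line_tbl <- do.call(rbind, parts)
rownames(line_tbl) <- NULL
```

Plotting count against index, one line per repertoire_id, gives the per-repertoire rank-abundance plot mentioned in the description.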
See also
Per-repertoire summaries: annotate_clonality
Data container: immundata::ImmunData
Examples
# Limit the number of threads used by the underlying DB for this session.
# Change this only if you know what you're doing (e.g., multi-user machines, shared CI/servers).
db_exec("SET threads TO 1")
# Load data
# \dontrun{
immdata <- get_test_idata() |> agg_repertoires("Therapy")
#> Rows: 2 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (4): File, Therapy, Response, Prefix
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> ℹ Found 2/2 repertoire files from the metadata on the disk
#> ✔ Metadata parsed successfully
#>
#> ── Reading repertoire data
#> 1. /home/runner/work/_temp/Library/immundata/extdata/tsv/sample_0_1k.tsv
#> 2. /home/runner/work/_temp/Library/immundata/extdata/tsv/sample_1k_2k.tsv
#> ℹ Checking if all files are of the same type
#> ✔ All files have the same extension
#>
#> ── Renaming the columns and schemas
#> ✔ Renaming is finished
#>
#> ── Preprocessing the data
#> 1. exclude_columns
#> 2. filter_nonproductive
#> ✔ Preprocessing plan is ready
#>
#> ── Aggregating the data to receptors
#> ℹ No locus information found
#> ℹ Processing data as immune repertoire tables - no counts, no barcodes, no chain pairing possible
#> ✔ Execution plan for receptor data aggregation and annotation is ready
#>
#> ── Joining the metadata table with the dataset using 'filename' column
#> ✔ Joining plan is ready
#>
#> ── Postprocessing the data
#> 1. prefix_barcodes
#> ✔ Postprocessing plan is ready
#>
#> ── Saving the newly created ImmunData to disk
#> ℹ Writing the receptor annotation data to [/tmp/RtmpPHpsgz/file1eef771b43cb/annotations.parquet]
#> ℹ Writing the metadata to [/tmp/RtmpPHpsgz/file1eef771b43cb/metadata.json]
#> ✔ ImmunData files saved to [/tmp/RtmpPHpsgz/file1eef771b43cb]
#> ℹ Reading ImmunData files from [/tmp/RtmpPHpsgz/file1eef771b43cb]
#> ✔ Loaded ImmunData with the receptor schema: [c("cdr3_aa", "v_call") and list()]
#> ℹ Reading ImmunData files from [/tmp/RtmpPHpsgz/file1eef771b43cb]
#>
#> ── Summary
#> ℹ Time elapsed: 2.35 secs
#> ✔ Loaded ImmunData with the receptor schema: [c("cdr3_aa", "v_call") and NULL]
#> ✔ Loaded ImmunData with [1902] chains
# }
#
# airr_clonality_line
#
# \dontrun{
top_line <- airr_clonality_line(immdata, limit = 1000)
# }
#
# airr_clonality_rank
#
# \dontrun{
rank_stat <- airr_clonality_rank(immdata, bins = c(10, 100))
# }
#
# airr_clonality_prop
#
# \dontrun{
prop_stat <- airr_clonality_prop(immdata)
# }