Aligns all sequences, including the germline, within each clonal lineage within each cluster

This function aligns all sequences (including the germline) that belong to the same clonal lineage and the same cluster. After clustering and building the clonal lineages and germlines, the next step is to analyze the degree of mutation and maturity of each clonal lineage. This makes it possible to find highly mature cells and cells with a large number of offspring. Phylogenetic analysis can then identify mutations that increase BCR affinity. Sequence alignment is the first step of this kind of sequence analysis, including for BCR.

Usage

repAlignLineage(.data, .min_lineage_sequences, .prepare_threads, .align_threads, .nofail)

Arguments

.data

The data to be processed. Can be data.frame, data.table::data.table, or a list of these objects.

It must have columns in the immunarch-compatible format (immunarch_data_format), and must also contain a 'Cluster' column, which is added by the seqCluster() function, and a 'Germline.sequence' column, which is added by the repGermline() function.

.min_lineage_sequences

If the number of sequences in the same clonal lineage and the same cluster (not including the germline) is lower than this threshold, this group of sequences is filtered out of the dataframe, so only sufficiently large lineages are included.

.prepare_threads

Number of threads used to prepare the results table. Note that a high number can cause heavy memory usage!

.align_threads

Number of threads for lineage alignment.

.nofail

When TRUE, the function returns NA instead of stopping if Clustal W is not installed. This is used to avoid raising errors in examples on computers where Clustal W is not installed.
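
As a rough illustration of the .min_lineage_sequences rule, the base R sketch below counts rows per combination of cluster and germline in a toy table and keeps only the groups that reach the threshold. The toy data, the variable names, and the choice of grouping key are assumptions made for illustration; this is not the package's internal implementation.

```r
# Toy input (hypothetical): three groups with 3, 1 and 2 sequences.
toy <- data.frame(
  Cluster = c("c1", "c1", "c1", "c2", "c3", "c3"),
  Germline.sequence = c("AAAT", "AAAT", "AAAT", "CCGT", "GGTA", "GGTA"),
  Sequence = c("AAAT", "AACT", "AGAT", "CCGA", "GGTA", "GCTA"),
  stringsAsFactors = FALSE
)

# Assumed threshold, playing the role of .min_lineage_sequences.
min_lineage_sequences <- 2

# One key per combination of cluster and germline.
group_key <- paste(toy$Cluster, toy$Germline.sequence, sep = "|")

# Number of sequences in the group that each row belongs to.
group_size <- as.vector(table(group_key)[group_key])

# Groups smaller than the threshold are dropped; the rest are kept.
kept <- toy[group_size >= min_lineage_sequences, ]
print(kept)
```

With a threshold of 2, the single-sequence group "c2" is dropped and the other two groups are kept.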

Value

Dataframe or list of dataframes (if input is a list with multiple samples). The dataframe has these columns:

Cluster: cluster name
Germline: germline sequence
Alignment: DNAbin object with alignment
Sequences: nested dataframe containing all sequences for this combination of cluster and germline; it has these columns:
- Sequence, CDR1.nt, CDR2.nt, CDR3.nt, FR1.nt, FR2.nt, FR3.nt, FR4.nt, V.allele, J.allele, V.aa, J.aa: all values taken from the input dataframe
- Clone.ID: taken from the input dataframe, or created (filled with row numbers) if missing
- Clones: taken from the input dataframe, or created (filled with '1' values) if missing

Examples


library(immunarch)

data(bcrdata)
bcr_data <- bcrdata$data

bcr_data %>%
  seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>%
  repGermline(.threads = 1) %>%
  repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE)
#> repAlignLineage requires Clustal W app to be installed!
#> Please download it from here: http://www.clustal.org/download/current/
#> or install it with your system package manager (such as apt or dnf).
#> [1] NA
#> attr(,"class")
#> [1] "step_failure_ignored" "logical"
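
When .nofail = TRUE is used outside of examples, downstream code may want to detect the skipped step before continuing. The sketch below is one possible way to do that in base R, relying only on the "step_failure_ignored" class shown in the output above. The stand-in object and the helper alignment_skipped() are hypothetical and are not part of immunarch.

```r
# Stand-in for the value printed above: NA with class "step_failure_ignored".
# In real use this would be the value returned by repAlignLineage(..., .nofail = TRUE).
aligned <- structure(NA, class = c("step_failure_ignored", "logical"))

# Hypothetical helper (not part of immunarch): TRUE when the step was skipped.
alignment_skipped <- function(x) inherits(x, "step_failure_ignored")

if (alignment_skipped(aligned)) {
  status <- "skipped"
  message("Alignment skipped: Clustal W is not installed.")
} else {
  status <- "aligned"
}
```

Checking the class rather than testing for NA directly avoids calling is.na() on a dataframe result, which would return a matrix instead of a single TRUE or FALSE.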