Aligns all sequences, including the germline, within each clonal lineage within each cluster
Source: R/v0_align_lineage.R, repAlignLineage.Rd
This function aligns all sequences (including the germline) that belong to the same clonal lineage and the same cluster. After clustering and building the clonal lineage and germline, the next step is to analyze the degree of mutation and maturity of each clonal lineage. This makes it possible to find highly mature cells and cells with a large number of offspring. Phylogenetic analysis can then find mutations that increase BCR affinity. Aligning the sequences is the first step towards sequence analysis, including BCR analysis.
Arguments
- .data
The data to be processed. Can be data.frame, data.table::data.table or a list of these objects.
It must have columns in the immunarch compatible format immunarch_data_format, and must also contain the 'Cluster' column, which is added by the seqCluster() function, and the 'Germline.sequence' column, which is added by the repGermline() function.
- .min_lineage_sequences
If the number of sequences in the same clonal lineage and the same cluster (not including the germline) is lower than this threshold, this group of sequences is filtered out of the dataframe, so only large enough lineages are included.
- .prepare_threads
Number of threads used to prepare the results table. Note that a high number can cause heavy memory usage!
- .align_threads
Number of threads for lineage alignment.
- .nofail
Will return NA instead of stopping if Clustal W is not installed. Used to avoid raising errors in examples on computers where Clustal W is not installed.
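The effect of .min_lineage_sequences can be illustrated with a small base R sketch. This is not the immunarch implementation; it only mimics the filtering rule described above on a made-up table. The column names 'Cluster' and 'Germline.sequence' follow the argument description, while the toy values and the variable names (toy, min_lineage_sequences, kept) are assumptions for illustration.

```r
# Toy input: invented values, not real repertoire data.
toy <- data.frame(
  Cluster = c("A", "A", "A", "B", "C", "C"),
  Germline.sequence = c("g1", "g1", "g1", "g2", "g3", "g3"),
  Sequence = c("s1", "s2", "s3", "s4", "s5", "s6"),
  stringsAsFactors = FALSE
)

# Plays the role of the .min_lineage_sequences argument.
min_lineage_sequences <- 3

# Size of each (Cluster, Germline.sequence) group, repeated for every row.
# The germline itself is not a row here, so it is not counted.
group_key <- paste(toy$Cluster, toy$Germline.sequence, sep = "|")
group_size <- ave(seq_along(group_key), group_key, FUN = length)

# Groups smaller than the threshold are dropped; the rest are kept.
kept <- toy[group_size >= min_lineage_sequences, ]
print(kept)
```

With a threshold of 3, only the three rows of cluster "A" remain in this toy table; lowering the threshold to 2 would also keep cluster "C".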
Value
Dataframe or list of dataframes (if input is a list with multiple samples). The dataframe has these columns:
Cluster: cluster name
Germline: germline sequence
Alignment: DNAbin object with alignment
Sequences: nested dataframe containing all sequences for this combination of cluster and germline; it has the following columns:
- Sequence, CDR1.nt, CDR2.nt, CDR3.nt, FR1.nt, FR2.nt, FR3.nt, FR4.nt, V.allele, J.allele, V.aa, J.aa: all values taken from the input dataframe
- Clone.ID: taken from the input dataframe, or created (filled with row numbers) if missing
- Clones: taken from the input dataframe, or created (filled with '1' values) if missing
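To make the nested layout of the returned value more concrete, here is a hand-built mock in base R. It is only a sketch of the shape described above, with invented values; the Alignment column is left out because building a DNAbin object requires the ape package, and the real result may use a different dataframe class than a plain data.frame.

```r
# Mock result with a single row; all values are invented for illustration.
mock <- data.frame(
  Cluster = "cluster_1",
  Germline = "ACGTACGT",
  stringsAsFactors = FALSE
)

# 'Sequences' is a list column: one nested dataframe per row of the result.
mock$Sequences <- list(data.frame(
  Sequence = c("ACGTACGA", "ACGAACGT"),
  Clone.ID = 1:2,
  Clones = c(1, 1),
  stringsAsFactors = FALSE
))

# Pull out the nested dataframe for the first lineage and inspect it.
lineage_sequences <- mock$Sequences[[1]]
n_sequences <- nrow(lineage_sequences)
print(lineage_sequences)
```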
Examples
library(immunarch)
data(bcrdata)
bcr_data <- bcrdata$data
bcr_data %>%
  seqCluster(seqDist(bcr_data), .fixed_threshold = 3) %>%
  repGermline(.threads = 1) %>%
  repAlignLineage(.min_lineage_sequences = 2, .align_threads = 2, .nofail = TRUE)
#> repAlignLineage requires Clustal W app to be installed!
#> Please download it from here: http://www.clustal.org/download/current/
#> or install it with your system package manager (such as apt or dnf).
#> [1] NA
#> attr(,"class")
#> [1] "step_failure_ignored" "logical"
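When .nofail = TRUE is used, downstream code may want to check whether the alignment actually ran. The sketch below is one possible guard, based only on the class shown in the output above ("step_failure_ignored"). The result object is mocked here so that the snippet runs without immunarch or Clustal W; in real use it would be the return value of repAlignLineage().

```r
# Mock of the value printed above: NA carrying the "step_failure_ignored" class.
result <- structure(NA, class = c("step_failure_ignored", "logical"))

# TRUE only if the step did not fall back to the NA placeholder.
alignment_available <- !inherits(result, "step_failure_ignored")

if (alignment_available) {
  message("Alignment finished; continue with downstream analysis.")
} else {
  message("Alignment was skipped (Clustal W not found); stopping here.")
}
```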