This function creates germlines for clonal lineages. A B cell clonal lineage is a set of B cells that presumably share a common origin (arising from the same VDJ rearrangement event) and a common ancestor. Each clonal lineage has its own germline sequence, which represents the ancestral sequence of every BCR in that lineage. In other words, the germline sequence is the BCR sequence immediately after VDJ recombination, before B cell maturation and hypermutation. The germline sequence is useful for assessing the degree of mutation and maturity of the repertoire.

repGermline(.data, .species, .min_nuc_outside_cdr3, .threads)

Arguments

.data

The data to be processed. Can be a data.frame, a data.table, or a list of these objects.

It must have columns in the immunarch-compatible format (see immunarch_data_format).

.species

Species from which the data was acquired. Available options: "HomoSapiens" (default), "MusMusculus", "BosTaurus", "CamelusDromedarius", "CanisLupusFamiliaris", "DanioRerio", "MacacaMulatta", "MusMusculusDomesticus", "MusMusculusCastaneus", "MusMusculusMolossinus", "MusMusculusMusculus", "MusSpretus", "OncorhynchusMykiss", "OrnithorhynchusAnatinus", "OryctolagusCuniculus", "RattusNorvegicus", "SusScrofa".

.min_nuc_outside_cdr3

Minimum number of nucleotides that a V or J chain must have outside of CDR3 to be considered suitable for further alignment.

.threads

Number of threads to use.

Value

Data with added columns:

* Sequence (FR1+CDR1+FR2+CDR2+FR3+CDR3+FR4 in nucleotides; the column will be replaced if it exists)
* V.allele, J.allele (chosen alleles of V and J genes)
* V.aa, J.aa (V and J sequences from the original clonotype, outside CDR3, converted to amino acids)
* Germline.sequence (combined germline nucleotide sequence)
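Since the germline sequence is meant for assessing the degree of mutation, one simple way to use the added columns is to count the positions where Sequence differs from Germline.sequence. The following is a minimal base R sketch, not part of immunarch: the helper name `count_germline_mismatches` is hypothetical, and it assumes the two sequences have equal length and are aligned position by position. Positions holding "N" (in the Examples output, the CDR3 region of Germline.sequence is filled with N) are skipped.

```r
# Hypothetical helper (not an immunarch function): count positions where an
# observed sequence differs from its germline, skipping positions that are
# masked with "N" in either sequence.
# Assumption: both sequences have the same length and are position-aligned.
count_germline_mismatches <- function(observed, germline) {
  stopifnot(nchar(observed) == nchar(germline))
  obs <- strsplit(toupper(observed), "")[[1]]
  germ <- strsplit(toupper(germline), "")[[1]]
  comparable <- obs != "N" & germ != "N"
  sum(obs[comparable] != germ[comparable])
}

# FR4.nt of the first clonotype in the Examples output, and the matching
# tail of its Germline.sequence
fr4_observed <- "GGGCAGGGAACCCTGGTCACCGTCTCCTCAG"
fr4_germline <- "GGCCAAGGAACCCTGGTCACCGTCTCCTCAG"
count_germline_mismatches(fr4_observed, fr4_germline)
#> [1] 2
```

The same helper could be applied row by row to the full Sequence and Germline.sequence columns, provided the equal-length assumption holds for the data at hand.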

Examples


data(bcrdata)

bcrdata$data %>%
  top(5) %>%
  repGermline()
#> $full_clones
#>   J.allele    V.allele Clones   Proportion
#> 1 IGHJ4*01  IGHV1-8*01   3576 0.0003287193
#> 2 IGHJ4*01 IGHV4-39*01  14773 0.0013579896
#> 3 IGHJ4*01 IGHV4-39*01   3712 0.0003412210
#> 4 IGHJ4*01  IGHV6-1*01   3543 0.0003256859
#> 5 IGHJ6*01 IGHV4-59*01   4758 0.0004373732
#>                                                                 CDR3.nt
#> 1 TGTGCGAGACGGGCCGAAACCAATGGCTGGAACGGTTTTGGTGCCGACAAGTATTACTTTGACTTCTGG
#> 2                               TGTGCGAGAACGGATAGTGTTGGCTATTATCCGTACTTT
#> 3                         TGTGCGAGAGTAAAATTAGCCGGCCGCGGTGGTTTTGACTACTGG
#> 4                TGTGCAAGAGAGTTCCCGTATTATGTGAGCAGTGACAGTTACCTTGACTACTGG
#> 5                      TGTGCGCGAGGAGAAGACGCGTTCTTTTACTACGGTTTGGACGTCTGG
#>                   CDR3.aa                                     V.name
#> 1 CARRAETNGWNGFGADKYYFDFW                         IGHV1-8*00(867, 7)
#> 2           CARTDSVGYYPYF IGHV4-39*00(1227, 6), IGHV4-28*00(1079, 4)
#> 3         CARVKLAGRGGFDYW                       IGHV4-39*00(1105, 4)
#> 4      CAREFPYYVSSDSYLDYW                        IGHV6-1*00(1387, 6)
#> 5        CARGEDAFFYYGLDVW    IGHV4-59*00(836, 7), IGHV4-4*00(835, 7)
#>                                              D.name
#> 1  IGHD6-19*00(42), IGHD1-1*00(40), IGHD1-20*00(40)
#> 2                   IGHD3-22*00(43), IGHD3-3*00(39)
#> 3 IGHD4-23*00(35), IGHD2-15*00(30), IGHD2-21*00(30)
#> 4                                   IGHD3-16*00(45)
#> 5 IGHD3-10*00(41), IGHD3-22*00(40), IGHD4-23*00(40)
#>                               J.name V.end D.start D.end J.start VJ.ins VD.ins
#> 1                   IGHJ4*00(412, 9)   150     161   175     190     NA     NA
#> 2 IGHJ4*00(246, 9), IGHJ5*00(244, 9)   180     183   200     204     NA     NA
#> 3                      IGHJ4*00(381)   174     191   198     198     NA     NA
#> 4                   IGHJ4*00(387, 8)   166     171   180     194     NA     NA
#> 5                   IGHJ6*00(336, 1)   125     141   152     152     NA     NA
#>   DJ.ins CDR3.start CDR3.end        C.name C.start C.end
#> 1     NA        141      210 IGHD*00(0, 5)      NA    NA
#> 2     NA        171      210 IGHD*00(0, 4)      NA    NA
#> 3     NA        165      210 IGHD*00(0, 4)      NA    NA
#> 4     NA        155      209 IGHD*00(0, 5)      NA    NA
#> 5     NA        115      163 IGHD*00(0, 4)      NA    NA
#>                          CDR1.nt    CDR1.aa                     CDR2.nt
#> 1       ggatacaccttcaccagttatgat   GYTFTSYD    ATGAACCCTAACACTGGTAACTCA
#> 2 ggtggctccatcagcagtagtagttactac GGSISSSSYY       ATCTATCATAGTGGGACCACC
#> 3 ggtggctccatcagcagtagtagttactac GGSISSSSYY       ATCTATCACAGTGGGAATACC
#> 4 ggggacagtgtctctagcaacagtgctgct GDSVSSNSAA ACATACTACAGGTCCAAGTGGTATAAT
#> 5       ggtggctccatcagtagttactac   GGSISSYY       atctattacagtgggagCACC
#>     CDR2.aa
#> 1  MNPNTGNS
#> 2   IYHSGTT
#> 3   IYHSGNT
#> 4 TYYRSKWYN
#> 5   IYYSGST
#>                                                                        FR1.nt
#> 1 caggtgcagctggtgcagtctggggctgaggtgaagaagcctggggcctcagtgaaggtctcctgcaaggcttct
#> 2 cagctgcagctgcaggagtcgggcccaggactggtgaagccttcggagaccctgtccctcacctgcactgtctct
#> 3 cagctgcagctgcaggagtcgggcccaggactggtgaagccttcggagaccctgtccctcacctgcactgtctct
#> 4 caggtacagctgcagcagtcaggtccaggactggtgaagccctcgcagaccctctcactcacctgtgccatctcc
#> 5 caggtgcagctgcaggagtcgggcccaggactggtgaagccttcggagaccctgtccctcacctgcactgtctct
#>                      FR1.aa                                              FR2.nt
#> 1 QVQLVQSGAEVKKPGASVKVSCKAS atcaactgggtgcgacaggccactggacaagggcttgagtggatgGGGTGG
#> 2 QLQLQESGPGLVKPSETLSLTCTVS tggggctggatcCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGGTTGCGAGT
#> 3 QLQLQESGPGLVKPSETLSLTCTVS tggggctggatccgccagCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGC
#> 4 QVQLQQSGPGLVKPSQTLSLTCAIS tggaactggatcaggcagtccccatcgagaggccTTGAGTGGCTGGGAAGG
#> 5 QVQLQESGPGLVKPSETLSLTCTVS tggagctggatccggcagcccccagggaagggactggagtggattgggtat
#>              FR2.aa
#> 1 INWVRQATGQGLEWMGW
#> 2 WGWIRQPPGKGLEWVAS
#> 3 WGWIRQPPGKGLEWIGS
#> 4 WNWIRQSPSRGLEWLGR
#> 5 WSWIRQPPGKGLEWIGY
#>                                                                                                            FR3.nt
#> 1 GGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGTGATACTTCCATAAGCACAGCCCACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTAC
#> 2 TACTACAACCCGTCCCTCACGAGCCGAGTCACCATATCAGTAGACACGTCCAAGAATCAATTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACTCGGCCGTGTATTAT
#> 3 TACTATAACCCGTCCCTCAAGAGTCGAGTCTCCATCTCACTTGACACGTCCAAGAACCACTTCTCCCTGGAGCTGACCTCTGTGACCGCCGCAGACACGGCCGTCTATTAC
#> 4 GATTATGCAGTGTCTGTGAAAAGTCGAATAACCGTCACCCCAGACACATCCAAGAACCAGTTCTCCCTGCATCTGAACTCTGTGACTCCCGAGGACACGGCTGTCTATTAC
#> 5 AGCTACAACCCCTCCCTCAAGAGTCGAGTCACCATGTCGGTGGACACGTCCAAGAGCCAGTTCTCCCTGAAGTTGACCTCTGTGACCGCTGCGGACACGGCCGTGTATTAC
#>                                  FR3.aa                          FR4.nt
#> 1 GYAQKFQGRVTMTSDTSISTAHMELSSLRSEDTAVYY GGGCAGGGAACCCTGGTCACCGTCTCCTCAG
#> 2 YYNPSLTSRVTISVDTSKNQFSLKLSSVTAADSAVYY GGACAGGGAACCCTGGTCACCGTCTCCTCAG
#> 3 YYNPSLKSRVSISLDTSKNHFSLELTSVTAADTAVYY GGCCTGGGAACCCTGGTCACCGTCTCCTCAG
#> 4 DYAVSVKSRITVTPDTSKNQFSLHLNSVTPEDTAVYY GGCCAGGGAGCCCTGGTCACCGTCTCCTCAG
#> 5 SYNPSLKSRVTMSVDTSKSQFSLKLTSVTAADTAVYY GGCCAAGGGATCACGGTCACCGTCTCCTCAG
#>         FR4.aa V3.Deletions J3.Deletions Clone.ID
#> 1 GQGTLV~SPSPQ           -2            3      653
#> 2 GQGTLV~SPSPQ           -2          -11      897
#> 3 GLGTLV~SPSPQ           -2           -5      901
#> 4 GQGALV~SPSPQ            0           -2      896
#> 5 GQGITV~SPSPQ           -1          -21      458
#>                                                                                                                                                                                                                                                                                                                                                                                            Sequence
#> 1 CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGTTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGGTGGATGAACCCTAACACTGGTAACTCAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGTGATACTTCCATAAGCACAGCCCACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGACGGGCCGAAACCAATGGCTGGAACGGTTTTGGTGCCGACAAGTATTACTTTGACTTCTGGGGGCAGGGAACCCTGGTCACCGTCTCCTCAG
#> 2                            CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGGTTGCGAGTATCTATCATAGTGGGACCACCTACTACAACCCGTCCCTCACGAGCCGAGTCACCATATCAGTAGACACGTCCAAGAATCAATTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACTCGGCCGTGTATTATTGTGCGAGAACGGATAGTGTTGGCTATTATCCGTACTTTGGACAGGGAACCCTGGTCACCGTCTCCTCAG
#> 3                      CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGCATCTATCACAGTGGGAATACCTACTATAACCCGTCCCTCAAGAGTCGAGTCTCCATCTCACTTGACACGTCCAAGAACCACTTCTCCCTGGAGCTGACCTCTGTGACCGCCGCAGACACGGCCGTCTATTACTGTGCGAGAGTAAAATTAGCCGGCCGCGGTGGTTTTGACTACTGGGGCCTGGGAACCCTGGTCACCGTCTCCTCAG
#> 4       CAGGTACAGCTGCAGCAGTCAGGTCCAGGACTGGTGAAGCCCTCGCAGACCCTCTCACTCACCTGTGCCATCTCCGGGGACAGTGTCTCTAGCAACAGTGCTGCTTGGAACTGGATCAGGCAGTCCCCATCGAGAGGCCTTGAGTGGCTGGGAAGGACATACTACAGGTCCAAGTGGTATAATGATTATGCAGTGTCTGTGAAAAGTCGAATAACCGTCACCCCAGACACATCCAAGAACCAGTTCTCCCTGCATCTGAACTCTGTGACTCCCGAGGACACGGCTGTCTATTACTGTGCAAGAGAGTTCCCGTATTATGTGAGCAGTGACAGTTACCTTGACTACTGGGGCCAGGGAGCCCTGGTCACCGTCTCCTCAG
#> 5                         CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAGCTACAACCCCTCCCTCAAGAGTCGAGTCACCATGTCGGTGGACACGTCCAAGAGCCAGTTCTCCCTGAAGTTGACCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGCGAGGAGAAGACGCGTTCTTTTACTACGGTTTGGACGTCTGGGGCCAAGGGATCACGGTCACCGTCTCCTCAG
#>                                                                                                 V.aa
#> 1    QVQLVQSGAEVKKPGASVKVSCKASGYTFTSYDINWVRQATGQGLEWMGWMNPNTGNSGYAQKFQGRVTMTSDTSISTAHMELSSLRSEDTAVYY
#> 2   QLQLQESGPGLVKPSETLSLTCTVSGGSISSSSYYWGWIRQPPGKGLEWVASIYHSGTTYYNPSLTSRVTISVDTSKNQFSLKLSSVTAADSAVYY
#> 3   QLQLQESGPGLVKPSETLSLTCTVSGGSISSSSYYWGWIRQPPGKGLEWIGSIYHSGNTYYNPSLKSRVSISLDTSKNHFSLELTSVTAADTAVYY
#> 4 QVQLQQSGPGLVKPSQTLSLTCAISGDSVSSNSAAWNWIRQSPSRGLEWLGRTYYRSKWYNDYAVSVKSRITVTPDTSKNQFSLHLNSVTPEDTAVYY
#> 5     QVQLQESGPGLVKPSETLSLTCTVSGGSISSYYWSWIRQPPGKGLEWIGYIYYSGSTSYNPSLKSRVTMSVDTSKSQFSLKLTSVTAADTAVYY
#>           J.aa
#> 1 GQGTLV~SPSPQ
#> 2 GQGTLV~SPSPQ
#> 3 GLGTLV~SPSPQ
#> 4 GQGALV~SPSPQ
#> 5 GQGITV~SPSPQ
#>                                                                                                                                                                                                                                                                                                                                                                                   Germline.sequence
#> 1 CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGTTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAACAGTGGTAACACAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
#> 2                            CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
#> 3                      CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
#> 4       CAGGTACAGCTGCAGCAGTCAGGTCCAGGACTGGTGAAGCCCTCGCAGACCCTCTCACTCACCTGTGCCATCTCCGGGGACAGTGTCTCTAGCAACAGTGCTGCTTGGAACTGGATCAGGCAGTCCCCATCGAGAGGCCTTGAGTGGCTGGGAAGGACATACTACAGGTCCAAGTGGTATAATGATTATGCAGTATCTGTGAAAAGTCGAATAACCATCAACCCAGACACATCCAAGAACCAGTTCTCCCTGCAGCTGAACTCTGTGACTCCCGAGGACACGGCTGTGTATTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
#> 5                         CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGGCAAGGGACCACGGTCACCGTCTCCTCAG
#>