This function creates germlines for clonal lineages. A B cell clonal lineage is a set of B cells that presumably share a common origin (they arise from the same VDJ rearrangement event) and a common ancestor. Each clonal lineage has its own germline sequence, which represents the ancestral sequence for every BCR in that lineage. In other words, the germline sequence is the sequence of the B cell immediately after VDJ recombination, before B cell maturation and somatic hypermutation. The germline sequence is useful for assessing the degree of mutation and maturity of the repertoire.
repGermline(.data, .species, .min_nuc_outside_cdr3, .threads)
.data: The data to be processed. Can be a data.frame, a data.table, or a list of these objects. It must have columns in the immunarch-compatible format (see immunarch_data_format).
.species: Species from which the data was acquired. Available options: "HomoSapiens" (default), "MusMusculus", "BosTaurus", "CamelusDromedarius", "CanisLupusFamiliaris", "DanioRerio", "MacacaMulatta", "MusMusculusDomesticus", "MusMusculusCastaneus", "MusMusculusMolossinus", "MusMusculusMusculus", "MusSpretus", "OncorhynchusMykiss", "OrnithorhynchusAnatinus", "OryctolagusCuniculus", "RattusNorvegicus", "SusScrofa".
.min_nuc_outside_cdr3: Minimum number of nucleotides that the V or J gene must have outside of CDR3 for the clonotype to be considered suitable for further alignment.
.threads: Number of threads to use.
Data with added columns:
* Sequence (FR1+CDR1+FR2+CDR2+FR3+CDR3+FR4 in nucleotides; the column will be replaced if it exists)
* V.allele, J.allele (chosen alleles of V and J genes)
* V.aa, J.aa (V and J sequences from the original clonotype, outside CDR3, converted to amino acids)
* Germline.sequence (combined germline nucleotide sequence)
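The Sequence column is described as the concatenation FR1+CDR1+FR2+CDR2+FR3+CDR3+FR4. The following is a minimal base R sketch of that concatenation, not the package's internal code. The toy data frame and its shortened region values are made up for illustration, and the conversion to upper case is an assumption based on how Sequence appears in the example output.

```r
# Toy clonotype with the seven region columns (made-up, shortened values;
# mixed case mimics the region columns in the example output).
clonotype <- data.frame(
  FR1.nt  = "caggtgcag",
  CDR1.nt = "ggatac",
  FR2.nt  = "atcaacTGG",
  CDR2.nt = "ATGAAC",
  FR3.nt  = "GGCTAT",
  CDR3.nt = "TGTGCGTGG",
  FR4.nt  = "GGGCAG",
  stringsAsFactors = FALSE
)

# Region order as given in the description of the Sequence column.
regions <- c("FR1.nt", "CDR1.nt", "FR2.nt", "CDR2.nt",
             "FR3.nt", "CDR3.nt", "FR4.nt")

# Paste the regions row by row and convert the result to upper case
# (upper-casing is an assumption, see the note above).
clonotype$Sequence <- toupper(
  do.call(paste0, unname(as.list(clonotype[regions])))
)

clonotype$Sequence
```

Because paste0 is vectorised, the same lines would work unchanged on a data frame with many clonotypes.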
data(bcrdata)
bcrdata$data %>%
top(5) %>%
repGermline()
#> $full_clones
#> J.allele V.allele Clones Proportion
#> 1 IGHJ4*01 IGHV1-8*01 3576 0.0003287193
#> 2 IGHJ4*01 IGHV4-39*01 14773 0.0013579896
#> 3 IGHJ4*01 IGHV4-39*01 3712 0.0003412210
#> 4 IGHJ4*01 IGHV6-1*01 3543 0.0003256859
#> 5 IGHJ6*01 IGHV4-59*01 4758 0.0004373732
#> CDR3.nt
#> 1 TGTGCGAGACGGGCCGAAACCAATGGCTGGAACGGTTTTGGTGCCGACAAGTATTACTTTGACTTCTGG
#> 2 TGTGCGAGAACGGATAGTGTTGGCTATTATCCGTACTTT
#> 3 TGTGCGAGAGTAAAATTAGCCGGCCGCGGTGGTTTTGACTACTGG
#> 4 TGTGCAAGAGAGTTCCCGTATTATGTGAGCAGTGACAGTTACCTTGACTACTGG
#> 5 TGTGCGCGAGGAGAAGACGCGTTCTTTTACTACGGTTTGGACGTCTGG
#> CDR3.aa V.name
#> 1 CARRAETNGWNGFGADKYYFDFW IGHV1-8*00(867, 7)
#> 2 CARTDSVGYYPYF IGHV4-39*00(1227, 6), IGHV4-28*00(1079, 4)
#> 3 CARVKLAGRGGFDYW IGHV4-39*00(1105, 4)
#> 4 CAREFPYYVSSDSYLDYW IGHV6-1*00(1387, 6)
#> 5 CARGEDAFFYYGLDVW IGHV4-59*00(836, 7), IGHV4-4*00(835, 7)
#> D.name
#> 1 IGHD6-19*00(42), IGHD1-1*00(40), IGHD1-20*00(40)
#> 2 IGHD3-22*00(43), IGHD3-3*00(39)
#> 3 IGHD4-23*00(35), IGHD2-15*00(30), IGHD2-21*00(30)
#> 4 IGHD3-16*00(45)
#> 5 IGHD3-10*00(41), IGHD3-22*00(40), IGHD4-23*00(40)
#> J.name V.end D.start D.end J.start VJ.ins VD.ins
#> 1 IGHJ4*00(412, 9) 150 161 175 190 NA NA
#> 2 IGHJ4*00(246, 9), IGHJ5*00(244, 9) 180 183 200 204 NA NA
#> 3 IGHJ4*00(381) 174 191 198 198 NA NA
#> 4 IGHJ4*00(387, 8) 166 171 180 194 NA NA
#> 5 IGHJ6*00(336, 1) 125 141 152 152 NA NA
#> DJ.ins CDR3.start CDR3.end C.name C.start C.end
#> 1 NA 141 210 IGHD*00(0, 5) NA NA
#> 2 NA 171 210 IGHD*00(0, 4) NA NA
#> 3 NA 165 210 IGHD*00(0, 4) NA NA
#> 4 NA 155 209 IGHD*00(0, 5) NA NA
#> 5 NA 115 163 IGHD*00(0, 4) NA NA
#> CDR1.nt CDR1.aa CDR2.nt
#> 1 ggatacaccttcaccagttatgat GYTFTSYD ATGAACCCTAACACTGGTAACTCA
#> 2 ggtggctccatcagcagtagtagttactac GGSISSSSYY ATCTATCATAGTGGGACCACC
#> 3 ggtggctccatcagcagtagtagttactac GGSISSSSYY ATCTATCACAGTGGGAATACC
#> 4 ggggacagtgtctctagcaacagtgctgct GDSVSSNSAA ACATACTACAGGTCCAAGTGGTATAAT
#> 5 ggtggctccatcagtagttactac GGSISSYY atctattacagtgggagCACC
#> CDR2.aa
#> 1 MNPNTGNS
#> 2 IYHSGTT
#> 3 IYHSGNT
#> 4 TYYRSKWYN
#> 5 IYYSGST
#> FR1.nt
#> 1 caggtgcagctggtgcagtctggggctgaggtgaagaagcctggggcctcagtgaaggtctcctgcaaggcttct
#> 2 cagctgcagctgcaggagtcgggcccaggactggtgaagccttcggagaccctgtccctcacctgcactgtctct
#> 3 cagctgcagctgcaggagtcgggcccaggactggtgaagccttcggagaccctgtccctcacctgcactgtctct
#> 4 caggtacagctgcagcagtcaggtccaggactggtgaagccctcgcagaccctctcactcacctgtgccatctcc
#> 5 caggtgcagctgcaggagtcgggcccaggactggtgaagccttcggagaccctgtccctcacctgcactgtctct
#> FR1.aa FR2.nt
#> 1 QVQLVQSGAEVKKPGASVKVSCKAS atcaactgggtgcgacaggccactggacaagggcttgagtggatgGGGTGG
#> 2 QLQLQESGPGLVKPSETLSLTCTVS tggggctggatcCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGGTTGCGAGT
#> 3 QLQLQESGPGLVKPSETLSLTCTVS tggggctggatccgccagCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGC
#> 4 QVQLQQSGPGLVKPSQTLSLTCAIS tggaactggatcaggcagtccccatcgagaggccTTGAGTGGCTGGGAAGG
#> 5 QVQLQESGPGLVKPSETLSLTCTVS tggagctggatccggcagcccccagggaagggactggagtggattgggtat
#> FR2.aa
#> 1 INWVRQATGQGLEWMGW
#> 2 WGWIRQPPGKGLEWVAS
#> 3 WGWIRQPPGKGLEWIGS
#> 4 WNWIRQSPSRGLEWLGR
#> 5 WSWIRQPPGKGLEWIGY
#> FR3.nt
#> 1 GGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGTGATACTTCCATAAGCACAGCCCACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTAC
#> 2 TACTACAACCCGTCCCTCACGAGCCGAGTCACCATATCAGTAGACACGTCCAAGAATCAATTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACTCGGCCGTGTATTAT
#> 3 TACTATAACCCGTCCCTCAAGAGTCGAGTCTCCATCTCACTTGACACGTCCAAGAACCACTTCTCCCTGGAGCTGACCTCTGTGACCGCCGCAGACACGGCCGTCTATTAC
#> 4 GATTATGCAGTGTCTGTGAAAAGTCGAATAACCGTCACCCCAGACACATCCAAGAACCAGTTCTCCCTGCATCTGAACTCTGTGACTCCCGAGGACACGGCTGTCTATTAC
#> 5 AGCTACAACCCCTCCCTCAAGAGTCGAGTCACCATGTCGGTGGACACGTCCAAGAGCCAGTTCTCCCTGAAGTTGACCTCTGTGACCGCTGCGGACACGGCCGTGTATTAC
#> FR3.aa FR4.nt
#> 1 GYAQKFQGRVTMTSDTSISTAHMELSSLRSEDTAVYY GGGCAGGGAACCCTGGTCACCGTCTCCTCAG
#> 2 YYNPSLTSRVTISVDTSKNQFSLKLSSVTAADSAVYY GGACAGGGAACCCTGGTCACCGTCTCCTCAG
#> 3 YYNPSLKSRVSISLDTSKNHFSLELTSVTAADTAVYY GGCCTGGGAACCCTGGTCACCGTCTCCTCAG
#> 4 DYAVSVKSRITVTPDTSKNQFSLHLNSVTPEDTAVYY GGCCAGGGAGCCCTGGTCACCGTCTCCTCAG
#> 5 SYNPSLKSRVTMSVDTSKSQFSLKLTSVTAADTAVYY GGCCAAGGGATCACGGTCACCGTCTCCTCAG
#> FR4.aa V3.Deletions J3.Deletions Clone.ID
#> 1 GQGTLV~SPSPQ -2 3 653
#> 2 GQGTLV~SPSPQ -2 -11 897
#> 3 GLGTLV~SPSPQ -2 -5 901
#> 4 GQGALV~SPSPQ 0 -2 896
#> 5 GQGITV~SPSPQ -1 -21 458
#> Sequence
#> 1 CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGTTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGGTGGATGAACCCTAACACTGGTAACTCAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGTGATACTTCCATAAGCACAGCCCACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGACGGGCCGAAACCAATGGCTGGAACGGTTTTGGTGCCGACAAGTATTACTTTGACTTCTGGGGGCAGGGAACCCTGGTCACCGTCTCCTCAG
#> 2 CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGGTTGCGAGTATCTATCATAGTGGGACCACCTACTACAACCCGTCCCTCACGAGCCGAGTCACCATATCAGTAGACACGTCCAAGAATCAATTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACTCGGCCGTGTATTATTGTGCGAGAACGGATAGTGTTGGCTATTATCCGTACTTTGGACAGGGAACCCTGGTCACCGTCTCCTCAG
#> 3 CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGCATCTATCACAGTGGGAATACCTACTATAACCCGTCCCTCAAGAGTCGAGTCTCCATCTCACTTGACACGTCCAAGAACCACTTCTCCCTGGAGCTGACCTCTGTGACCGCCGCAGACACGGCCGTCTATTACTGTGCGAGAGTAAAATTAGCCGGCCGCGGTGGTTTTGACTACTGGGGCCTGGGAACCCTGGTCACCGTCTCCTCAG
#> 4 CAGGTACAGCTGCAGCAGTCAGGTCCAGGACTGGTGAAGCCCTCGCAGACCCTCTCACTCACCTGTGCCATCTCCGGGGACAGTGTCTCTAGCAACAGTGCTGCTTGGAACTGGATCAGGCAGTCCCCATCGAGAGGCCTTGAGTGGCTGGGAAGGACATACTACAGGTCCAAGTGGTATAATGATTATGCAGTGTCTGTGAAAAGTCGAATAACCGTCACCCCAGACACATCCAAGAACCAGTTCTCCCTGCATCTGAACTCTGTGACTCCCGAGGACACGGCTGTCTATTACTGTGCAAGAGAGTTCCCGTATTATGTGAGCAGTGACAGTTACCTTGACTACTGGGGCCAGGGAGCCCTGGTCACCGTCTCCTCAG
#> 5 CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAGCTACAACCCCTCCCTCAAGAGTCGAGTCACCATGTCGGTGGACACGTCCAAGAGCCAGTTCTCCCTGAAGTTGACCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGCGAGGAGAAGACGCGTTCTTTTACTACGGTTTGGACGTCTGGGGCCAAGGGATCACGGTCACCGTCTCCTCAG
#> V.aa
#> 1 QVQLVQSGAEVKKPGASVKVSCKASGYTFTSYDINWVRQATGQGLEWMGWMNPNTGNSGYAQKFQGRVTMTSDTSISTAHMELSSLRSEDTAVYY
#> 2 QLQLQESGPGLVKPSETLSLTCTVSGGSISSSSYYWGWIRQPPGKGLEWVASIYHSGTTYYNPSLTSRVTISVDTSKNQFSLKLSSVTAADSAVYY
#> 3 QLQLQESGPGLVKPSETLSLTCTVSGGSISSSSYYWGWIRQPPGKGLEWIGSIYHSGNTYYNPSLKSRVSISLDTSKNHFSLELTSVTAADTAVYY
#> 4 QVQLQQSGPGLVKPSQTLSLTCAISGDSVSSNSAAWNWIRQSPSRGLEWLGRTYYRSKWYNDYAVSVKSRITVTPDTSKNQFSLHLNSVTPEDTAVYY
#> 5 QVQLQESGPGLVKPSETLSLTCTVSGGSISSYYWSWIRQPPGKGLEWIGYIYYSGSTSYNPSLKSRVTMSVDTSKSQFSLKLTSVTAADTAVYY
#> J.aa
#> 1 GQGTLV~SPSPQ
#> 2 GQGTLV~SPSPQ
#> 3 GLGTLV~SPSPQ
#> 4 GQGALV~SPSPQ
#> 5 GQGITV~SPSPQ
#> Germline.sequence
#> 1 CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGTTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAACAGTGGTAACACAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
#> 2 CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
#> 3 CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
#> 4 CAGGTACAGCTGCAGCAGTCAGGTCCAGGACTGGTGAAGCCCTCGCAGACCCTCTCACTCACCTGTGCCATCTCCGGGGACAGTGTCTCTAGCAACAGTGCTGCTTGGAACTGGATCAGGCAGTCCCCATCGAGAGGCCTTGAGTGGCTGGGAAGGACATACTACAGGTCCAAGTGGTATAATGATTATGCAGTATCTGTGAAAAGTCGAATAACCATCAACCCAGACACATCCAAGAACCAGTTCTCCCTGCAGCTGAACTCTGTGACTCCCGAGGACACGGCTGTGTATTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
#> 5 CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGGCAAGGGACCACGGTCACCGTCTCCTCAG
#>
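The description notes that the germline sequence is useful for assessing the degree of mutation. One possible way to do this, sketched below in base R, is to count the positions where Sequence differs from Germline.sequence while skipping germline positions masked with N (the CDR3 stretch in the output above). The helper name count_germline_mismatches is hypothetical and not part of immunarch, the toy sequences are not taken from bcrdata, and the sketch assumes both strings have the same length.

```r
# Hypothetical helper (not part of immunarch): count mismatches between an
# observed sequence and its germline, ignoring germline positions that are "N".
count_germline_mismatches <- function(observed, germline) {
  # Assumption: both sequences cover the same positions.
  stopifnot(nchar(observed) == nchar(germline))
  obs  <- strsplit(toupper(observed), "")[[1]]
  germ <- strsplit(toupper(germline), "")[[1]]
  comparable <- germ != "N"
  sum(obs[comparable] != germ[comparable])
}

# Toy sequences built from three parts: a V-like prefix, a CDR3-like middle
# (masked with N in the germline), and a J-like suffix.
observed <- paste0("CAGGTGCAGTTGGTG", "TGTGCGAGA", "GGCCAG")
germline <- paste0("CAGGTGCAGCTGGTG", "NNNNNNNNN", "GGCCAA")

n_mismatches <- count_germline_mismatches(observed, germline)
n_mismatches
```

In this toy case the sequences differ at one position in the prefix and one in the suffix, so the count is 2. On real data the helper could be applied row by row, for example with mapply over the Sequence and Germline.sequence columns, provided the equal-length assumption holds.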