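When you submit a gene cluster to NCBI, you need to provide a file in sqn format, which is generated with tbl2asn.
File preparation
tbl2asn relies on three files to generate the sqn file:
- File 1: the genome sequence file in FASTA format
The bracketed modifiers in the header are optional.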
>Toyoncin_biosynthesis_gene_cluster [organism=Bacillus toyonensis] [strain=XIN-YC13] [topology=linear] [moltype=DNA] [tech=wgs] [gcode=11] [country=China] Bacillus toyonensis strain XIN-YC13 Toyoncin biosynthesis gene cluster, complete sequence
ttaaaa taatttaata
gggaagtttt ttagttgttt tggactcttc ccaaacactg ctttaagtgt tggattaaca
tcatccctat tccccgaaaa cataatgtga ggatttatga ataatgcata tgctctaaca
ttattatcat caacaccact ctctgaacga gccataatac ccttatcaat taattttcta
accaatggac taactttagt ttcatgtctt ccaatttttt tagctaattc tcgctgagtt
aatgggattt gttctttcga attaatatca ttaactaaac aattacttaa aaaacctaca
cacattgaaa tatctactaa aaataccttc tcagcatttg ttaaataatc aatttcgaac
aaatactgga tattttgttg aataatctga acaaacttcg ctttattttt cactttacgc
tcaggaacta atttcattcc tcttgaacga gcttttgatt gaagtttatt tgctaaatac
atctcttctt cagacaagac ctttaaatcc tcaatatctc tcaatcttga atttttttca
gcttgttcta agttgataaa ctttgacata ttctttttgc tcctcttttc taagattttc
aactagagaa ggaaaaaatt ttatgttatg attcctgtag aatttacaat tcaatatgta
caaaagaact ccccttttct aattgatagt ttggtcgctt tcaattataa tacaagggga
ttttttacat cttaaaattt ttcatttttg aatcaatccc tgaaaatata aagaacacat
cacataaatt attcttaata ttttataatc gaaaaaataa taggaataaa gaaaaatact
gcaataaata tattcatctg tttcttactc aaaccggcca ctatatttaa tcccattcct
ataataatta attcccaaat tgaaaacact tcaaatttac tacaaattat atatagtaat
gtacctggtt caaatattga acccaaacta gtatacgtta ctatttctcc tcctataaat
agtgttaata atgtattaat taatttacct aaaatagaaa ttacactagc aaatattgta
atagatacta actttttata agaaacatct ttactcatca gcatcattac aatctttaaa
ataatccccc aaataaaagg tgtaattaaa gcaatgaaaa tcgatgcaaa acctcctaac
atcatttggg aaacaagggg tatttccata tctgcaaata cttctttttg aattttaacc
aattctggat tgctatgtct tgcatataca gataaaatcc ctattattgc ttgtataact
gataaataca taagaggaaa ccatatcgga ctaattattt tcatacgctc gaattcagaa
ataggagatg taatcataaa aattagagat ggtttttcat aattgttttt ttctttattc
actactaaac tattatccat atattaacac cttctttttt tattcataac gtaatgcttc
aattggatct aattttgcag ccttattggc tggaatcaat ccaaatataa taccaagcga
catcgaaaat aatacgccac ccacaacaac ttcccatgaa acaagaggcg gccattttgc
aaatgtggac acaatgtacg ctccacaata accaagtcca atcccaatca atccaccaag
aagtgtcaac ataattgctt caattaaaaa ttgcaacaaa attttaccac gcgttgctcc
aagtgcttta cgtaccccaa tctcacgtgt acgctctgtt acagaaacaa gcatgatatt
cataactcca attccgccta caactaaaga aatacttgca atacctgcaa taatcattgt
cataatatta gtaactttag aaataccttt ttggatttct tctaaattta caatttcata
tttcccttta aactcttcag attgtctatc atttaataat tttactccct tttttccagc
tgtttgtaat tgatcaaccc ctattgcttg aattgtaata gattgttgag agttatcatc
tccatataat attggccata ttgaaagtgg tattaaaatt tctgacattc caaaaccaag
ctcttcatct cctgaactga atagaccaat aatttgaagt ggctgacctt taatttctat
aattttacca atgactgatt catgctcatt aggaaataac tctttcacta atgtttgatt
aaccattatt acattattac cttgcatcaa atcatcttca ttaagagaac gacctttctc
tattttcatt ttagtcatat taaaatattc ttttgtaata ccatttatat tagttacaac
ctttttatca tcaccaatta atgtctctgt actagagttt tgaacaatta catttttaat
ttcttttatc ttttttaact caaaaagatc ttcttcactt acagatggtt ttttgtcatt
catagatcct gttgttaata actcattaat atcttcttta tatgtaatcg gaatagtgtt
attgccagaa gcggtaaatt gtgatttaag cattgcttct ccacctttac caatggctac
aacagtaata atagaaccta caccaataat aattccaagc atcgtaagag ctgagcgcag
tttatgagct aaaatagaag ataaggcaat ttttatacta tctaataaac tcataccata
caccttctat cttctgtaat tttcccatct cgcaatatga tgcgacgtga agaataagct
gctacctctt cttcatgtgt aaccataacg attgtcgtac cttctgcatt taacttcgta
aagatatcca taacttgtgc accagacttc gtatcaagcg caccagttgg ctcatcagcc
ataataaacg ttggattatt cgcaatcgat cttgcaatag caacacgctg cttctgtcca
cctgacagct cactaggtaa atgatgtact ctatccgcta atccaacttt cccaagcgct
tcgagcgctc tttgacgacg ctctgctttc ttcactccac cataaatcag tggtaattca
acgttttcca ctgcggaaag gcgcggcaat aaattaaaat gctggaacac aaaaccgata
tattcattac gaattaaagc aagttttgac tcatctgctg ttaaaatatt cacatcattc
agcatatatt cgccttctgt tggacgatct aaacaaccga taatattcat aagagttgat
ttaccagaac cagacggtcc cataattgaa acaaattcac caccttgaat agttaaacta
ataccgtgca aaataggaac cgccattttt ccttgataat acgttttagc aatattattt
aacgtaatca tttctctttc acttccattc cgtcatatac gttgtcggaa ggatttttaa
ccaccttttg ccccactgtt gcgccctcta caatctctgt ccaatctcca tcagtagcac
cttttttcac attttgttta cgaagcttac ctttctcttc gatatataca aatgcatcat
cgcctttttc aacaatactc ttacttggaa cagcaatcat tcttttattc tctaaattta
cttgtaacga aacatgataa cctggagata aaccatcttg actatcaaga cttgctttat
atgtatattg agacatattt tgagtcactt cccccatgcc atcagcttga gccatttcta
cacttgttgg gaactcactt acctctgtaa tcttccctgt ccactttttc ttactatttg
ctttcgcagt tacagtaaac gtttgatcct tttgaatttg cgacttctga agctcagtta
atgttccttg aatttggaat ggatctttag aagcaacttg taaaaaggct ttcccttgac
cacctaacgc ttgtgatgaa ctttgtgctg catctttatc taacttttga acaacaccag
caaaattgct ataaatcgta agttcgttct gctttttatt taactcttct ttttgtaact
tccctttctc tttctcaagg tctgttgtct tttgcgctat ttctaattca cttacttgct
cttccatcgg atctattact tctttcccag ctccgctatc tttcgccttc ttaatttctt
tcttcaacga atcaatcttc tttttccctt ggtcataacg catatctgcc atcttttgat
caagcacagc ttgcttcatt tgcaaattaa tttcttcatt atcgtaagaa aacaatttcg
ttcccttttc tatttcttgt ccttctttca cttcaatatc tttcactttt cctttagtca
gatccgcgta gaaactttca atattccccg gcttcacttg accagaaatt aactttgtat
tattaagatt gcgctctgtg actttttcaa aactaacagt atctattttt gttaccgctt
tcttcttact ttgcactacg aaaatattaa taaatgtaac aataacaatt aacgcaataa
ctccaataat agctcctttc tttttatttt taaaaataaa aagttctttt ttaatcacaa
caatcttctc cttattcata tctaaaattt aaacttttaa attttacata aaaatttaaa
acttctaaaa tataacatgt ataatttacc atagatgatt tattttgtat aatataaaaa
tatctatata aataatgcta attttcaaac aatggggtgg aagatactaa tgttagaaaa
aaaagataga ctaacagaaa tagaggaaca aattatatac ttaatttcaa aggaattagg
aaataaagaa atagcggaaa aattaaatta ttcacaacgt agcatcggtt acaaaataaa
taatattttt aaaaaattaa atgttaattc aagaatcgga ctgattatag aagctgtaaa
aaaaaatata atttaaatat aagaatgctt tcatgttaat attttataga aactaaatat
agaggtgatt aaaatgcaaa aattttttga agctattagt gctataggta tagtaggtta
ctttttaggt aaattcacaa gtattccttt aatagacaaa tatacattgt atttcggcgt
aatgttgatg attggggtta ttggaagatt tattataaaa gtaattaact cagaagaaga
gacacatgat tcaaacaaat aaaatactct aataaaaatg gaagaagatt gcacttaagt
gcaatcttct tccattttta ttgaaaattg attaaataat gttaatattg caattgtgtg
gtgcagatta gggtgattat gtaatagggg gaaattaaaa atgatcaata cagcttggaa
aattattaaa gcactacaaa aatacggtac aaaagcatac aatgttatca aaaaaggcgg
ccaagcaatg tacgacagct tcatggcagc taaagctaaa ggttggacac atgcagcttg
gtggctagta gaacatggtt caactttagg aacattctat gatttattaa aagctgctgg
attaatcgac taattacagc aactaaacaa ctaaacaact aaacaactta aaaatacaaa
ttaccctaaa ctgtacccct attacatatt aactaattat tttaaaggtt ggatgataat
atgtcaaata acatcatatc tgtaaaaaat ttaattaaaa gcttcgataa caaaatagta
ttagataaat taaatttcga aatgaaagaa aactccactg ttgtaataat aggtaaaaac
ggtgcaggta aaagtgtctt tctaaattgt ttacttggat ttattcatta caaccaaggt
tcaatactaa tagatggaca acctgtagaa aatcgattac atctccgcaa gattacatcg
ttaatttctt cagaccatca agaacatcta aatttattaa cccccaatga atatttttct
tttttacaag atatttacca actaaaaagt aataataaag acaaaattca aaattactca
gaagatctat atgttactaa agaactcaat actgtatttt catcactttc ttttggaaca
aaaaagaaaa tacaattaat tggtagccta ttatattctc ctaaattatt gatttgcgac
gaaatatttg aagggcttga tacagactca gtaaaatggg ttaaaaactt atttcaacaa
agaaaacaag aaaatctttc tactttattt acaactcata ttactgaaca tataacagat
ataacagaaa aaaattacat acttgaaaat ggaaaattaa ttgtgtaagt ttaaccactt
atatttaaag ctaaaattaa ggagcttaaa atatgaattt taatatatat aagagactat
atgataaatc aacagaagaa aaaagcaaaa caataacaaa acaaatatta tttggaatta
taaatagttc tatattaata ggtatactac tcacatgttt ggagattttc aactttaaaa
tttcaactgt aatgtatggt tatttcacta tatatataat actagaactt ttactattat
tctctgcaaa tcaactatat gaaagtacag aattcataat aaaattcctt aaatatacac
caataaccat aaataaacta tatttctcac attttctaag ttctaaatat tcattttcca
atctttttga aataataact ctcacatcaa ttttattaat atataatgtc gatatcttat
attcatttat tttcataatt agcttacaaa ttattagctt aataagaaca tatttagaat
ttttactatt atattctcaa aaaaaacagg ttaaaatttt tactctaacc cattttgttt
tcataatatc tatggttttt tatattattg ttaaaacaaa atcgatagat ttagtattct
ttgaaaacac aaatatgtta attatatctg ttcttctcat aacattcttg atatcacttt
taacatataa acatattata gaatacttaa tgaaaaataa tgaaattgta tataatgcta
tttttatcaa gttaactttt aacacagcta atttaattag taaattattt aaatttaata
catcaattgc atctttaata aaaatacata taatacgatt attacgtaat caagactata
taagtagatt actaaaaata ggaatattac tatttatttt ttcttctata agctttctat
ttttcgataa atcatcaaca aacaatgaaa tgagtgatat actttacttt tcatttttta
tttccttatt tagtttttct aacatacgat tagactataa cttagtttct aaattaagct
tagaggatta tccaataaca aaattacaat caagattaag cattgatata gcacatggaa
ttttactatt tatactatct ttatttcttt tattaacaca atacttattg aatccaacaa
atattctaac tctaattgat ggtttattat catttatttg tttttatttt ctaagtcttg
gtatagaaaa agcagatatt ataataacac caaaaacaaa atggaaaatg tatccattat
tttttgtgat gggattaata attgaagcaa tatttctatt aaaattcaaa atatggataa
aattaataac tttattcctt tgtatactgt ggtcatattt acgtgtttat tggaaattaa
aaaaacaata aacacaatta aaaagttccc ttcatatttt ttgaagggaa cttttatttt
aaacaaaaat tacaaacaag caaagttatt taaaagtaaa cttttaaaat tattgaatta
ataacaatta gtctaagata tatcagccaa atttaatttt taaacaaacc gaaaaaccct
ttccgttttt gtttctgatt ttggctctgt atttctctaa tgttttcaag caataactga
tctcgttttt caaatttttt ctctataaaa acctctaatt caatattttt atcttctact
tcctttaatt ttctctccgt attagccaaa tgttcttttg tggtaactaa ttcattcgta
atctcttgta atttttgaac aagcgtttga ttgaactgat tttgtaattg ttgattttct
aatacttcat ccaacttctt ttctaattcc gatttttcct ctcttgaaac aaacaaatca
agttctccat tccgccatgc tcgaacttga tcataagacc atttttgcac ctgtattttt
tctttaattg ctatcaattc ttctatattt tcctttgagt acctacgatg ccctccctga
cttcgctccg tttgtatatt aaattcgttt gaccatgctt ttaacaagtc aggggtaatc
cctaaacgat ccgcaacaat tttcggtgta tacatttctg attttaattc caa
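- File 2: the feature table file (.tbl) describing the gene features
This file can be obtained by annotating File 1 with prokka, but you need to edit it by hand: add the first few lines of the file and the gene-related information. Columns are separated by tabs, and qualifier lines (organism, gene, product and so on) begin with three tabs.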
>Feature Toyoncin_biosynthesis_gene_cluster
1 8409 source
organism Bacillus toyonensis
mol_type genomic DNA
strain XIN-YC13
585 1 gene
gene orf1
585 1 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00001
product MarR family transcriptional regulator
1476 811 gene
gene orf2
1476 811 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00002
product YIP1 family membrane protein
2710 1496 gene
gene orf3
2710 1496 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00003
product ABC transporter permease
3387 2707 gene
gene orf4
3387 2707 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00004
product ABC transporter ATP-binding protein
4595 3384 gene
gene orf5
4595 3384 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00005
product RND family efflux transporter, MFP subunit
4746 4952 gene
gene orf6
4746 4952 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00006
product Helix-turn-helix transcriptional regulator
5010 5198 gene
gene orf7
5010 5198 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00007
product Putative membrane protein
5337 5549 gene
gene toyA
5337 5549 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00008
product Toyoncin precursor
5657 6304 gene
gene orf9
5657 6304 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00009
product ABC transporter ATP-binding protein
6349 7707 gene
gene orf10
6349 7707 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00010
product Putative membrane protein
8391 7849 gene
gene orf11
8391 7849 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00011
product MarR family transcriptional regulator
- File 3: the template file (.sbt) describing the author information
This file can be generated on the NCBI website.
Submit-block ::= {
contact {
contact {
name name {
last "xin",
first "bingyue",
middle "",
initials "",
suffix "",
title ""
},
affil std {
affil "Huaibei Normal University",
div "College of Life Sciences",
city "Huaibei",
sub "Anhui",
country "China",
street "Dongshan road No.100",
email "xinbingyuex@163.com",
postal-code "235000"
}
}
},
cit {
authors {
names std {
{
name name {
last "Xin",
first "Bingyue",
middle "",
initials "",
suffix "",
title ""
}
}
},
affil std {
affil "Huaibei Normal University",
div "College of Life Sciences",
city "Huaibei",
sub "Anhui",
country "China",
street "Dongshan road No.100",
postal-code "235000"
}
}
},
subtype new
}
Seqdesc ::= pub {
pub {
gen {
cit "unpublished",
authors {
names std {
{
name name {
last "Xin",
first "Bingyue",
middle "",
initials "",
suffix "",
title ""
}
}
}
},
title "Purification and characterization of a novel leaderless bacteriocin, toyoncin, produced by Bacillus toyonensis XIN-YC13 that specifically active against Bacillus cereus and Listeria monocytogenes"
}
}
}
Seqdesc ::= user {
type str "Submission",
data {
{
label str "AdditionalComment",
data str "ALT EMAIL:xinbingyuex@163.com"
}
}
}
Seqdesc ::= user {
type str "Submission",
data {
{
label str "AdditionalComment",
data str "Submission Title:None"
}
}
}
Note: the sequence identifier in File 1 and File 2 must be identical; in this example both are "Toyoncin_biosynthesis_gene_cluster".
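The consistency requirement in the note above can be checked before running tbl2asn. The following is only a minimal sketch: the function names `fasta_id`, `tbl_id` and `ids_match` are made up for illustration, and it assumes one sequence per FASTA file with the `>Feature` line appearing once in the table.

```shell
# Sketch: compare the sequence ID in a FASTA header with the ID on the
# ">Feature" line of a feature table. All function names are illustrative.

# Print the first whitespace-delimited token of the first FASTA header,
# with the leading ">" removed.
fasta_id() {
    awk '/^>/ { sub(/^>/, "", $1); print $1; exit }' "$1"
}

# Print the ID that follows ">Feature" on the first such line of the table.
tbl_id() {
    awk '/^>Feature/ { print $2; exit }' "$1"
}

# Succeed (exit status 0) only if both IDs are non-empty and identical.
ids_match() {
    [ -n "$(fasta_id "$1")" ] && [ "$(fasta_id "$1")" = "$(tbl_id "$2")" ]
}
```

For example, `ids_match cluster.fna cluster.tbl || echo "ID mismatch"` (file names hypothetical) prints a warning when the two identifiers differ.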
Generating the file
tbl2asn -t template.sbt -p ./ -V vb -x .fna
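-t  the template file (.sbt)
-p  the directory containing the input files
-V  verification options; vb combines two of them:
    v  write a validation file (.val) that records the errors
    b  write a GenBank flat file (.gbf)
-x  the suffix of File 1 (the FASTA file); set it to match your file, since tbl2asn looks for .fsa by default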
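Because tbl2asn pairs each sequence file in the -p directory with a feature table of the same base name (for example cluster.fna with cluster.tbl), a small pre-check can catch a missing or misnamed table before the run. The wrapper below is a sketch under that assumption; `missing_tbl` and `run_tbl2asn` are illustrative names, not part of tbl2asn.

```shell
# Sketch: refuse to run tbl2asn when a FASTA file has no same-named .tbl.
# Function and variable names are illustrative.

# Print every FASTA file in directory $1 (suffix $2) that lacks a .tbl partner.
missing_tbl() {
    mt_dir=$1
    mt_suffix=$2
    for mt_fa in "$mt_dir"/*"$mt_suffix"; do
        [ -e "$mt_fa" ] || continue          # glob matched nothing
        mt_base=${mt_fa%"$mt_suffix"}        # strip the FASTA suffix
        [ -f "$mt_base.tbl" ] || printf '%s\n' "$mt_fa"
    done
}

# Run tbl2asn with the options used in this post, after the pre-check.
# Arguments: template file, input directory, FASTA suffix.
run_tbl2asn() {
    rt_missing=$(missing_tbl "$2" "$3")
    if [ -n "$rt_missing" ]; then
        echo "No feature table found for: $rt_missing" >&2
        return 1
    fi
    tbl2asn -t "$1" -p "$2" -V vb -x "$3"
}
```

Calling `run_tbl2asn template.sbt ./ .fna` then behaves like the command above, but stops early if a table is missing.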
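Note: if you use the tbl2asn bundled with Prokka, the date in the generated sqn and gbf files is usually 1-JAN-2019 and has to be corrected by hand to the current date, because the tbl2asn shipped with Prokka is a modified build. Using the official tbl2asn is recommended, as it avoids the date error.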