Using tbl2asn to upload a gene cluster to NCBI
Posted: 2021-11-12 | Category: Bioinformatics

Submitting a gene cluster to NCBI requires a file in sqn format, which is generated with tbl2asn.

File preparation

tbl2asn needs three files to generate the sqn file:

  • File 1: the genome sequence file in FASTA format

The bracketed modifiers in the header are optional.

>Toyoncin_biosynthesis_gene_cluster [organism=Bacillus toyonensis] [strain=XIN-YC13] [topology=linear] [moltype=DNA] [tech=wgs] [gcode=11] [country=China] Bacillus toyonensis strain XIN-YC13 Toyoncin biosynthesis gene cluster, complete sequence
ttaaaa taatttaata
gggaagtttt ttagttgttt tggactcttc ccaaacactg ctttaagtgt tggattaaca
tcatccctat tccccgaaaa cataatgtga ggatttatga ataatgcata tgctctaaca
ttattatcat caacaccact ctctgaacga gccataatac ccttatcaat taattttcta
accaatggac taactttagt ttcatgtctt ccaatttttt tagctaattc tcgctgagtt
aatgggattt gttctttcga attaatatca ttaactaaac aattacttaa aaaacctaca
cacattgaaa tatctactaa aaataccttc tcagcatttg ttaaataatc aatttcgaac
aaatactgga tattttgttg aataatctga acaaacttcg ctttattttt cactttacgc
tcaggaacta atttcattcc tcttgaacga gcttttgatt gaagtttatt tgctaaatac
atctcttctt cagacaagac ctttaaatcc tcaatatctc tcaatcttga atttttttca
gcttgttcta agttgataaa ctttgacata ttctttttgc tcctcttttc taagattttc
aactagagaa ggaaaaaatt ttatgttatg attcctgtag aatttacaat tcaatatgta
caaaagaact ccccttttct aattgatagt ttggtcgctt tcaattataa tacaagggga
ttttttacat cttaaaattt ttcatttttg aatcaatccc tgaaaatata aagaacacat
cacataaatt attcttaata ttttataatc gaaaaaataa taggaataaa gaaaaatact
gcaataaata tattcatctg tttcttactc aaaccggcca ctatatttaa tcccattcct
ataataatta attcccaaat tgaaaacact tcaaatttac tacaaattat atatagtaat
gtacctggtt caaatattga acccaaacta gtatacgtta ctatttctcc tcctataaat
agtgttaata atgtattaat taatttacct aaaatagaaa ttacactagc aaatattgta
atagatacta actttttata agaaacatct ttactcatca gcatcattac aatctttaaa
ataatccccc aaataaaagg tgtaattaaa gcaatgaaaa tcgatgcaaa acctcctaac
atcatttggg aaacaagggg tatttccata tctgcaaata cttctttttg aattttaacc
aattctggat tgctatgtct tgcatataca gataaaatcc ctattattgc ttgtataact
gataaataca taagaggaaa ccatatcgga ctaattattt tcatacgctc gaattcagaa
ataggagatg taatcataaa aattagagat ggtttttcat aattgttttt ttctttattc
actactaaac tattatccat atattaacac cttctttttt tattcataac gtaatgcttc
aattggatct aattttgcag ccttattggc tggaatcaat ccaaatataa taccaagcga
catcgaaaat aatacgccac ccacaacaac ttcccatgaa acaagaggcg gccattttgc
aaatgtggac acaatgtacg ctccacaata accaagtcca atcccaatca atccaccaag
aagtgtcaac ataattgctt caattaaaaa ttgcaacaaa attttaccac gcgttgctcc
aagtgcttta cgtaccccaa tctcacgtgt acgctctgtt acagaaacaa gcatgatatt
cataactcca attccgccta caactaaaga aatacttgca atacctgcaa taatcattgt
cataatatta gtaactttag aaataccttt ttggatttct tctaaattta caatttcata
tttcccttta aactcttcag attgtctatc atttaataat tttactccct tttttccagc
tgtttgtaat tgatcaaccc ctattgcttg aattgtaata gattgttgag agttatcatc
tccatataat attggccata ttgaaagtgg tattaaaatt tctgacattc caaaaccaag
ctcttcatct cctgaactga atagaccaat aatttgaagt ggctgacctt taatttctat
aattttacca atgactgatt catgctcatt aggaaataac tctttcacta atgtttgatt
aaccattatt acattattac cttgcatcaa atcatcttca ttaagagaac gacctttctc
tattttcatt ttagtcatat taaaatattc ttttgtaata ccatttatat tagttacaac
ctttttatca tcaccaatta atgtctctgt actagagttt tgaacaatta catttttaat
ttcttttatc ttttttaact caaaaagatc ttcttcactt acagatggtt ttttgtcatt
catagatcct gttgttaata actcattaat atcttcttta tatgtaatcg gaatagtgtt
attgccagaa gcggtaaatt gtgatttaag cattgcttct ccacctttac caatggctac
aacagtaata atagaaccta caccaataat aattccaagc atcgtaagag ctgagcgcag
tttatgagct aaaatagaag ataaggcaat ttttatacta tctaataaac tcataccata
caccttctat cttctgtaat tttcccatct cgcaatatga tgcgacgtga agaataagct
gctacctctt cttcatgtgt aaccataacg attgtcgtac cttctgcatt taacttcgta
aagatatcca taacttgtgc accagacttc gtatcaagcg caccagttgg ctcatcagcc
ataataaacg ttggattatt cgcaatcgat cttgcaatag caacacgctg cttctgtcca
cctgacagct cactaggtaa atgatgtact ctatccgcta atccaacttt cccaagcgct
tcgagcgctc tttgacgacg ctctgctttc ttcactccac cataaatcag tggtaattca
acgttttcca ctgcggaaag gcgcggcaat aaattaaaat gctggaacac aaaaccgata
tattcattac gaattaaagc aagttttgac tcatctgctg ttaaaatatt cacatcattc
agcatatatt cgccttctgt tggacgatct aaacaaccga taatattcat aagagttgat
ttaccagaac cagacggtcc cataattgaa acaaattcac caccttgaat agttaaacta
ataccgtgca aaataggaac cgccattttt ccttgataat acgttttagc aatattattt
aacgtaatca tttctctttc acttccattc cgtcatatac gttgtcggaa ggatttttaa
ccaccttttg ccccactgtt gcgccctcta caatctctgt ccaatctcca tcagtagcac
cttttttcac attttgttta cgaagcttac ctttctcttc gatatataca aatgcatcat
cgcctttttc aacaatactc ttacttggaa cagcaatcat tcttttattc tctaaattta
cttgtaacga aacatgataa cctggagata aaccatcttg actatcaaga cttgctttat
atgtatattg agacatattt tgagtcactt cccccatgcc atcagcttga gccatttcta
cacttgttgg gaactcactt acctctgtaa tcttccctgt ccactttttc ttactatttg
ctttcgcagt tacagtaaac gtttgatcct tttgaatttg cgacttctga agctcagtta
atgttccttg aatttggaat ggatctttag aagcaacttg taaaaaggct ttcccttgac
cacctaacgc ttgtgatgaa ctttgtgctg catctttatc taacttttga acaacaccag
caaaattgct ataaatcgta agttcgttct gctttttatt taactcttct ttttgtaact
tccctttctc tttctcaagg tctgttgtct tttgcgctat ttctaattca cttacttgct
cttccatcgg atctattact tctttcccag ctccgctatc tttcgccttc ttaatttctt
tcttcaacga atcaatcttc tttttccctt ggtcataacg catatctgcc atcttttgat
caagcacagc ttgcttcatt tgcaaattaa tttcttcatt atcgtaagaa aacaatttcg
ttcccttttc tatttcttgt ccttctttca cttcaatatc tttcactttt cctttagtca
gatccgcgta gaaactttca atattccccg gcttcacttg accagaaatt aactttgtat
tattaagatt gcgctctgtg actttttcaa aactaacagt atctattttt gttaccgctt
tcttcttact ttgcactacg aaaatattaa taaatgtaac aataacaatt aacgcaataa
ctccaataat agctcctttc tttttatttt taaaaataaa aagttctttt ttaatcacaa
caatcttctc cttattcata tctaaaattt aaacttttaa attttacata aaaatttaaa
acttctaaaa tataacatgt ataatttacc atagatgatt tattttgtat aatataaaaa
tatctatata aataatgcta attttcaaac aatggggtgg aagatactaa tgttagaaaa
aaaagataga ctaacagaaa tagaggaaca aattatatac ttaatttcaa aggaattagg
aaataaagaa atagcggaaa aattaaatta ttcacaacgt agcatcggtt acaaaataaa
taatattttt aaaaaattaa atgttaattc aagaatcgga ctgattatag aagctgtaaa
aaaaaatata atttaaatat aagaatgctt tcatgttaat attttataga aactaaatat
agaggtgatt aaaatgcaaa aattttttga agctattagt gctataggta tagtaggtta
ctttttaggt aaattcacaa gtattccttt aatagacaaa tatacattgt atttcggcgt
aatgttgatg attggggtta ttggaagatt tattataaaa gtaattaact cagaagaaga
gacacatgat tcaaacaaat aaaatactct aataaaaatg gaagaagatt gcacttaagt
gcaatcttct tccattttta ttgaaaattg attaaataat gttaatattg caattgtgtg
gtgcagatta gggtgattat gtaatagggg gaaattaaaa atgatcaata cagcttggaa
aattattaaa gcactacaaa aatacggtac aaaagcatac aatgttatca aaaaaggcgg
ccaagcaatg tacgacagct tcatggcagc taaagctaaa ggttggacac atgcagcttg
gtggctagta gaacatggtt caactttagg aacattctat gatttattaa aagctgctgg
attaatcgac taattacagc aactaaacaa ctaaacaact aaacaactta aaaatacaaa
ttaccctaaa ctgtacccct attacatatt aactaattat tttaaaggtt ggatgataat
atgtcaaata acatcatatc tgtaaaaaat ttaattaaaa gcttcgataa caaaatagta
ttagataaat taaatttcga aatgaaagaa aactccactg ttgtaataat aggtaaaaac
ggtgcaggta aaagtgtctt tctaaattgt ttacttggat ttattcatta caaccaaggt
tcaatactaa tagatggaca acctgtagaa aatcgattac atctccgcaa gattacatcg
ttaatttctt cagaccatca agaacatcta aatttattaa cccccaatga atatttttct
tttttacaag atatttacca actaaaaagt aataataaag acaaaattca aaattactca
gaagatctat atgttactaa agaactcaat actgtatttt catcactttc ttttggaaca
aaaaagaaaa tacaattaat tggtagccta ttatattctc ctaaattatt gatttgcgac
gaaatatttg aagggcttga tacagactca gtaaaatggg ttaaaaactt atttcaacaa
agaaaacaag aaaatctttc tactttattt acaactcata ttactgaaca tataacagat
ataacagaaa aaaattacat acttgaaaat ggaaaattaa ttgtgtaagt ttaaccactt
atatttaaag ctaaaattaa ggagcttaaa atatgaattt taatatatat aagagactat
atgataaatc aacagaagaa aaaagcaaaa caataacaaa acaaatatta tttggaatta
taaatagttc tatattaata ggtatactac tcacatgttt ggagattttc aactttaaaa
tttcaactgt aatgtatggt tatttcacta tatatataat actagaactt ttactattat
tctctgcaaa tcaactatat gaaagtacag aattcataat aaaattcctt aaatatacac
caataaccat aaataaacta tatttctcac attttctaag ttctaaatat tcattttcca
atctttttga aataataact ctcacatcaa ttttattaat atataatgtc gatatcttat
attcatttat tttcataatt agcttacaaa ttattagctt aataagaaca tatttagaat
ttttactatt atattctcaa aaaaaacagg ttaaaatttt tactctaacc cattttgttt
tcataatatc tatggttttt tatattattg ttaaaacaaa atcgatagat ttagtattct
ttgaaaacac aaatatgtta attatatctg ttcttctcat aacattcttg atatcacttt
taacatataa acatattata gaatacttaa tgaaaaataa tgaaattgta tataatgcta
tttttatcaa gttaactttt aacacagcta atttaattag taaattattt aaatttaata
catcaattgc atctttaata aaaatacata taatacgatt attacgtaat caagactata
taagtagatt actaaaaata ggaatattac tatttatttt ttcttctata agctttctat
ttttcgataa atcatcaaca aacaatgaaa tgagtgatat actttacttt tcatttttta
tttccttatt tagtttttct aacatacgat tagactataa cttagtttct aaattaagct
tagaggatta tccaataaca aaattacaat caagattaag cattgatata gcacatggaa
ttttactatt tatactatct ttatttcttt tattaacaca atacttattg aatccaacaa
atattctaac tctaattgat ggtttattat catttatttg tttttatttt ctaagtcttg
gtatagaaaa agcagatatt ataataacac caaaaacaaa atggaaaatg tatccattat
tttttgtgat gggattaata attgaagcaa tatttctatt aaaattcaaa atatggataa
aattaataac tttattcctt tgtatactgt ggtcatattt acgtgtttat tggaaattaa
aaaaacaata aacacaatta aaaagttccc ttcatatttt ttgaagggaa cttttatttt
aaacaaaaat tacaaacaag caaagttatt taaaagtaaa cttttaaaat tattgaatta
ataacaatta gtctaagata tatcagccaa atttaatttt taaacaaacc gaaaaaccct
ttccgttttt gtttctgatt ttggctctgt atttctctaa tgttttcaag caataactga
tctcgttttt caaatttttt ctctataaaa acctctaatt caatattttt atcttctact
tcctttaatt ttctctccgt attagccaaa tgttcttttg tggtaactaa ttcattcgta
atctcttgta atttttgaac aagcgtttga ttgaactgat tttgtaattg ttgattttct
aatacttcat ccaacttctt ttctaattcc gatttttcct ctcttgaaac aaacaaatca
agttctccat tccgccatgc tcgaacttga tcataagacc atttttgcac ctgtattttt
tctttaattg ctatcaattc ttctatattt tcctttgagt acctacgatg ccctccctga
cttcgctccg tttgtatatt aaattcgttt gaccatgctt ttaacaagtc aggggtaatc
cctaaacgat ccgcaacaat tttcggtgta tacatttctg attttaattc caa
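
To make the structure of the header line clearer, here is a minimal Python sketch that splits a definition line into the sequence ID, the bracketed `[key=value]` modifiers, and the free-text title. The function name `parse_defline` is hypothetical and used only for illustration; the header is a shortened copy of the one above.

```python
import re

def parse_defline(defline):
    """Split a FASTA definition line into (seq_id, modifiers, title).

    Hypothetical helper for illustration; not part of tbl2asn.
    """
    text = defline.lstrip(">").strip()
    # The sequence ID is everything up to the first space.
    seq_id, _, rest = text.partition(" ")
    # Each modifier has the form [key=value].
    modifiers = dict(re.findall(r"\[([^=\]]+)=([^\]]*)\]", rest))
    # Whatever remains after removing the brackets is the title.
    title = re.sub(r"\[[^\]]*\]", "", rest).strip()
    return seq_id, modifiers, title

header = (">Toyoncin_biosynthesis_gene_cluster [organism=Bacillus toyonensis] "
          "[strain=XIN-YC13] [gcode=11] Bacillus toyonensis strain XIN-YC13 "
          "Toyoncin biosynthesis gene cluster, complete sequence")

seq_id, modifiers, title = parse_defline(header)
print(seq_id)
print(modifiers)
print(title)
```

If the modifiers are left out, `modifiers` is simply an empty dictionary and the sequence ID and title are still recovered.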
  • File 2: the feature table file (.tbl) describing the gene features

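This file can be obtained by annotating file 1 with Prokka, but it has to be edited by hand: add the first few lines of the file and the gene information. Columns are separated by tabs.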

>Feature Toyoncin_biosynthesis_gene_cluster
1 8409 source
organism Bacillus toyonensis
mol_type genomic DNA
strain XIN-YC13
585 1 gene
gene orf1
585 1 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00001
product MarR family transcriptional regulator
1476 811 gene
gene orf2
1476 811 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00002
product YIP1 family membrane protein
2710 1496 gene
gene orf3
2710 1496 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00003
product ABC transporter permease
3387 2707 gene
gene orf4
3387 2707 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00004
product ABC transporter ATP-binding protein
4595 3384 gene
gene orf5
4595 3384 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00005
product RND family efflux transporter, MFP subunit
4746 4952 gene
gene orf6
4746 4952 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00006
product Helix-turn-helix transcriptional regulator
5010 5198 gene
gene orf7
5010 5198 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00007
product Putative membrane protein
5337 5549 gene
gene toyA
5337 5549 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00008
product Toyoncin precursor
5657 6304 gene
gene orf9
5657 6304 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00009
product ABC transporter ATP-binding protein
6349 7707 gene
gene orf10
6349 7707 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00010
product Putative membrane protein
8391 7849 gene
gene orf11
8391 7849 CDS
inference ab initio prediction:Prodigal:002006
locus_tag Toyoncin_biosynthesis_gene_cluster_00011
product MarR family transcriptional regulator
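
Because the tab layout is easy to lose when copying the table, the following Python sketch writes two of the entries above with explicit tabs. It assumes the standard feature table layout: start, stop, and feature key in the first three columns, and each qualifier on its own line after three leading tabs. The helper names `format_feature` and `format_table` are hypothetical.

```python
def format_feature(start, stop, key, qualifiers):
    """Return the lines for one feature: a location line plus qualifier lines."""
    lines = [f"{start}\t{stop}\t{key}"]
    for name, value in qualifiers:
        # Qualifier name sits in column 4, its value in column 5.
        lines.append(f"\t\t\t{name}\t{value}")
    return lines

def format_table(seq_id, features):
    """Build a complete feature table for one sequence ID."""
    lines = [f">Feature {seq_id}"]
    for start, stop, key, qualifiers in features:
        lines.extend(format_feature(start, stop, key, qualifiers))
    return "\n".join(lines) + "\n"

# Start > stop (585 to 1) means the feature lies on the reverse strand.
features = [
    (585, 1, "gene", [("gene", "orf1")]),
    (585, 1, "CDS", [
        ("locus_tag", "Toyoncin_biosynthesis_gene_cluster_00001"),
        ("product", "MarR family transcriptional regulator"),
    ]),
]

table = format_table("Toyoncin_biosynthesis_gene_cluster", features)
print(table)
```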
  • File 3: the template file (.sbt) describing the author information

This file can be generated on the NCBI website.

Submit-block ::= {
contact {
contact {
name name {
last "xin",
first "bingyue",
middle "",
initials "",
suffix "",
title ""
},
affil std {
affil "Huaibei Normal University",
div "College of Life Sciences",
city "Huaibei",
sub "Anhui",
country "China",
street "Dongshan road No.100",
email "xinbingyuex@163.com",
postal-code "235000"
}
}
},
cit {
authors {
names std {
{
name name {
last "Xin",
first "Bingyue",
middle "",
initials "",
suffix "",
title ""
}
}
},
affil std {
affil "Huaibei Normal University",
div "College of Life Sciences",
city "Huaibei",
sub "Anhui",
country "China",
street "Dongshan road No.100",
postal-code "235000"
}
}
},
subtype new
}
Seqdesc ::= pub {
pub {
gen {
cit "unpublished",
authors {
names std {
{
name name {
last "Xin",
first "Bingyue",
middle "",
initials "",
suffix "",
title ""
}
}
}
},
title "Purification and characterization of a novel leaderless bacteriocin, toyoncin, produced by Bacillus toyonensis XIN-YC13 that specifically active against Bacillus cereus and Listeria monocytogenes"
}
}
}
Seqdesc ::= user {
type str "Submission",
data {
{
label str "AdditionalComment",
data str "ALT EMAIL:xinbingyuex@163.com"
}
}
}
Seqdesc ::= user {
type str "Submission",
data {
{
label str "AdditionalComment",
data str "Submission Title:None"
}
}
}

Note: the sequence ID must be identical in file 1 and file 2; in this example both use "Toyoncin_biosynthesis_gene_cluster".
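
A quick way to check this before running tbl2asn is to compare the IDs in both files. The sketch below works on the file contents as strings; the function names are hypothetical, and reading the real files from disk is left to the reader.

```python
def fasta_ids(fasta_text):
    """Return the sequence IDs (first word after '>') of every FASTA record."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].split():
            ids.append(line[1:].split()[0])
    return ids

def table_ids(tbl_text):
    """Return the sequence IDs named on every '>Feature' line."""
    ids = []
    for line in tbl_text.splitlines():
        parts = line.split()
        if len(parts) > 1 and parts[0] == ">Feature":
            ids.append(parts[1])
    return ids

def ids_match(fasta_text, tbl_text):
    """True when both files name the same IDs in the same order."""
    return fasta_ids(fasta_text) == table_ids(tbl_text)

fasta_text = ">Toyoncin_biosynthesis_gene_cluster [organism=Bacillus toyonensis]\nttaaaataat\n"
tbl_text = ">Feature Toyoncin_biosynthesis_gene_cluster\n1\t8409\tsource\n"

print(ids_match(fasta_text, tbl_text))
```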

Generating the file

tbl2asn -t template.sbt -p ./ -V vb -x .fna

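-t  template file
-p  path to the directory containing the input files
-V  verification options; the letters can be combined:
    v  run validation and write a validation file that records the errors
    b  generate a gbf (GenBank flat) file
-x  suffix of file 1 (the FASTA file); set it to match your own file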
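
For scripted use, the same command can be assembled in Python. This is only a sketch: `build_tbl2asn_command` is a hypothetical helper, the flags are the ones listed above, and the actual call is left commented out because it requires tbl2asn to be installed and on the PATH.

```python
import subprocess

def build_tbl2asn_command(template, input_dir, fasta_suffix=".fna", verify="vb"):
    """Assemble the tbl2asn argument list used in this post."""
    return [
        "tbl2asn",
        "-t", template,      # template (.sbt) file
        "-p", input_dir,     # directory with the FASTA and .tbl files
        "-V", verify,        # v = validation file, b = gbf file
        "-x", fasta_suffix,  # suffix of the FASTA file
    ]

cmd = build_tbl2asn_command("template.sbt", "./")
print(" ".join(cmd))

# Uncomment on a machine where tbl2asn is available:
# subprocess.run(cmd, check=True)
```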

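Note: with the tbl2asn bundled with Prokka, the date in the generated sqn and gbf files is usually 1-JAN-2019 and has to be corrected by hand to the current date, because the tbl2asn shipped with Prokka is a modified build. Using the official tbl2asn is recommended, as it avoids the date error.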
