扩增子分析--计算随机过程和决定性过程比例
发表于:2021-03-25 | 分类: 生物信息
字数统计: 2.1k | 阅读时长: 10分钟 | 阅读量:

确定性过程(deterministic processes)和随机性过程(stochastic processes)是两个影响微生物群落结构系统发育组装的重要过程,本文介绍计算二者所占比例的方法。

1. 软件

  • NST
  • iCAMP
  • ape 用于读取进化树文件
  • picante

2. 文件准备

2.1 Feature Table

行为OTUs/ASVs,列为样本。

TaxonID Sample 1 Sample 2 Sample 3 Sample 4
OTU1 232.0 209.0 349.0 256.0
OTU2 75.0 35.0 44.0 0.0
OTU3 237.0 224.0 291.0 353.0
OTU4 371.0 80.0 319.0 345.0

2.2 Group File

该文件描述了所有样本的分组情况,如实验组和对照组,或者其他分组。

Sample_ID Group
Sample 1 Group x
Sample 2 Group x
Sample 3 Group y
Sample 4 Group y

2.3 Tree File

包含Feature table中所有OTUs/ASVs的系统发育树文件,理想条件下仅包含Feature table中的OTUs/ASVs,不过大部分情况下还会包含数据库中的一些物种,在随后的分析中需要去除多余的物种(后续会讲到)。

3. 开始分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
########################
#!/usr/bin/env R
# version 20200919
# Step 1. 文件、路径和参数

# 指定包含输入文件的目录路径,注意区分Windows和Linux的路径写法
wd="/mnt/e/Researches/lujia16S/Analysis_20200907/exported-feature-table_2k_abund22/Raup_Crick"

# 指定结果文件的保存路径
save.wd="/mnt/e/Researches/lujia16S/Analysis_20200907/exported-feature-table_2k_abund22/Raup_Crick/Result2"

# 创建文件保存路径
dir.create(save.wd, recursive = TRUE)

# 指定Feature table(OTU表)的文件名
com.file="feature-table.tsv"

# 指定样本分组文件,每行为一个样本
group.file="treatment.txt"

# 指定NWK格式的系统发育树文件
tree.file="tree.nwk"

# 设置并行运算使用的线程数
nworker=8

# randomization time for null model analysis. 真实分析的时候一般设置为1000,如果测试的话可以设20
rand.time=999

# 输出文件的前缀名,随便设置
prefix="Lujia"

# Step 2. 加载R包

# 确保已经安装过所需的R包
#install.packages("NST")

library(ape)
library(iCAMP)
library(NST) # need to be NST >=3.0.3
library(picante)

# Step 3. 加载数据并匹配IDs

setwd(wd)

# 读入Feature Table,注意自己文件的列与列之间的分隔符是什么,制表符用sep = "\t",逗号用sep = ","
comm=t(read.table(com.file,header = TRUE, sep = "\t", row.names = 1, as.is = TRUE, stringsAsFactors = FALSE))

# 读入分组文件,同样注意设置分隔符
group=read.table(group.file, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE, stringsAsFactors = FALSE)

#如果tree中的OTUs和Feature Table中的OTUs一一对应,可以直接用下面一个命令读入tree(注意去掉###),否则的话则运行下面LHL加入的3行命令
###tree=ape::read.tree(file = tree.file) # if you have tree

# 以下3行是LHL加入的
phy<-read.tree(tree.file)# 读入树文件
prune_tree<-prune.sample(comm,phy)# 使树文件和OTU表文件一一对齐
tree=prune_tree # 此刻的Tree干净了,可用于后续分析而不会报错

# 以下命令检测Feature Table中的样本名称是否与分组文件中的样本名一一对应
samp.ck=NST::match.name(rn.list=list(comm=comm,group=group))
comm=samp.ck$comm
comm=comm[,colSums(comm)>0,drop=FALSE]
group=samp.ck$group

# 以下命令检测Feature Table中的OTUs名称是否与Tree中的OTUs名一一对应
tax.ck=NST::match.name(cn.list = list(comm=comm),tree.list = list(tree=tree)) # if you have tree
comm=tax.ck$comm
tree=tax.ck$tree

# Step 4. Grouping way and metacommunity seting

# 选择分组,如果拥有多种分组方式,每次运行时选择其中的一组。此处选择的时分组文件中的第二列
groupi=group[,1,drop=FALSE]

# 重新定义输出文件的前缀,以分组的名称来命名,此处以分组文件第二列的表头“Group”为前缀
prefixi=paste0(prefix,".Group")

# if treatment and control are from different metacommunities, you may set meta.groupi=groupi,默认为NULL
#meta.groupi=NULL
meta.groupi=groupi

# Step 5. taxonomic NST
# Step 5.1 calculate tNST

# 指定计算距离矩阵的方法,"jaccard" and "bray" are preferred.
dist.method="bray"

# 记录运行时间
t1=Sys.time()

# 进入输出目录
setwd(save.wd)

# 计算tNST
tnst=tNST(comm=comm, group=groupi, meta.group=meta.groupi, meta.com=NULL, dist.method=dist.method, abundance.weighted=TRUE, rand=rand.time, output.rand=TRUE, nworker=nworker, LB=FALSE, null.model="PF", between.group=TRUE, SES=TRUE, RC=TRUE)

# 以R data格式保存tNST的输出
save(tnst, file = paste0(prefixi, ".tNST.rda"))

# 保存其他tNST结果到多个文件中
write.table(tnst$index.grp, file = paste0(prefixi, ".tNST.summary.csv"), quote = FALSE, sep = "\t")

write.table(tnst$index.pair.grp,file = paste0(prefixi,".tNST.pairwise.csv"),quote = FALSE,sep = "\t")

write.table(tnst$index.pair,file = paste0(prefixi,".tNST.pairwise.index.csv"),quote = FALSE,sep = "\t")
write.table(tnst$index.between,file = paste0(prefixi,".tNST.between.summary.csv"),quote = FALSE,sep = "\t")

write.table(tnst$index.pair.between,file = paste0(prefixi,".tNST.pairwise.between.csv"),quote = FALSE,sep = "\t")

format(Sys.time()-t1)

# Step 5.2 Bootstrapping test

t1=Sys.time()

# 计算Bootstrapping
tnstbt=nst.boot(nst.result=tnst, group=groupi, rand=rand.time, trace=TRUE, two.tail=FALSE, out.detail=TRUE, between.group=FALSE, nworker=nworker)

# 保存结果
save(tnstbt,file = paste0(prefixi,".tNST.boot.rda"))

# 保存结果
write.table(tnstbt$summary,file = paste0(prefixi,".tNST.boot.summary.csv"), quote = FALSE,sep = "\t")
write.table(tnstbt$compare,file = paste0(prefixi,".tNST.boot.compare.csv"), quote = FALSE,sep = "\t")
(t=format(Sys.time()-t1))

# Step 5.3 PERMANOVA

t1=Sys.time()

tnstpaov=nst.panova(nst.result=tnst, group = groupi, rand = rand.time, trace = TRUE)

write.table(tnstpaov,file = paste0(prefixi,".tNST.PERMANOVA.csv"), quote = FALSE,sep = "\t")

(t=format(Sys.time()-t1))

# Steo 6. phylogenetic NST

# Step 6.1 phylogentic distance matrix # use bigmemory for big dataset

wd.pd=paste0(save.wd,"/pdbig")

if(!dir.exists(wd.pd)){dir.create(wd.pd)}

setwd(wd.pd)

if(file.exists("pd.desc"))
{
# if already done before, directly use it.
pdbig=list()
pdbig$tip.label=read.table("pd.taxon.name.csv", sep = ",", row.names = 1, header = TRUE, stringsAsFactors = FALSE, as.is = TRUE)[,1]
pdbig$pd.wd=wd.pd
pdbig$pd.file="pd.desc"
pdbig$pd.name.file="pd.taxon.name.csv"
}else{
pdbig=iCAMP::pdist.big(tree = tree, wd = wd.pd, nworker = nworker)
}

# Step 6.2 calculate pNST

# to count time cost
t1=Sys.time()

setwd(save.wd)

pnst=NST::pNST(comm=comm, pd.desc=pdbig$pd.file, pd.wd=pdbig$pd.wd,
pd.spname=pdbig$tip.label, group=groupi, meta.group=meta.groupi, abundance.weighted=TRUE, rand=rand.time, output.rand=TRUE, taxo.null.model=NULL, phylo.shuffle=TRUE, exclude.conspecifics=FALSE, nworker=nworker, between.group=FALSE, SES=TRUE, RC=TRUE)

# save pNST output in R data format
save(pnst,file = paste0(prefixi,".pNST.rda"))

write.table(pnst$index.grp,file = paste0(prefixi,".pNST.summary.csv"), quote = FALSE,sep = "\t")

write.table(pnst$index.pair.grp,file = paste0(prefixi,".pNST.pairwise.csv"),quote = FALSE,sep = "\t")

write.table(pnst$index.pair,file = paste0(prefixi,".pNST.pairwise.index.csv"),quote = FALSE,sep = "\t")

write.table(pnst$index.between,file = paste0(prefixi,".pNST.between.summary.csv"),quote = FALSE,sep = "\t")

write.table(pnst$index.pair.between,file = paste0(prefixi,".pNST.pairwise.between.csv"),quote = FALSE,sep = "\t")

format(Sys.time()-t1)

# Step 6.3 Bootstrapping test

t1=Sys.time()

pnstbt=nst.boot(nst.result=pnst, group=groupi, rand=rand.time, trace=TRUE, two.tail=FALSE, out.detail=TRUE, between.group=FALSE, nworker=nworker)

save(pnstbt,file = paste0(prefixi,".pNST.boot.rda"))

write.table(pnstbt$summary,file = paste0(prefixi,".pNST.boot.summary.csv"), quote = FALSE,sep = "\t")

write.table(pnstbt$compare,file = paste0(prefixi,".pNST.boot.compare.csv"), quote = FALSE,sep = "\t")

(t=format(Sys.time()-t1))

# Step 6.4 PERMANOVA

t1=Sys.time()

pnstpaov=nst.panova(nst.result=pnst, group = groupi, rand = rand.time, trace = TRUE)

write.table(pnstpaov,file = paste0(prefixi,".pNST.PERMANOVA.csv"), quote = FALSE,sep = "\t")

(t=format(Sys.time()-t1))

4. 结果解读

4.1 确定性过程和随机性过程的相对重要性

  • An index, normalized stochasticity ratio (NST), was developed with 50% as the boundary point between more deterministic (<50%) and more stochastic (>50%) assembly. NST > 50%时Stochastic Processes占主导,而NST < 50%时,Deterministic Processes占主导。

  • 用显著的偏差(即|β NTI| > 2)来表示确定性过程占主导地位和用小的偏差(即|β NTI| < 2)来表明随机过程占主导地位。用显著的偏差(即|β NTI| > 2)来表示确定性过程占主导地位和用小的偏差(即|β NTI| < 2)来表明随机过程占主导地位。同质性和变异性选择应分别造成小于和大于预期的群落更替;βNTI <−2或> + 2进一步解释为表明同质性或变异性选择分别占主导地位;

4.2 通过RCbray判断随机过程中各种过程的贡献

Modified Raup-Crick index (RCbray) is subsequently calculated by comparing empirically observed Bray-Curtis and simulated null distribution. Thus, according to themodified Raup-Crick index (RCbray), stochastic processes could be classified into 3 ecological processes: 均质分散homogenizing dispersal (RCbray < −0.95), 分散限制dispersal limitation (RCbray > +0.95) and 生态漂变ecological drift (−0.95< RCbray < +0.95) Stegen et al., 2012; Stegen et al., 2013.

上一篇:
Nature Protocols投稿经验
下一篇:
Use microeco分析扩增子数据