确定性过程(deterministic processes)和随机性过程(stochastic processes) 是两个影响微生物群落结构系统发育组装的重要过程,本文介绍计算二者所占比例的方法。
1. 软件
- NST
- iCAMP
- ape 用于读取进化树文件
- picante
2. 文件准备
2.1 Feature Table
行为 OTUs/ASVs,列为样本。
TaxonID | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
---|---|---|---|---|
OTU1 | 232.0 | 209.0 | 349.0 | 256.0 |
OTU2 | 75.0 | 35.0 | 44.0 | 0.0 |
OTU3 | 237.0 | 224.0 | 291.0 | 353.0 |
OTU4 | 371.0 | 80.0 | 319.0 | 345.0 |
2.2 Group File
该文件描述了所有样本的分组情况,如实验组和对照组,或者其他分组。
Sample_ID | Group |
---|---|
Sample 1 | Group x |
Sample 2 | Group x |
Sample 3 | Group y |
Sample 4 | Group y |
... | ... |
2.3 Tree File
包含 Feature table 中所有 OTUs/ASVs 的系统发育树文件,理想条件下仅包含 Feature table 中的 OTUs/ASVs,不过大部分情况下还会包含数据库中的一些物种,在随后的分析中需要去除多余的物种(后续会讲到)。
3. 开始分析
########################
#!/usr/bin/env R
# version 20200919
# Step 1. 文件、路径和参数
# 指定包含输入文件的目录路径,注意区分Windows和Linux的路径写法
wd="/mnt/e/Researches/lujia16S/Analysis_20200907/exported-feature-table_2k_abund22/Raup_Crick"
# 指定结果文件的保存路径
save.wd="/mnt/e/Researches/lujia16S/Analysis_20200907/exported-feature-table_2k_abund22/Raup_Crick/Result2"
# 创建文件保存路径
dir.create(save.wd, recursive = TRUE)
# 指定Feature table(OTU表)的文件名
com.file="feature-table.tsv"
# 指定样本分组文件,每行为一个样本
group.file="treatment.txt"
# 指定NWK格式的系统发育树文件
tree.file="tree.nwk"
# 设置并行运算使用的线程数
nworker=8
# randomization time for null model analysis. 真实分析的时候一般设置为1000,如果测试的话可以设20
rand.time=999
# 输出文件的前缀名,随便设置
prefix="Lujia"
# Step 2. 加载R包
# 确保已经安装过所需的R包
#install.packages("NST")
library(ape)
library(iCAMP)
library(NST) # need to be NST >=3.0.3
library(picante)
# Step 3. 加载数据并匹配IDs
setwd(wd)
# 读入Feature Table,注意自己文件的列与列之间的分隔符是什么,制表符用sep = "\t",逗号用sep = ","
comm=t(read.table(com.file,header = TRUE, sep = "\t", row.names = 1, as.is = TRUE, stringsAsFactors = FALSE))
# 读入分组文件,同样注意设置分隔符
group=read.table(group.file, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE, stringsAsFactors = FALSE)
#如果tree中的OTUs和Feature Table中的OTUs一一对应,可以直接用下面一个命令读入tree(注意去掉###),否则的话则运行下面LHL加入的3行命令
###tree=ape::read.tree(file = tree.file) # if you have tree
# 以下3行是LHL加入的
phy<-read.tree(tree.file)# 读入树文件
prune_tree<-prune.sample(comm,phy)# 使树文件和OTU表文件一一对齐
tree=prune_tree # 此刻的Tree干净了,可用于后续分析而不会报错
# 以下命令检测Feature Table中的样本名称是否与分组文件中的样本名一一对应
samp.ck=NST::match.name(rn.list=list(comm=comm,group=group))
comm=samp.ck$comm
comm=comm[,colSums(comm)>0,drop=FALSE]
group=samp.ck$group
# 以下命令检测Feature Table中的OTUs名称是否与Tree中的OTUs名一一对应
tax.ck=NST::match.name(cn.list = list(comm=comm),tree.list = list(tree=tree)) # if you have tree
comm=tax.ck$comm
tree=tax.ck$tree
# Step 4. Grouping way and metacommunity seting
# 选择分组,如果拥有多种分组方式,每次运行时选择其中的一组。此处选择的时分组文件中的第二列
groupi=group[,1,drop=FALSE]
# 重新定义输出文件的前缀,以分组的名称来命名,此处以分组文件第二列的表头“Group”为前缀
prefixi=paste0(prefix,".Group")
# if treatment and control are from different metacommunities, you may set meta.groupi=groupi,默认为NULL
#meta.groupi=NULL
meta.groupi=groupi
# Step 5. taxonomic NST
# Step 5.1 calculate tNST
# 指定计算距离矩阵的方法,"jaccard" and "bray" are preferred.
dist.method="bray"
# 记录运行时间
t1=Sys.time()
# 进入输出目录
setwd(save.wd)
# 计算tNST
tnst=tNST(comm=comm, group=groupi, meta.group=meta.groupi, meta.com=NULL, dist.method=dist.method, abundance.weighted=TRUE, rand=rand.time, output.rand=TRUE, nworker=nworker, LB=FALSE, null.model="PF", between.group=TRUE, SES=TRUE, RC=TRUE)
# 以R data格式保存tNST的输出
save(tnst, file = paste0(prefixi, ".tNST.rda"))
# 保存其他tNST结果到多个文件中
write.table(tnst$index.grp, file = paste0(prefixi, ".tNST.summary.csv"), quote = FALSE, sep = "\t")
write.table(tnst$index.pair.grp,file = paste0(prefixi,".tNST.pairwise.csv"),quote = FALSE,sep = "\t")
write.table(tnst$index.pair,file = paste0(prefixi,".tNST.pairwise.index.csv"),quote = FALSE,sep = "\t")
write.table(tnst$index.between,file = paste0(prefixi,".tNST.between.summary.csv"),quote = FALSE,sep = "\t")
write.table(tnst$index.pair.between,file = paste0(prefixi,".tNST.pairwise.between.csv"),quote = FALSE,sep = "\t")
format(Sys.time()-t1)
# Step 5.2 Bootstrapping test
t1=Sys.time()
# 计算Bootstrapping
tnstbt=nst.boot(nst.result=tnst, group=groupi, rand=rand.time, trace=TRUE, two.tail=FALSE, out.detail=TRUE, between.group=FALSE, nworker=nworker)
# 保存结果
save(tnstbt,file = paste0(prefixi,".tNST.boot.rda"))
# 保存结果
write.table(tnstbt$summary,file = paste0(prefixi,".tNST.boot.summary.csv"), quote = FALSE,sep = "\t")
write.table(tnstbt$compare,file = paste0(prefixi,".tNST.boot.compare.csv"), quote = FALSE,sep = "\t")
(t=format(Sys.time()-t1))
# Step 5.3 PERMANOVA
t1=Sys.time()
tnstpaov=nst.panova(nst.result=tnst, group = groupi, rand = rand.time, trace = TRUE)
write.table(tnstpaov,file = paste0(prefixi,".tNST.PERMANOVA.csv"), quote = FALSE,sep = "\t")
(t=format(Sys.time()-t1))
# Steo 6. phylogenetic NST
# Step 6.1 phylogentic distance matrix # use bigmemory for big dataset
wd.pd=paste0(save.wd,"/pdbig")
if(!dir.exists(wd.pd)){dir.create(wd.pd)}
setwd(wd.pd)
if(file.exists("pd.desc"))
{
# if already done before, directly use it.
pdbig=list()
pdbig$tip.label=read.table("pd.taxon.name.csv", sep = ",", row.names = 1, header = TRUE, stringsAsFactors = FALSE, as.is = TRUE)[,1]
pdbig$pd.wd=wd.pd
pdbig$pd.file="pd.desc"
pdbig$pd.name.file="pd.taxon.name.csv"
}else{
pdbig=iCAMP::pdist.big(tree = tree, wd = wd.pd, nworker = nworker)
}
# Step 6.2 calculate pNST
# to count time cost
t1=Sys.time()
setwd(save.wd)
pnst=NST::pNST(comm=comm, pd.desc=pdbig$pd.file, pd.wd=pdbig$pd.wd,
pd.spname=pdbig$tip.label, group=groupi, meta.group=meta.groupi, abundance.weighted=TRUE, rand=rand.time, output.rand=TRUE, taxo.null.model=NULL, phylo.shuffle=TRUE, exclude.conspecifics=FALSE, nworker=nworker, between.group=FALSE, SES=TRUE, RC=TRUE)
# save pNST output in R data format
save(pnst,file = paste0(prefixi,".pNST.rda"))
write.table(pnst$index.grp,file = paste0(prefixi,".pNST.summary.csv"), quote = FALSE,sep = "\t")
write.table(pnst$index.pair.grp,file = paste0(prefixi,".pNST.pairwise.csv"),quote = FALSE,sep = "\t")
write.table(pnst$index.pair,file = paste0(prefixi,".pNST.pairwise.index.csv"),quote = FALSE,sep = "\t")
write.table(pnst$index.between,file = paste0(prefixi,".pNST.between.summary.csv"),quote = FALSE,sep = "\t")
write.table(pnst$index.pair.between,file = paste0(prefixi,".pNST.pairwise.between.csv"),quote = FALSE,sep = "\t")
format(Sys.time()-t1)
# Step 6.3 Bootstrapping test
t1=Sys.time()
pnstbt=nst.boot(nst.result=pnst, group=groupi, rand=rand.time, trace=TRUE, two.tail=FALSE, out.detail=TRUE, between.group=FALSE, nworker=nworker)
save(pnstbt,file = paste0(prefixi,".pNST.boot.rda"))
write.table(pnstbt$summary,file = paste0(prefixi,".pNST.boot.summary.csv"), quote = FALSE,sep = "\t")
write.table(pnstbt$compare,file = paste0(prefixi,".pNST.boot.compare.csv"), quote = FALSE,sep = "\t")
(t=format(Sys.time()-t1))
# Step 6.4 PERMANOVA
t1=Sys.time()
pnstpaov=nst.panova(nst.result=pnst, group = groupi, rand = rand.time, trace = TRUE)
write.table(pnstpaov,file = paste0(prefixi,".pNST.PERMANOVA.csv"), quote = FALSE,sep = "\t")
(t=format(Sys.time()-t1))
4. 结果解读
4.1 确定性过程和随机性过程的相对重要性
An index, normalized stochasticity ratio (NST), was developed with 50% as the boundary point between more deterministic (<50%) and more stochastic (>50%) assembly. NST > 50% 时 Stochastic Processes 占主导,而 NST < 50% 时,Deterministic Processes 占主导。
用显著的偏差 (即 |β NTI|> 2) 来表示确定性过程占主导地位和用小的偏差 (即 |β NTI| < 2) 来表明随机过程占主导地位。用显著的偏差 (即 |β NTI| > 2) 来表示确定性过程占主导地位和用小的偏差 (即 |β NTI| < 2) 来表明随机过程占主导地位。同质性和变异性选择应分别造成小于和大于预期的群落更替;βNTI <−2 或 > + 2 进一步解释为表明同质性或变异性选择分别占主导地位;
4.2 通过 RCbray 判断随机过程中各种过程的贡献
Modified Raup-Crick index (RCbray) is subsequently calculated by comparing empirically observed Bray-Curtis and simulated null distribution. Thus, according to themodified Raup-Crick index (RCbray), stochastic processes could be classified into 3 ecological processes: 均质分散 homogenizing dispersal (RCbray < −0.95), 分散限制 dispersal limitation (RCbray > +0.95) and 生态漂变 ecological drift (−0.95< RCbray < +0.95) Stegen et al., 2012; Stegen et al., 2013.