确定性过程（deterministic processes）和随机性过程（stochastic processes)是两个影响微生物群落结构系统发育组装的重要过程，本文介绍计算二者所占比例的方法。

1. 软件

NST
iCAMP
ape 用于读取进化树文件
picante

2. 文件准备

2.1 Feature Table

行为OTUs/ASVs，列为样本。

TaxonID	Sample 1	Sample 2	Sample 3	Sample 4
OTU1	232.0	209.0	349.0	256.0
OTU2	75.0	35.0	44.0	0.0
OTU3	237.0	224.0	291.0	353.0
OTU4	371.0	80.0	319.0	345.0

2.2 Group File

该文件描述了所有样本的分组情况，如实验组和对照组，或者其他分组。

Sample_ID	Group
Sample 1	Group x
Sample 2	Group x
Sample 3	Group y
Sample 4	Group y
…	…

2.3 Tree File

包含Feature table中所有OTUs/ASVs的系统发育树文件，理想条件下仅包含Feature table中的OTUs/ASVs，不过大部分情况下还会包含数据库中的一些物种，在随后的分析中需要去除多余的物种（后续会讲到）。

3. 开始分析

########################
#!/usr/bin/env R
# version 20200919
# Step 1. 文件、路径和参数

# 指定包含输入文件的目录路径，注意区分Windows和Linux的路径写法
wd="/mnt/e/Researches/lujia16S/Analysis_20200907/exported-feature-table_2k_abund22/Raup_Crick"

# 指定结果文件的保存路径
save.wd="/mnt/e/Researches/lujia16S/Analysis_20200907/exported-feature-table_2k_abund22/Raup_Crick/Result2"

# 创建文件保存路径
dir.create(save.wd, recursive = TRUE)

# 指定Feature table（OTU表）的文件名
com.file="feature-table.tsv"

# 指定样本分组文件，每行为一个样本
group.file="treatment.txt"

# 指定NWK格式的系统发育树文件
tree.file="tree.nwk"

# 设置并行运算使用的线程数
nworker=8

# randomization time for null model analysis. 真实分析的时候一般设置为1000，如果测试的话可以设20
rand.time=999

# 输出文件的前缀名，随便设置
prefix="Lujia"

# Step 2. 加载R包

# 确保已经安装过所需的R包
#install.packages("NST") 

library(ape)
library(iCAMP)
library(NST) # need to be NST >=3.0.3
library(picante)

# Step 3. 加载数据并匹配IDs

setwd(wd)

# 读入Feature Table，注意自己文件的列与列之间的分隔符是什么，制表符用sep = "\t"，逗号用sep = ","
comm=t(read.table(com.file,header = TRUE, sep = "\t", row.names = 1, as.is = TRUE, stringsAsFactors = FALSE))

# 读入分组文件，同样注意设置分隔符
group=read.table(group.file, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE, stringsAsFactors = FALSE)

#如果tree中的OTUs和Feature Table中的OTUs一一对应，可以直接用下面一个命令读入tree（注意去掉###），否则的话则运行下面LHL加入的3行命令
###tree=ape::read.tree(file = tree.file) # if you have tree

# 以下3行是LHL加入的
phy<-read.tree(tree.file)# 读入树文件
prune_tree<-prune.sample(comm,phy)# 使树文件和OTU表文件一一对齐
tree=prune_tree # 此刻的Tree干净了，可用于后续分析而不会报错

# 以下命令检测Feature Table中的样本名称是否与分组文件中的样本名一一对应
samp.ck=NST::match.name(rn.list=list(comm=comm,group=group))
comm=samp.ck$comm
comm=comm[,colSums(comm)>0,drop=FALSE]
group=samp.ck$group

# 以下命令检测Feature Table中的OTUs名称是否与Tree中的OTUs名一一对应
tax.ck=NST::match.name(cn.list = list(comm=comm),tree.list = list(tree=tree)) # if you have tree
comm=tax.ck$comm
tree=tax.ck$tree

# Step 4. Grouping way and metacommunity seting

# 选择分组，如果拥有多种分组方式，每次运行时选择其中的一组。此处选择的时分组文件中的第二列
groupi=group[,1,drop=FALSE]

# 重新定义输出文件的前缀，以分组的名称来命名，此处以分组文件第二列的表头“Group”为前缀
prefixi=paste0(prefix,".Group")

# if treatment and control are from different metacommunities, you may set meta.groupi=groupi，默认为NULL
#meta.groupi=NULL
meta.groupi=groupi

# Step 5. taxonomic NST
# Step 5.1 calculate tNST

# 指定计算距离矩阵的方法，"jaccard" and "bray" are preferred.
dist.method="bray"

# 记录运行时间
t1=Sys.time()

# 进入输出目录
setwd(save.wd)

# 计算tNST
tnst=tNST(comm=comm, group=groupi, meta.group=meta.groupi, meta.com=NULL, dist.method=dist.method, abundance.weighted=TRUE, rand=rand.time, output.rand=TRUE, nworker=nworker, LB=FALSE, null.model="PF", between.group=TRUE, SES=TRUE, RC=TRUE)

# 以R data格式保存tNST的输出 
save(tnst, file = paste0(prefixi, ".tNST.rda"))

# 保存其他tNST结果到多个文件中
write.table(tnst$index.grp, file = paste0(prefixi, ".tNST.summary.csv"), quote = FALSE, sep = "\t")

write.table(tnst$index.pair.grp,file = paste0(prefixi,".tNST.pairwise.csv"),quote = FALSE,sep = "\t")

write.table(tnst$index.pair,file = paste0(prefixi,".tNST.pairwise.index.csv"),quote = FALSE,sep = "\t")
write.table(tnst$index.between,file = paste0(prefixi,".tNST.between.summary.csv"),quote = FALSE,sep = "\t")

write.table(tnst$index.pair.between,file = paste0(prefixi,".tNST.pairwise.between.csv"),quote = FALSE,sep = "\t")

format(Sys.time()-t1)

# Step 5.2 Bootstrapping test

t1=Sys.time()

# 计算Bootstrapping
tnstbt=nst.boot(nst.result=tnst, group=groupi, rand=rand.time, trace=TRUE, two.tail=FALSE, out.detail=TRUE, between.group=FALSE, nworker=nworker)

# 保存结果
save(tnstbt,file = paste0(prefixi,".tNST.boot.rda"))

# 保存结果
write.table(tnstbt$summary,file = paste0(prefixi,".tNST.boot.summary.csv"), quote = FALSE,sep = "\t")
write.table(tnstbt$compare,file = paste0(prefixi,".tNST.boot.compare.csv"), quote = FALSE,sep = "\t")
(t=format(Sys.time()-t1))

# Step 5.3 PERMANOVA

t1=Sys.time()

tnstpaov=nst.panova(nst.result=tnst, group = groupi, rand = rand.time, trace = TRUE)

write.table(tnstpaov,file = paste0(prefixi,".tNST.PERMANOVA.csv"), quote = FALSE,sep = "\t")

(t=format(Sys.time()-t1))

# Steo 6. phylogenetic NST

# Step 6.1 phylogentic distance matrix # use bigmemory for big dataset

wd.pd=paste0(save.wd,"/pdbig")

if(!dir.exists(wd.pd)){dir.create(wd.pd)}

setwd(wd.pd)

if(file.exists("pd.desc"))
{
  # if already done before, directly use it.
  pdbig=list()
  pdbig$tip.label=read.table("pd.taxon.name.csv", sep = ",", row.names = 1, header = TRUE, stringsAsFactors = FALSE, as.is = TRUE)[,1]
  pdbig$pd.wd=wd.pd
  pdbig$pd.file="pd.desc"
  pdbig$pd.name.file="pd.taxon.name.csv"
}else{
  pdbig=iCAMP::pdist.big(tree = tree, wd = wd.pd, nworker = nworker)
}

# Step 6.2 calculate pNST

# to count time cost
t1=Sys.time()

setwd(save.wd)

pnst=NST::pNST(comm=comm, pd.desc=pdbig$pd.file, pd.wd=pdbig$pd.wd,
pd.spname=pdbig$tip.label, group=groupi, meta.group=meta.groupi, abundance.weighted=TRUE, rand=rand.time, output.rand=TRUE, taxo.null.model=NULL, phylo.shuffle=TRUE, exclude.conspecifics=FALSE, nworker=nworker, between.group=FALSE, SES=TRUE, RC=TRUE)

# save pNST output in R data format
save(pnst,file = paste0(prefixi,".pNST.rda"))

write.table(pnst$index.grp,file = paste0(prefixi,".pNST.summary.csv"), quote = FALSE,sep = "\t")

write.table(pnst$index.pair.grp,file = paste0(prefixi,".pNST.pairwise.csv"),quote = FALSE,sep = "\t")

write.table(pnst$index.pair,file = paste0(prefixi,".pNST.pairwise.index.csv"),quote = FALSE,sep = "\t")

write.table(pnst$index.between,file = paste0(prefixi,".pNST.between.summary.csv"),quote = FALSE,sep = "\t")

write.table(pnst$index.pair.between,file = paste0(prefixi,".pNST.pairwise.between.csv"),quote = FALSE,sep = "\t")

format(Sys.time()-t1)

# Step 6.3 Bootstrapping test

t1=Sys.time()

pnstbt=nst.boot(nst.result=pnst, group=groupi, rand=rand.time, trace=TRUE, two.tail=FALSE, out.detail=TRUE, between.group=FALSE, nworker=nworker)

save(pnstbt,file = paste0(prefixi,".pNST.boot.rda"))

write.table(pnstbt$summary,file = paste0(prefixi,".pNST.boot.summary.csv"), quote = FALSE,sep = "\t")

write.table(pnstbt$compare,file = paste0(prefixi,".pNST.boot.compare.csv"), quote = FALSE,sep = "\t")

(t=format(Sys.time()-t1))

# Step 6.4 PERMANOVA

t1=Sys.time()

pnstpaov=nst.panova(nst.result=pnst, group = groupi, rand = rand.time, trace = TRUE)

write.table(pnstpaov,file = paste0(prefixi,".pNST.PERMANOVA.csv"), quote = FALSE,sep = "\t")

(t=format(Sys.time()-t1))

4. 结果解读

4.1 确定性过程和随机性过程的相对重要性

An index, normalized stochasticity ratio (NST), was developed with 50% as the boundary point between more deterministic (<50%) and more stochastic (>50%) assembly. NST > 50%时Stochastic Processes占主导，而NST < 50%时，Deterministic Processes占主导。
用显著的偏差(即|β NTI| > 2)来表示确定性过程占主导地位和用小的偏差(即|β NTI| < 2)来表明随机过程占主导地位。用显著的偏差(即|β NTI| > 2)来表示确定性过程占主导地位和用小的偏差(即|β NTI| < 2)来表明随机过程占主导地位。同质性和变异性选择应分别造成小于和大于预期的群落更替;βNTI <−2或> + 2进一步解释为表明同质性或变异性选择分别占主导地位;

4.2 通过RCbray判断随机过程中各种过程的贡献

Modified Raup-Crick index (RCbray) is subsequently calculated by comparing empirically observed Bray-Curtis and simulated null distribution. Thus, according to themodified Raup-Crick index (RCbray), stochastic processes could be classified into 3 ecological processes: 均质分散homogenizing dispersal (RCbray < −0.95), 分散限制dispersal limitation (RCbray > +0.95) and 生态漂变ecological drift (−0.95< RCbray < +0.95) Stegen et al., 2012; Stegen et al., 2013.

了尘兰若的小坑

分享微生物生物信息学分析方法，欢迎加入QQ群交流945751012，不接受群内广告！