扩增子分析--计算随机过程和决定性过程比例
发表于:2021-03-25 | 分类: 生物信息
字数统计: 2k | 阅读时长: 9分钟 | 阅读量:

确定性过程(deterministic processes)和随机性过程(stochastic processes) 是两个影响微生物群落结构系统发育组装的重要过程,本文介绍计算二者所占比例的方法。

1. 软件

  • NST
  • iCAMP
  • ape 用于读取进化树文件
  • picante

2. 文件准备

2.1 Feature Table

行为 OTUs/ASVs,列为样本。

TaxonIDSample 1Sample 2Sample 3Sample 4
OTU1232.0209.0349.0256.0
OTU275.035.044.00.0
OTU3237.0224.0291.0353.0
OTU4371.080.0319.0345.0

2.2 Group File

该文件描述了所有样本的分组情况,如实验组和对照组,或者其他分组。

Sample_IDGroup
Sample 1Group x
Sample 2Group x
Sample 3Group y
Sample 4Group y
......

2.3 Tree File

包含 Feature table 中所有 OTUs/ASVs 的系统发育树文件,理想条件下仅包含 Feature table 中的 OTUs/ASVs,不过大部分情况下还会包含数据库中的一些物种,在随后的分析中需要去除多余的物种(后续会讲到)。

3. 开始分析

########################
#!/usr/bin/env R
# version 20200919
# Step 1. 文件、路径和参数

# 指定包含输入文件的目录路径,注意区分Windows和Linux的路径写法
wd="/mnt/e/Researches/lujia16S/Analysis_20200907/exported-feature-table_2k_abund22/Raup_Crick"

# 指定结果文件的保存路径
save.wd="/mnt/e/Researches/lujia16S/Analysis_20200907/exported-feature-table_2k_abund22/Raup_Crick/Result2"

# 创建文件保存路径
dir.create(save.wd, recursive = TRUE)

# 指定Feature table(OTU表)的文件名
com.file="feature-table.tsv"

# 指定样本分组文件,每行为一个样本
group.file="treatment.txt"

# 指定NWK格式的系统发育树文件
tree.file="tree.nwk"

# 设置并行运算使用的线程数
nworker=8

# randomization time for null model analysis. 真实分析的时候一般设置为1000,如果测试的话可以设20
rand.time=999

# 输出文件的前缀名,随便设置
prefix="Lujia"

# Step 2. 加载R包

# 确保已经安装过所需的R包
#install.packages("NST") 

library(ape)
library(iCAMP)
library(NST) # need to be NST >=3.0.3
library(picante)

# Step 3. 加载数据并匹配IDs

setwd(wd)

# 读入Feature Table,注意自己文件的列与列之间的分隔符是什么,制表符用sep = "\t",逗号用sep = ","
comm=t(read.table(com.file,header = TRUE, sep = "\t", row.names = 1, as.is = TRUE, stringsAsFactors = FALSE))

# 读入分组文件,同样注意设置分隔符
group=read.table(group.file, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE, stringsAsFactors = FALSE)

#如果tree中的OTUs和Feature Table中的OTUs一一对应,可以直接用下面一个命令读入tree(注意去掉###),否则的话则运行下面LHL加入的3行命令
###tree=ape::read.tree(file = tree.file) # if you have tree

# 以下3行是LHL加入的
phy<-read.tree(tree.file)# 读入树文件
prune_tree<-prune.sample(comm,phy)# 使树文件和OTU表文件一一对齐
tree=prune_tree # 此刻的Tree干净了,可用于后续分析而不会报错

# 以下命令检测Feature Table中的样本名称是否与分组文件中的样本名一一对应
samp.ck=NST::match.name(rn.list=list(comm=comm,group=group))
comm=samp.ck$comm
comm=comm[,colSums(comm)>0,drop=FALSE]
group=samp.ck$group

# 以下命令检测Feature Table中的OTUs名称是否与Tree中的OTUs名一一对应
tax.ck=NST::match.name(cn.list = list(comm=comm),tree.list = list(tree=tree)) # if you have tree
comm=tax.ck$comm
tree=tax.ck$tree

# Step 4. Grouping way and metacommunity seting

# 选择分组,如果拥有多种分组方式,每次运行时选择其中的一组。此处选择的时分组文件中的第二列
groupi=group[,1,drop=FALSE]

# 重新定义输出文件的前缀,以分组的名称来命名,此处以分组文件第二列的表头“Group”为前缀
prefixi=paste0(prefix,".Group")

# if treatment and control are from different metacommunities, you may set meta.groupi=groupi,默认为NULL
#meta.groupi=NULL
meta.groupi=groupi

# Step 5. taxonomic NST
# Step 5.1 calculate tNST

# 指定计算距离矩阵的方法,"jaccard" and "bray" are preferred.
dist.method="bray"

# 记录运行时间
t1=Sys.time()

# 进入输出目录
setwd(save.wd)

# 计算tNST
tnst=tNST(comm=comm, group=groupi, meta.group=meta.groupi, meta.com=NULL, dist.method=dist.method, abundance.weighted=TRUE, rand=rand.time, output.rand=TRUE, nworker=nworker, LB=FALSE, null.model="PF", between.group=TRUE, SES=TRUE, RC=TRUE)

# 以R data格式保存tNST的输出 
save(tnst, file = paste0(prefixi, ".tNST.rda"))

# 保存其他tNST结果到多个文件中
write.table(tnst$index.grp, file = paste0(prefixi, ".tNST.summary.csv"), quote = FALSE, sep = "\t")

write.table(tnst$index.pair.grp,file = paste0(prefixi,".tNST.pairwise.csv"),quote = FALSE,sep = "\t")

write.table(tnst$index.pair,file = paste0(prefixi,".tNST.pairwise.index.csv"),quote = FALSE,sep = "\t")
write.table(tnst$index.between,file = paste0(prefixi,".tNST.between.summary.csv"),quote = FALSE,sep = "\t")

write.table(tnst$index.pair.between,file = paste0(prefixi,".tNST.pairwise.between.csv"),quote = FALSE,sep = "\t")

format(Sys.time()-t1)

# Step 5.2 Bootstrapping test

t1=Sys.time()

# 计算Bootstrapping
tnstbt=nst.boot(nst.result=tnst, group=groupi, rand=rand.time, trace=TRUE, two.tail=FALSE, out.detail=TRUE, between.group=FALSE, nworker=nworker)

# 保存结果
save(tnstbt,file = paste0(prefixi,".tNST.boot.rda"))

# 保存结果
write.table(tnstbt$summary,file = paste0(prefixi,".tNST.boot.summary.csv"), quote = FALSE,sep = "\t")
write.table(tnstbt$compare,file = paste0(prefixi,".tNST.boot.compare.csv"), quote = FALSE,sep = "\t")
(t=format(Sys.time()-t1))

# Step 5.3 PERMANOVA

t1=Sys.time()

tnstpaov=nst.panova(nst.result=tnst, group = groupi, rand = rand.time, trace = TRUE)

write.table(tnstpaov,file = paste0(prefixi,".tNST.PERMANOVA.csv"), quote = FALSE,sep = "\t")

(t=format(Sys.time()-t1))

# Steo 6. phylogenetic NST

# Step 6.1 phylogentic distance matrix # use bigmemory for big dataset

wd.pd=paste0(save.wd,"/pdbig")

if(!dir.exists(wd.pd)){dir.create(wd.pd)}

setwd(wd.pd)

if(file.exists("pd.desc"))
{
  # if already done before, directly use it.
  pdbig=list()
  pdbig$tip.label=read.table("pd.taxon.name.csv", sep = ",", row.names = 1, header = TRUE, stringsAsFactors = FALSE, as.is = TRUE)[,1]
  pdbig$pd.wd=wd.pd
  pdbig$pd.file="pd.desc"
  pdbig$pd.name.file="pd.taxon.name.csv"
}else{
  pdbig=iCAMP::pdist.big(tree = tree, wd = wd.pd, nworker = nworker)
}

# Step 6.2 calculate pNST

# to count time cost
t1=Sys.time()

setwd(save.wd)

pnst=NST::pNST(comm=comm, pd.desc=pdbig$pd.file, pd.wd=pdbig$pd.wd,
pd.spname=pdbig$tip.label, group=groupi, meta.group=meta.groupi, abundance.weighted=TRUE, rand=rand.time, output.rand=TRUE, taxo.null.model=NULL, phylo.shuffle=TRUE, exclude.conspecifics=FALSE, nworker=nworker, between.group=FALSE, SES=TRUE, RC=TRUE)

# save pNST output in R data format
save(pnst,file = paste0(prefixi,".pNST.rda"))

write.table(pnst$index.grp,file = paste0(prefixi,".pNST.summary.csv"), quote = FALSE,sep = "\t")

write.table(pnst$index.pair.grp,file = paste0(prefixi,".pNST.pairwise.csv"),quote = FALSE,sep = "\t")

write.table(pnst$index.pair,file = paste0(prefixi,".pNST.pairwise.index.csv"),quote = FALSE,sep = "\t")

write.table(pnst$index.between,file = paste0(prefixi,".pNST.between.summary.csv"),quote = FALSE,sep = "\t")

write.table(pnst$index.pair.between,file = paste0(prefixi,".pNST.pairwise.between.csv"),quote = FALSE,sep = "\t")

format(Sys.time()-t1)

# Step 6.3 Bootstrapping test

t1=Sys.time()

pnstbt=nst.boot(nst.result=pnst, group=groupi, rand=rand.time, trace=TRUE, two.tail=FALSE, out.detail=TRUE, between.group=FALSE, nworker=nworker)

save(pnstbt,file = paste0(prefixi,".pNST.boot.rda"))

write.table(pnstbt$summary,file = paste0(prefixi,".pNST.boot.summary.csv"), quote = FALSE,sep = "\t")

write.table(pnstbt$compare,file = paste0(prefixi,".pNST.boot.compare.csv"), quote = FALSE,sep = "\t")

(t=format(Sys.time()-t1))

# Step 6.4 PERMANOVA

t1=Sys.time()

pnstpaov=nst.panova(nst.result=pnst, group = groupi, rand = rand.time, trace = TRUE)

write.table(pnstpaov,file = paste0(prefixi,".pNST.PERMANOVA.csv"), quote = FALSE,sep = "\t")

(t=format(Sys.time()-t1))

4. 结果解读

4.1 确定性过程和随机性过程的相对重要性

  • An index, normalized stochasticity ratio (NST), was developed with 50% as the boundary point between more deterministic (<50%) and more stochastic (>50%) assembly. NST > 50% 时 Stochastic Processes 占主导,而 NST < 50% 时,Deterministic Processes 占主导。

  • 用显著的偏差 (即 |β NTI|> 2) 来表示确定性过程占主导地位和用小的偏差 (即 |β NTI| < 2) 来表明随机过程占主导地位。用显著的偏差 (即 |β NTI| > 2) 来表示确定性过程占主导地位和用小的偏差 (即 |β NTI| < 2) 来表明随机过程占主导地位。同质性和变异性选择应分别造成小于和大于预期的群落更替;βNTI <−2 或 > + 2 进一步解释为表明同质性或变异性选择分别占主导地位;

4.2 通过 RCbray 判断随机过程中各种过程的贡献

Modified Raup-Crick index (RCbray) is subsequently calculated by comparing empirically observed Bray-Curtis and simulated null distribution. Thus, according to themodified Raup-Crick index (RCbray), stochastic processes could be classified into 3 ecological processes: 均质分散 homogenizing dispersal (RCbray < −0.95), 分散限制 dispersal limitation (RCbray > +0.95) and 生态漂变 ecological drift (−0.95< RCbray < +0.95) Stegen et al., 2012; Stegen et al., 2013.

上一篇:
Nature Protocols投稿经验
下一篇:
Use microeco分析扩增子数据
本文目录
本文目录