基于TidyMass的非靶向代谢组学分析
发表于:2023-04-19 | 分类: 生物信息
字数统计: 7.6k | 阅读时长: 37分钟 | 阅读量:

代谢组学常用仪器特点

仪器 特点
GC-MS 易挥发,低极性,热稳定的小分子化合物;需衍生化
LC-MS 具有一定极性的有机化合物;无需衍生化
NMR 无偏性,无损检测;•无需繁琐前处理,便于活体、原位的动态检测
CE-MS 高极性化合物,如核酸,蛋白等
ICP-MS 无机化合物

LC-QTOF原理

Q-TOF全称为四极飞行时间质谱仪(Quadrupole Time-of-Flight Mass Spectrometer)。其基本原理是将样品离子通过四极杆进行质量筛选,然后进入飞行时间质谱器(Time-of-Flight Mass Analyzer,ToF),测定其离子的飞行时间,从而得到样品中离子的质量信息。Q-TOF有正离子模式(positive ion mode, POS)和负离子模式(negative ion mode, NEG)两种电离方式,在检测代谢组时结合使用两种方式可以使代谢物覆盖率更高,检测效果也更好。

Q-TOF的工作过程包含以下步骤:

离子化:样品通过电喷雾或者MALDI等方法被电离成为离子。

生成离子束:离子经过引出阱、加速器、光栅偏转镜等装置,形成一个带电的离子束。

四极杆质量筛选:离子束进入四极杆,通过高频交变电压(RF)和直流电压(DC)的控制,将不同质量的离子分离出来。

飞行时间质谱分析:通过激光或者其他方法对分散的离子束施加助推、聚焦和分析,并且测定其到达检测器所需的时间。不同质量的离子抵达检测器所需要的时间不同,从而可以得到离子的质量信息。

LC-QTOF原理
FAQ:POS和NEG数据的利用。
负离子和正离子是数据采集时的时候的两种模式生成两套数据,一级分析(MS)结果中是提供 POS 和 NEG 两个检测结果列表的,但是在二级分析(MSMS)结果中,我们将鉴定正负离子模式下鉴定到的物质进行了合并,所以二级分析是针对代谢物来做的,不区分正负离子模式。

FAQ: 当正负离子模式下都检测到了某种物质,如何对该物质的结果进行呈现?
会根据匹配参数进行取舍,选用匹配参数更好的模式下的数据在二级结果中进行分析。

数据分析

下机数据格式转换

以下两个软件二选一,Windows下建议使用ProteoWizard,其用法可参考用metid构建代谢组数据库。Linux下建议使用massconverter。

ProteoWizard

将raw data转换为mzXML格式。

massconverter

将raw data转换为mzXML格式。

安装

1
2
3
4
5
6
7
8
if(!require(remotes)){
install.packages("remotes")
}
remotes::install_github("tidymass/massconverter")

# 获取pwiz
library(massconverter)
docker_pull_pwiz()

Debug:如果报错Error in docker_pull_pwiz() : Please install docker first (https://www.docker.com/get-started),则首先确认系统是否已安装docker,如果未安装,请先安装docker;如已经安装,则是因为普通用户无运行docker的权限,将其添加到docker用户组中即可。

1
2
3
4
5
# root模式下运行如下shell命令,将用户lhl添加到docker组中
gpasswd -a lhl docker

# 重启docker生效
service docker restart

设置转换参数

此处调用MSConvert进行格式转换,期间进行了过滤,主要采用了Peak PickingSubset,和Zero Samples方法。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
parameter =
massconverter::create_msconvert_parameter(
output_format = "mzXML",# "mzXML", "mzML", "mz5", "mgf", "text", "ms1", "cms1", "ms2", "cms2"
binary_encoding_precision = "64",
zlib = TRUE,
write_index = TRUE,
peak_picking_algorithm = "cwt",# "vendor" or "cwt"
vendor_mslevels = c(1, NA),# 当前调用的是cwt,此参数可能无效
cwt_mslevels = c(1, NA),# 如此设置即可
cwt_min_snr = 0.1,# minimum signal-to-noise ratio,设为0.1即可
cwt_min_peak_spacing = 0.1,# minimum peak spacing,设为0.1即可
subset_polarity = "positive",# "any", "positive" or "negative",根据模式选择
subset_scan_number = c(NA, NA),# A two numeric vector. Can be c(NA, NA) if don't use this
subset_scan_time = c(60, 300),# A two numeric vector. Can be c(NA, NA) if don't use this
subset_mslevels = c(1, 2),# A two numeric vector. Second can be set as NA
zero_samples_mode = "removeExtra",# "no", "removeExtra", or "addMissing"
zero_samples_mslevels = c(1, NA),# A two numeric vector. Second can be set as NA
zero_samples_add_missing_flanking_zero_count = 5# 默认5即可
)

开始转换

1
2
3
4
5
convert_raw_data(input_path = "demo_data/raw_data", 
output_path = "demo_data/mzxml",
msconvert_parameter = parameter,
docker_parameters = c(),
process_all = FALSE)

安装tidyMass

1
2
mamba install gcc_linux-64 gxx_linux-64 gfortran_linux-64
mamba install bioconductor-msnbase bioconductor-rdisop r-openxlsx bioconductor-mzr bioconductor-xcms r-rvest r-tidyr r-stringi r-tidyversedyve r-hexbin
1
2
3
4
5
if(!require(remotes)){
install.packages("remotes")
}
remotes::install_github("tidymass/masstools")
remotes::install_github("tidymass/tidymass")

原始数据处理

调用massprocesser包进行原始数据处理,进行峰检测与对齐,输出peak table包括:
原始数据导入(read spectra from file)、峰检测(detect mass traces & detect chromatographic peaks)、校正保留时间(Correcting rentention time)、对已鉴定的色谱峰进行保留时间校正、Grouping peaks across samples、输出peak table。

数据目录结构

  • mzxml/mzxml_ms1_data/
    • POS/
      • Case/*.mzXML
      • Control/*.mzXML
      • QC/*.mzXML
    • NEG/
      • Case/*.mzXML
      • Control/*.mzXML
      • QC/*.mzXML
  • mzxml/mgf_ms2_data/mgf_ms2_data/
    • POS/*.mgf
    • NEG/*.mgf
  • mzxml/sample_info/
    • sample_info_pos.csv
    • sample_info_neg.csv

正离子模式

注意:path指定的路径名不可以数字开头!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
library(tidymass)
process_data(
path = "mzxml_ms1_data/POS",# 路径根据实际情况定
polarity = "positive",
ppm = 10,
peakwidth = c(10, 60),
threads = 30,
output_tic = TRUE,
output_bpc = TRUE,
output_rt_correction_plot = TRUE,
min_fraction = 0.5,
group_for_figure = "QC",
snthresh = 10,
prefilter = c(3, 500),
fitgauss = FALSE,
integrate = 2,
mzdiff = 0.01,
noise = 500,
binSize = 0.025,
bw = 5,
fill_peaks = FALSE
)
  • ppm:numeric defining the maximal tolerated m/z deviation in consecutive scans in parts per million (ppm) for the initial regions-of-interest (ROI) definition。此处的ppm来源于xcms包,但在xcms中不同的函数中,含义貌似不一样,需进一步确认。
  • peakwith:峰宽,取决于柱子(column)和LC系统。numeric(2) with the expected approximate peak width in chromatographic space. Given as a range (min, max) in seconds.
  • min_fraction:如果一个峰出现在至少一组样本中min_fraction样本以上(按比例),则将保留该峰值。
  • snthresh :信噪比阈值。
  • prefilter:c(k, I) 在第一次分析步骤(ROI检测)中指定预筛选步骤。仅保留包含至少强度>=I的k个峰的的质量轨迹(Mass traces)。
  • fitgauss :是否应对每个峰拟合高斯分布。主要影响峰的保留时间位置。
  • integrate:积分方法(1或2)。对于 integrate=1,Peak limits通过在mexican hat 过滤后的数据上下降(descent)来确定;而对于 integrate=2,则在原始数据上进行下降处理。后一种方法更准确但容易受到噪声的影响,而前一种方法更稳健但不太精确。
  • mzdiff :representing the minimum difference in m/z dimension required for peaks with overlapping retention times; can be negative to allow overlap. During peak post-processing, peaks defined to be overlapping are reduced to the one peak with the largest signal.
  • noise:设定第一步分析中需要的质心(centroids )最小强度(intensity < noise的质心将被省略在感兴趣区域检测之外)。
  • binSize :将m/z轴按照binSize划分为多个区间,再对各区间内的离子信号进行聚合。较小的binSize使得峰检测更加准确,检测到更多features,需要更多计算资源。
  • bw:带宽,通常在5-50间,这是用于峰值检测的核密度估计 (KDE) 算法中使用的参数。bw 控制估计的 KDE 曲线的平滑度,并确定峰的解析程度。较小的bw值将产生更详细的基础峰形表示,在紧密间隔或重叠的峰之间具有更好的分辨率,但也可能产生更高水平的噪声。相反,较大的bw值将导致对峰形的估计更平滑、更广义,这有助于降低噪声和杂散检测,但在某些情况下也可能导致峰分辨率降低。bw 值的选择将取决于所分析数据的具体特征以及分析的目标。通常,有必要试验一系列 bw 值,以找到给定数据集的最佳设置。
  • fill_peaks:Fill peaks NA or not。

结果

  • POS/Result/

    • object:用于下一步分析

    • Peak_table.csv:用于下一步分析的峰表

    • Peak_table_for_cleaning.csv:删除了Peak_table.csv中无用的列,可直接用于后续的数据清洗

    • BPC.pdf:Base peak chromatogram,仅展示group_for_figure参数指定组内的样本
      BPC.pdf

    • TIC.pdf:Total ion peak chromatogram,经色谱分离流出的组分不断进入质谱,质谱连续扫描采集数据,每一次扫描得到一张质谱图,将每一张质谱图中所有离子强度相加得到总离子流强度。然后以总离子强度为纵坐标,时间为横坐标绘图。
      TIC.pdf

    • RT correction plot.pdf
      RT correction plot.pdf

    • intermediate_data:中间文件目录,如果需要重新运行data processing,则需先删除该目录。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    # 加载对象
    load("mzxml_ms1_data/POS/Result/object")

    # 查看metabolic features数量
    object

    # 获取互动图,在Rstudio中才能显示
    load("mzxml_ms1_data/POS/Result/intermediate_data/xdata2")

    plot = massprocesser::plot_adjusted_rt(object = xdata2, group_for_figure = "QC", interactive = TRUE)

    plot

负离子模式

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
massprocesser::process_data(
path = "mzxml_ms1_data/NEG",
polarity = "negative",
ppm = 10,
peakwidth = c(10, 60),
threads = 30,
output_tic = TRUE,
output_bpc = TRUE,
output_rt_correction_plot = TRUE,
min_fraction = 0.5,
group_for_figure = "QC",
snthresh = 10,
prefilter = c(3, 500),
fitgauss = FALSE,
integrate = 2,
mzdiff = 0.01,
noise = 500,
binSize = 0.025,
bw = 5,
fill_peaks = FALSE
)
1
2
3
4
5
# 加载对象
load("mzxml_ms1_data/NEG/Result/object")

# 查看metabolic features数量
object

Explore data

将peak table和样本信息(元数据)整合,得到mass_dataset class对象。获取数据的总结信息。

样本信息表示例 sample_info_pos.csv

1
library(tidyverse)

正离子模式

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
# 加载对象
load("mzxml_ms1_data/POS/Result/object")
object_pos <- object
object_pos

# 读入样本信息
sample_info_pos <- readr::read_csv("sample_info/sample_info_pos.csv")

# 查看object_pos中的元数据
object_pos %>% extract_sample_info() %>% head()
#> sample_id group class injection.order
#> 1 sample_06 Case Subject 1
#> 2 sample_103 Case Subject 2
#> 3 sample_11 Case Subject 3
#> 4 sample_112 Case Subject 4
#> 5 sample_117 Case Subject 5
#> 6 sample_12 Case Subject 6
#?? 为何还没整合的情况下object_pos中就存在了部分元数据列?

# 移除object_pos中的"group", "class", "injection.order"
object_pos <- object_pos %>% activate_mass_dataset(what = "sample_info") %>% dplyr::select(-c("group", "class", "injection.order"))

# 将sample_info_pos 中的所有列整合到object_pos 中
object_pos = object_pos %>% activate_mass_dataset(what = "sample_info") %>% left_join(sample_info_pos, by = "sample_id")

# 查看元数据信息
object_pos %>% extract_sample_info() %>% head()
#> sample_id injection.order class batch subject_id group
#> 1 sample_06 4 Subject 1 subject_414 Case
#> 2 sample_103 71 Subject 1 subject_330 Case
#> 3 sample_11 6 Subject 1 subject_125 Case
#> 4 sample_112 78 Subject 1 subject_295 Case
#> 5 sample_117 80 Subject 1 subject_793 Case
#> 6 sample_12 8 Subject 1 subject_387 Case

# 保存数据
dir.create("data_cleaning/POS", showWarnings = FALSE, recursive = TRUE)
save(object_pos, file = "data_cleaning/POS/object_pos")
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
# 统计样本数和variables数
object_pos

# 根据class统计样本数量,可将class换成group或batch等
object_pos %>% activate_mass_dataset(what = "sample_info") %>% dplyr::count(class)

# 获取peak分布图
pdf(file="data_cleaning/POS/peak_distributation_plot_positive.pdf")

p<- object_pos %>% `+`(1) %>% log(10) %>% show_mz_rt_plot(hex = FALSE)

p+ scale_size_continuous(range = c(0.01, 2))

dev.off()

# 查看总缺失值数量
get_mv_number(object = object_pos)
#[1] 786005

# 查看各样本内的缺失值
get_mv_number(object = object_pos, by = "sample") %>% head()
# sample_06 sample_103 sample_11 sample_112 sample_117 sample_12
# 4017 2713 4064 2981 2920 3846

# 查看各variable的缺失值
get_mv_number(object = object_pos, by = "variable") %>% head()
#M70T73_POS M70T53_POS M70T183_POS M70T527_POS M71T695_POS M71T183_POS
# 129 16 155 54 133 169

# 绘图展示缺失值信息,可在下一节生成
pdf(file="data_cleaning/POS/total_MVs.pdf")

show_missing_values(object = object_pos, show_column_names = TRUE, show_row_names = TRUE, percentage = TRUE)

dev.off()

# 绘图展示各样本缺失值信息,可在下一节生成
pdf(file="data_cleaning/POS/Samples_MVs.pdf")

show_sample_missing_values(object = object_pos, percentage = TRUE, color_by = "class")

dev.off()

# 绘图展示各variables缺失值信息,可在下一节生成
pdf(file="data_cleaning/POS/Variables_MVs.pdf")

p<- show_variable_missing_values(
object = object_pos,
percentage = TRUE,
show_x_text = FALSE,
show_x_ticks = FALSE,
color_by = "mz"
)

p+ scale_size_continuous(range = c(0.01, 1))

dev.off()

峰分布图

各variable的缺失值数量
黑色代表缺失值。

各样本的缺失值比例

各variable的缺失值比例

负离子模式

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# 加载对象
load("mzxml_ms1_data/NEG/Result/object")
object_neg <- object
object_neg

# 读入样本信息
sample_info_neg <- readr::read_csv("sample_info/sample_info_neg.csv")

object_neg %>% extract_sample_info() %>% head()
# sample_id group class injection.order
#1 sample_06 Case Subject 1
#2 sample_103 Case Subject 2
#3 sample_11 Case Subject 3
#4 sample_112 Case Subject 4
#5 sample_117 Case Subject 5
#6 sample_12 Case Subject 6

object_neg <- object_neg %>% activate_mass_dataset(what = "sample_info") %>% dplyr::select(-c("group", "class", "injection.order"))

# 将sample_info_neg添加至object_neg
object_neg = object_neg %>% activate_mass_dataset(what = "sample_info") %>% left_join(sample_info_neg, by = "sample_id")

object_neg %>% extract_sample_info() %>% head()
# sample_id injection.order class batch subject_id group
#1 sample_06 4 Subject 1 subject_414 Case
#2 sample_103 71 Subject 1 subject_330 Case
#3 sample_11 6 Subject 1 subject_125 Case
#4 sample_112 78 Subject 1 subject_295 Case
#5 sample_117 80 Subject 1 subject_793 Case
#6 sample_12 8 Subject 1 subject_387 Case

# 保存数据
dir.create("data_cleaning/NEG", showWarnings = FALSE, recursive = TRUE)
save(object_neg, file = "data_cleaning/NEG/object_neg")
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
# 统计样本数和variables数
object_neg

# 根据class统计样本数量,可将class换成group或batch等
object_neg %>% activate_mass_dataset(what = "sample_info") %>% dplyr::count(class)

# 获取peak分布图
pdf(file="data_cleaning/NEG/peak_distributation_plot_negtive.pdf")

object_neg %>% `+`(1) %>% log(10) %>% show_mz_rt_plot() + scale_size_continuous(range = c(0.01, 2))

dev.off()

# 查看总缺失值数量
get_mv_number(object = object_neg)
#[1] 748237

# 查看各样本内的缺失值
get_mv_number(object = object_neg, by = "sample") %>% head()
# sample_06 sample_103 sample_11 sample_112 sample_117 sample_12
# 3029 2766 3298 3100 2912 3053

# 查看各variable的缺失值
get_mv_number(object = object_neg, by = "variable") %>% head()
#M70T712_NEG M70T527_NEG M70T587_NEG M70T47_NEG M71T587_NEG M71T641_NEG
# 16 137 2 146 41 19

# 绘图展示缺失值信息,可在下一节生成
pdf(file="data_cleaning/NEG/total_MVs.pdf")

show_missing_values(object = object_neg, show_column_names = FALSE, percentage = TRUE)

dev.off()

# 绘图展示各样本缺失值信息,可在下一节生成
pdf(file="data_cleaning/NEG/Samples_MVs.pdf")

show_sample_missing_values(object = object_neg, percentage = TRUE)

dev.off()

# 绘图展示各variables缺失值信息,可在下一节生成
pdf(file="data_cleaning/NEG/Variables_MVs.pdf")

p<- show_variable_missing_values(
object = object_neg,
percentage = TRUE,
show_x_text = FALSE,
show_x_ticks = FALSE
)

p+ scale_size_continuous(range = c(0.01, 1))

dev.off()

数据清洗(Data cleaning)

查看质量

1
2
3
4
5
6
7
8
9
10
11
12
13
# 加载数据
load("data_cleaning/POS/object_pos")
load("data_cleaning/NEG/object_neg")

# 将批次号改为字符串
object_pos <- object_pos %>% activate_mass_dataset(what = "sample_info") %>% dplyr::mutate(batch = as.character(batch))

object_neg <- object_neg %>% activate_mass_dataset(what = "sample_info") %>% dplyr::mutate(batch = as.character(batch))

# 先评估数据质量
massqc::massqc_report(object = object_pos, path = "data_cleaning/POS/data_quality_before_data_cleaning")

massqc::massqc_report(object = object_neg, path = "data_cleaning/NEG/data_quality_before_data_cleaning")

正离子PCA

负离子PCA
从上两图可看出正离子模式下存在严重的批次效应,负离子模式下不存在明显的批次效应。

开始清洗

移除噪音代谢features——缺失值过滤。

移除超过20% QC样本中含有MVs以及至少在50%实验组或对照组样本中含有MVs的variables。

  • 正离子模式

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    # 查看各分组样本量
    object_pos %>% activate_mass_dataset(what = "sample_info") %>% dplyr::count(group)
    #> group n
    #> 1 Case 110
    #> 2 Control 110
    #> 3 QC 39

    # QC样本中的MV占比
    pdf(file="data_cleaning/POS/MVpercentQC.pdf")

    p<- show_variable_missing_values(object = object_pos %>% activate_mass_dataset(what = "sample_info") %>% filter(class == "QC"), percentage = TRUE)

    p+ scale_size_continuous(range = c(0.01, 2))

    dev.off()

    # 统计QC中的MV占比
    qc_id = object_pos %>% activate_mass_dataset(what = "sample_info") %>% filter(class == "QC") %>% pull(sample_id)

    # 统计对照组中的MV占比
    control_id = object_pos %>% activate_mass_dataset(what = "sample_info") %>% filter(group == "Control") %>% pull(sample_id)

    # 统计实验组中的MV占比
    case_id = object_pos %>% activate_mass_dataset(what = "sample_info") %>% filter(group == "Case") %>% pull(sample_id)

    # 整合以上统计信息
    object_pos = object_pos %>% mutate_variable_na_freq(according_to_samples = qc_id) %>% mutate_variable_na_freq(according_to_samples = control_id) %>% mutate_variable_na_freq(according_to_samples = case_id)

    head(extract_variable_info(object_pos))
    #variable_id mz rt na_freq na_freq.1 na_freq.2
    #1 M70T73_POS 70.04074 73.31705 0.28205128 0.6000000 0.4727273
    #2 M70T53_POS 70.06596 52.78542 0.00000000 0.1454545 0.0000000
    #3 M70T183_POS 70.19977 182.87981 0.00000000 0.6636364 0.7454545
    #4 M70T527_POS 70.36113 526.76657 0.02564103 0.1818182 0.3000000
    #5 M71T695_POS 70.56911 694.84592 0.02564103 0.6454545 0.5545455
    #6 M71T183_POS 70.75034 182.77790 0.05128205 0.7272727 0.7909091

    # 移除variables
    object_pos <- object_pos %>% activate_mass_dataset(what = "variable_info") %>% filter(na_freq < 0.2 & (na_freq.1 < 0.5 | na_freq.2 < 0.5))

    object_pos

    :na_freq、na_freq.1和na_freq.2在此处分别代表某variables 在QC样本、对照组样本和实验组样本中缺失值MV所占的百分比。

    MVpercentQC

  • 负离子模式

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    # 查看各分组样本量
    object_neg %>% activate_mass_dataset(what = "sample_info") %>% dplyr::count(group)
    #> group n
    #> 1 Case 110
    #> 2 Control 110
    #> 3 QC 39

    # QC样本中的MV占比
    pdf(file="data_cleaning/NEG/MVpercentQC.pdf")

    p<- show_variable_missing_values(object = object_neg %>% activate_mass_dataset(what = "sample_info") %>% filter(class == "QC"), percentage = TRUE)

    p+ scale_size_continuous(range = c(0.01, 2))

    dev.off()

    # 统计QC中的MV占比
    qc_id = object_neg %>% activate_mass_dataset(what = "sample_info") %>% filter(class == "QC") %>% pull(sample_id)

    # 统计对照组中的MV占比
    control_id = object_neg %>% activate_mass_dataset(what = "sample_info") %>% filter(group == "Control") %>% pull(sample_id)

    # 统计实验组中的MV占比
    case_id = object_neg %>% activate_mass_dataset(what = "sample_info") %>% filter(group == "Case") %>% pull(sample_id)

    # 整合以上统计信息
    object_neg = object_neg %>% mutate_variable_na_freq(according_to_samples = qc_id) %>% mutate_variable_na_freq(according_to_samples = control_id) %>% mutate_variable_na_freq(according_to_samples = case_id)

    head(extract_variable_info(object_neg))
    #variable_id mz rt na_freq na_freq.1 na_freq.2
    #1 M70T712_NEG 70.05911 711.97894 0.05128205 0.109090909 0.018181818
    #2 M70T527_NEG 70.13902 526.85416 0.33333333 0.509090909 0.618181818
    #3 M70T587_NEG 70.31217 587.25330 0.00000000 0.009090909 0.009090909
    #4 M70T47_NEG 70.33835 46.57537 0.84615385 0.936363636 0.090909091
    #5 M71T587_NEG 70.56315 587.02570 0.17948718 0.145454545 0.163636364
    #6 M71T641_NEG 70.70657 641.16672 0.10256410 0.063636364 0.072727273

    # 移除variables
    object_neg <- object_neg %>% activate_mass_dataset(what = "variable_info") %>% filter(na_freq < 0.2 & (na_freq.1 < 0.5 | na_freq.2 < 0.5))

    object_neg

过滤离群样本(Filter outlier samples)

  • 正离子模式

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    # 总览
    pdf(file="data_cleaning/POS/MVpercentALL.pdf")

    massdataset::show_sample_missing_values(object = object_pos, color_by = "group", order_by = "injection.order", percentage = TRUE) + theme(axis.text.x = element_text(size = 2)) + scale_size_continuous(range = c(0.1, 2)) + ggsci::scale_color_aaas()

    dev.off()

    # 检测离群样本
    outlier_samples = object_pos %>% `+`(1) %>% log(2) %>% scale() %>% detect_outlier()

    outlier_samples
    #--------------------
    #masscleaner
    #--------------------
    #1 according_to_na : 0 outlier samples.
    #2 according_to_pc_sd : 0 outlier samples.
    #3 according_to_pc_mad : 0 outlier samples.
    #4 accordint_to_distance : 0 outlier samples.
    #4 accordint_to_distance : 0 outlier samples.
    #extract all outlier table using extract_outlier_table()

    outlier_table <- extract_outlier_table(outlier_samples)

    outlier_table %>% head()
    # according_to_na pc_sd pc_mad accordint_to_distance
    #sample_06 FALSE FALSE FALSE FALSE
    #sample_103 FALSE FALSE FALSE FALSE
    #sample_11 FALSE FALSE FALSE FALSE
    #sample_112 FALSE FALSE FALSE FALSE
    #sample_117 FALSE FALSE FALSE FALSE
    #sample_12 FALSE FALSE FALSE FALSE

    outlier_table %>% apply(1, function(x){ sum(x) }) %>% `>`(0) %>% which()
    # #named integer(0)
    ##无离群样本
    1
    2
    3
    4
    # 删除离群样本,如果有的话
    need_id_pos <- names(outlier_table %>% apply(1, function(x){ sum(x) }) %>% `==`(0) %>% which())

    object_pos <- object_pos %>% activate_mass_dataset(what = "sample_info") %>% filter(sample_id %in% need_id_pos)

    MVpercentALL

  • 负离子模式

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    # 总览
    pdf(file="data_cleaning/NEG/MVpercentALL.pdf")

    p<- massdataset::show_sample_missing_values(object = object_neg, color_by = "group", order_by = "injection.order", percentage = TRUE)

    p+ theme(axis.text.x = element_text(size = 2)) + scale_size_continuous(range = c(0.1, 2)) + ggsci::scale_color_aaas()

    dev.off()

    # 检测离群样本
    outlier_samples = object_neg %>% `+`(1) %>% log(2) %>% scale() %>% detect_outlier()

    outlier_samples
    #--------------------
    #masscleaner
    #--------------------
    #1 according_to_na : 0 outlier samples.
    #2 according_to_pc_sd : 0 outlier samples.
    #3 according_to_pc_mad : 0 outlier samples.
    #4 accordint_to_distance : 0 outlier samples.
    #4 accordint_to_distance : 0 outlier samples.
    #extract all outlier table using extract_outlier_table()

    outlier_table <- extract_outlier_table(outlier_samples)

    outlier_table %>% head()
    # according_to_na pc_sd pc_mad accordint_to_distance
    #sample_06 FALSE FALSE FALSE FALSE
    #sample_103 FALSE FALSE FALSE FALSE
    #sample_11 FALSE FALSE FALSE FALSE
    #sample_112 FALSE FALSE FALSE FALSE
    #sample_117 FALSE FALSE FALSE FALSE
    #sample_12 FALSE FALSE FALSE FALSE

    outlier_table %>% apply(1, function(x){ sum(x) }) %>% `>`(0) %>% which()
    # #named integer(0)
    ##无离群样本

缺失值填充(Missing value imputation)

对原始数据中的缺失值进行模拟(missing value recoding)。方法包括:”knn”, “rf” (missForest), “mean”, “median”, “zero”, “minimum”, “bpca”, “svdImpute”, “ppca”。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# 获取正离子模式下的MV数量
get_mv_number(object_pos)
#> [1] 148907

# 填充正离子模式缺失值
object_pos <- impute_mv(object = object_pos, method = "knn")

# 获取正离子模式下填充后的MV数量
get_mv_number(object_pos)
#> [1] 0

# 获取负离子模式下的MV数量
get_mv_number(object_neg)
#> [1] 146409

# 填充正离子模式缺失值
object_neg <- impute_mv(object = object_neg, method = "knn")

# 获取正离子模式下填充后的MV数量
get_mv_number(object_neg)
#> [1] 0

数据标准化与整合(Data normalization and integration)

数据标准化处理, 利用内标(internal standard, IS)进行归一化待确认是否通过IS实现

  • 正离子模式

    标准化方法包括:”svr”, “total”, “median”, “mean”, “pqn”, “loess”;整合方法包括:”qc_mean”, “qc_median”, “subject_mean”, “subject_median”。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    object_pos <- normalize_data(object_pos, method = "median")

    object_pos2 <- integrate_data(object_pos, method = "subject_median")

    # 按批次分组绘制PCA图
    pdf(file="data_cleaning/POS/PC_batch_intergrated.pdf")

    object_pos2 %>% `+`(1) %>% log(2) %>% massqc::massqc_pca(color_by = "batch", line = FALSE)

    dev.off()

标准化后正离子PCA

  • 负离子模式

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    object_neg <- normalize_data(object_neg, method = "median")

    object_neg2 <- integrate_data(object_neg, method = "subject_median")

    # 按批次分组绘制PCA图
    pdf(file="data_cleaning/NEG/PC_batch_intergrated.pdf")

    object_neg2 %>% `+`(1) %>% log(2) %>% massqc::massqc_pca(color_by = "batch", line = FALSE)

    dev.off()

标准化后负离子PCA

保存数据

1
2
save(object_pos2, file = "data_cleaning/POS/object_pos2")
save(object_neg2, file = "data_cleaning/NEG/object_neg2")

代谢物注释

加载数据

1
2
load("data_cleaning/POS/object_pos2")
load("data_cleaning/NEG/object_neg2")

将MS2数据添加到mass_dataset

如果没有MS2数据,此步不执行应该也可以!
??只有QC样的MS2数据,MS2数据是怎么来的?

  • 正离子模式

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    object_pos2 <- mutate_ms2(
    object = object_pos2,
    column = "rp", # rp or hilic,对应RPLC(反相色谱)和HILIC(亲水相互作用色谱)
    polarity = "positive",
    ms1.ms2.match.mz.tol = 15,# ppm
    ms1.ms2.match.rt.tol = 30,# seconds
    path = "mgf_ms2_data/mgf_ms2_data/POS"
    )
    #1043 out of 5101 variable have MS2 spectra.
    #Selecting the most intense MS2 spectrum for each peak...

    # summary
    extract_ms2_data(object_pos2)
  • 负离子模式

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    object_neg2 <- mutate_ms2(
    object = object_neg2,
    column = "rp",
    polarity = "negative",
    ms1.ms2.match.mz.tol = 15,
    ms1.ms2.match.rt.tol = 30,
    path = "mgf_ms2_data/mgf_ms2_data/NEG"
    )
    #1092 out of 4104 variable have MS2 spectra.
    #Selecting the most intense MS2 spectrum for each peak...

    # summary
    extract_ms2_data(object_neg2)

代谢物注释

需要考虑数据库是RPLC还是HILIC。

  • 正离子模式

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    # Annotate features using snyder_database_rplc0.0.3
    load("metabolite_annotation/snyder_database_rplc0.0.3.rda")

    ## 查看数据库信息
    snyder_database_rplc0.0.3

    ## 注释
    object_pos2 <- annotate_metabolites_mass_dataset(
    object = object_pos2,
    ms1.match.ppm = 15,
    rt.match.tol = 30,
    polarity = "positive",
    database = snyder_database_rplc0.0.3,
    threads =30)

    # Annotate features using orbitrap_database0.0.3
    load("metabolite_annotation/orbitrap_database0.0.3.rda")

    ## 查看数据库信息
    orbitrap_database0.0.3

    ## 注释
    object_pos2 <- annotate_metabolites_mass_dataset(
    object = object_pos2,
    ms1.match.ppm = 15,
    polarity = "positive",
    database = orbitrap_database0.0.3,
    threads =30)

    # Annotate features using mona_database0.0.3
    load("metabolite_annotation/mona_database0.0.3.rda")

    ## 查看数据库信息
    mona_database0.0.3

    ## 注释
    object_pos2 <- annotate_metabolites_mass_dataset(
    object = object_pos2,
    ms1.match.ppm = 15,
    polarity = "positive",
    database = mona_database0.0.3,
    threads =30)

    annotate_metabolites_mass_dataset 参数解析

    • ms1.match.ppm:Precursor match ppm tolerance [25].
    • ms2.match.ppm:Fragment ion match ppm tolerance [30].
    • mz.ppm.thr:Accurate mass tolerance for m/z error calculation [400].
    • ms2.match.tol:MS2 match (MS2 similarity) tolerance [0.5].
    • fraction.weight:The weight for matched fragments [0.3].
    • dp.forward.weight:Forward dot product weight [0.6].
    • dp.reverse.weight:Reverse dot product weight [0.1].
    • remove_fragment_intensity_cutoff:remove_fragment_intensity_cutoff [0].
    • rt.match.tol:RT match tolerance [30].
    • polarity:The polarity of data, “positive”or “negative”.
    • ce:Collision energy. Please confirm the CE values in your database. [all].
    • column:”hilic” (HILIC column) or “rp” (reverse phase).
    • ms1.match.weight:The weight of MS1 match for total score calculation [0.25].
    • rt.match.weight:The weight of RT match for total score calculation [0.25].
    • ms2.match.weight:The weight of MS2 match for total score calculation [0.5]
    • total.score.tol:Total score tolerance. The total score are referring to MS-DIAL [0.5]
    • candidate.num:The number of candidate [3]
  • 负离子模式

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    # Annotate features using snyder_database_rplc0.0.3
    object_neg2 <- annotate_metabolites_mass_dataset(
    object = object_neg2,
    ms1.match.ppm = 15,
    rt.match.tol = 30,
    polarity = "negative",
    database = snyder_database_rplc0.0.3)

    # Annotate features using orbitrap_database0.0.3
    object_neg2 <- annotate_metabolites_mass_dataset(
    object = object_neg2,
    ms1.match.ppm = 15,
    polarity = "negative",
    database = orbitrap_database0.0.3)

    # Annotate features using mona_database0.0.3
    object_neg2 <- annotate_metabolites_mass_dataset(
    object = object_neg2,
    ms1.match.ppm = 15,
    polarity = "negative",
    database = mona_database0.0.3)

查看注释结果

1
2
3
4
5
6
7
8
9
head(extract_annotation_table(object = object_pos2))

variable_info_pos <- extract_variable_info(object = object_pos2)

head(variable_info_pos)

table(variable_info_pos$Level)

table(variable_info_pos$Database)

保存数据

1
2
save(object_pos2, file = "metabolite_annotation/object_pos2")
save(object_neg2, file = "metabolite_annotation/object_neg2")

统计分析

加载数据

1
2
3
4
library(tidymass)
library(tidyverse)
load("metabolite_annotation/object_pos2")
load("metabolite_annotation/object_neg2")

移除未被注释的features

  • 正离子模式

    1
    2
    3
    object_pos2 <- object_pos2 %>% activate_mass_dataset(what = "annotation_table") %>% filter(!is.na(Level)) %>% filter(Level == 1 | Level == 2)

    object_pos2
  • 负离子模式

    1
    2
    3
    object_neg2 <- object_neg2 %>% activate_mass_dataset(what = "annotation_table") %>% filter(!is.na(Level)) %>% filter(Level == 1 | Level == 2)

    object_neg2

合并POS和NEG数据

1
2
3
4
5
6
7
8
9
10
11
12
# inner merge for samples and full merge for variables
object <-
merge_mass_dataset(
x = object_pos2,
y = object_neg2,
sample_direction = "inner",# left, right, inner or full,此处用inner较合理
variable_direction = "full",# left, right, inner or full,此处用full合理
sample_by = "sample_id", # merge samples by what columns from sample_info
variable_by = c("variable_id", "mz", "rt")# merge variables by what columns from variable_info
)

object

名词解释:

  • **左连接 (Left join)**:将左表中的所有记录都保留下来,而右表中与左表中记录没有匹配的部分则丢弃。
  • **右连接 (Right join)**:将右表中的所有记录都保留下来,而左表中与右表中记录没有匹配的部分则丢弃。
  • **内连接 (Inner join)**:只有两个表中都存在的记录才保留下来,否则丢弃。
  • **全连接 (Full join)**:将左表和右表中的所有记录都保留下来,如果某个表中的记录没有匹配到另一个表中的记录,则用NULL填充。

Trace processing information of object

1
2
3
dir.create(path = "statistical_analysis", showWarnings = FALSE)

report_parameters(object = object, path = "statistical_analysis/")

基于level和score移除冗余注释代谢物

1
2
3
4
5
6
7
8
9
10
11
12
13
object <- object %>% 
activate_mass_dataset(what = "annotation_table") %>%
group_by(Compound.name) %>%
filter(Level == min(Level)) %>%
filter(SS == max(SS)) %>%
slice_head(n = 1)

object <- object %>%
activate_mass_dataset(what = "annotation_table") %>%
group_by(variable_id) %>%
filter(Level == min(Level)) %>%
filter(SS == max(SS)) %>%
slice_head(n = 1)

为何要选最小的level?SS是什么score?

样本聚类

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
load("statistical_analysis/object_final")

# 排除QC样本
temp_object <- object_final %>%
activate_mass_dataset(what = "sample_info") %>%
filter(group != "QC") %>%
`+`(1) %>%
log(2) %>%
scale()

library(ComplexHeatmap)
h1 <- HeatmapAnnotation(class = extract_sample_info(temp_object)$group)

pdf(file="statistical_analysis/Samples_heatmap.pdf")
massstat::Heatmap(matrix = temp_object,
name = "z-score",
row_names_gp = gpar(cex = 0.2),
column_names_gp = gpar(cex = 0.2),
top_annotation = h1)
dev.off()

样本聚类热图

差异表达代谢物

如果有多组数据,需要适当增加相应下述代码。“mutate_fc”的逻辑是每运行一次,则在object中增加一列“fc”,当有多个实验组时,可以将先前生成的fc重命名,因此,object中可以包含多个组间比较的fc结果。

  • 计算变化倍数

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    # 获取对照组样本名列表
    control_sample_id = object %>%
    activate_mass_dataset(what = "sample_info") %>%
    filter(group == "Control") %>%
    pull(sample_id)

    # 获取实验组样本名列表
    case_sample_id = object %>%
    activate_mass_dataset(what = "sample_info") %>%
    filter(group == "Case") %>%
    pull(sample_id)

    #! 如果有多个实验组,参照以上格式在此列出,假设有实验组Case2
    case2_sample_id = object %>%
    activate_mass_dataset(what = "sample_info") %>%
    filter(group == "Case2") %>%
    pull(sample_id)

    # Calculate fold change,每次只能计算两个分组,如果有多个实验组,则依次将其与对照组比较
    object <- mutate_fc(object = object,
    control_sample_id = control_sample_id,
    case_sample_id = case_sample_id,
    mean_median = "mean")#可选"mean", "median"
    #> 110 control samples.
    #> 110 case samples.

    #! 如果有多个实验组,则需要在此更改fc列的默认名,假设将默认的“fc”改为“fc_case1_vs_control”
    object <- object %>%
    activate_mass_dataset(what = "variable_info") %>%
    dplyr::rename(fc_case1_vs_control = fc)

    #! Calculate fold change,每次只能计算两个分组,如果有多个实验组,则依次将其与对照组比较,此处计算Case2与Control
    object <- mutate_fc(object = object,
    control_sample_id = control_sample_id,
    case_sample_id = case2_sample_id,
    mean_median = "mean")
    #> 110 control samples.
    #> 110 case samples.

    #! 如果有多个实验组,则需要在此更改fc列的默认名,假设将默认的“fc”改为“fc_case2_vs_control”
    object <- object %>%
    activate_mass_dataset(what = "variable_info") %>%
    dplyr::rename(fc_case2_vs_control = fc)
  • 计算p值
    同理,当有多个实验组时,也可以分别计算p值,并将默认的列名“p_value”和“p_value_adjust”重命名,以在object中容纳多组相互比较的p值。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    object <- mutate_p_value(
    object = object,
    control_sample_id = control_sample_id,
    case_sample_id = case_sample_id,
    method = "t.test",# "t.test", "wilcox.test"
    p_adjust_methods = "BH"# "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
    )
    #> 110 control samples.
    #> 110 case samples.

    #! 如果涉及多组间的比较,则需重命名默认表头
    object <- object %>%
    activate_mass_dataset(what = "variable_info") %>%
    dplyr::rename(p_value_case1_vs_control = p_value,
    p_value_adjust_case1_vs_control = p_value_adjust)

    #! 假设存在Case2组
    object <- mutate_p_value(
    object = object,
    control_sample_id = control_sample_id,
    case_sample_id = case2_sample_id,
    method = "t.test",
    p_adjust_methods = "BH"
    )
    #> 110 control samples.
    #> 110 case samples.

    #! 如果涉及多组间的比较,则需重命名默认表头
    object <- object %>%
    activate_mass_dataset(what = "variable_info") %>%
    dplyr::rename(p_value_case2_vs_control = p_value,
    p_value_adjust_case2_vs_control = p_value_adjust)
  • 绘制火山图

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
      pdf(file="statistical_analysis/Volcano.pdf")

    p<- volcano_plot(
    object = object,
    fc_column_name = "fc",
    p_value_column_name = "p_value_adjust",
    fc_up_cutoff = 1,
    fc_down_cutoff = 1,
    p_value_cutoff = 0.05,
    add_text = TRUE,
    text_from = "Compound.name",
    point_size_scale = "p_value_adjust"
    )

    p + scale_size_continuous(range = c(0.5, 5))

    dev.off()

    火山图

    假设存在Case2,绘制火山图。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    pdf(file="statistical_analysis/Volcano2.pdf")

    p<- volcano_plot(object = object,
    fc_column_name = "fc_case2_vs_control",# 重命名后
    p_value_column_name = "p_value_adjust_case2_vs_control",# 重命名后
    fc_up_cutoff = 1,
    fc_down_cutoff = 1,
    p_value_cutoff = 0.05,
    add_text = TRUE,
    text_from = "Compound.name",
    point_size_scale = "p_value_adjust_case2_vs_control") # 重命名后

    p + scale_size_continuous(range = c(0.5, 5))

    dev.off()

保存结果

  • 保存差异表达代谢物结果。如果有多个实验组,则将下述代码中的“fc”和“p_value_adjust”更改为重命名后的名称,并分别保存到不同的文件中。

    1
    2
    3
    differential_metabolites <- extract_variable_info(object = object) %>% filter(fc > 2 | fc < 0.5) %>% filter(p_value_adjust < 0.05)

    readr::write_csv(differential_metabolites, file = "statistical_analysis/differential_metabolites.csv")
  • 保存结果对象

    1
    2
    3
    object_final <- object

    save(object_final, file = "statistical_analysis/object_final")

绘制热图

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
temp_object <- object_final %>% 
activate_mass_dataset(what = "sample_info") %>%
filter(sample_id %in% c(control_sample_id, case_sample_id)) %>%
activate_mass_dataset(what = "variable_info") %>%
filter(variable_id %in% differential_metabolites$variable_id) %>%
`+`(1) %>%
log(2) %>%
scale()

library(ComplexHeatmap)
h1 <- HeatmapAnnotation(class = extract_sample_info(temp_object)$group)

pdf(file="statistical_analysis/diff_heatmap.pdf")
massstat::Heatmap(matrix = temp_object,
name = "z-score",
row_names_gp = gpar(cex = 0.4),
column_names_gp = gpar(cex = 0.2),
top_annotation = h1,
row_labels = extract_variable_info(temp_object)$Compound.name,
border = TRUE)
dev.off()

显著差异代谢物热图

代谢通路富集

导入数据

1
2
3
library(tidymass)
library(tidyverse)
load("statistical_analysis/object_final")

富集

1
2
3
4
5
6
7
8
dir.create(path = "pathway_enrichment", showWarnings = FALSE)

diff_metabolites <- object_final %>%
activate_mass_dataset(what = "variable_info") %>%
filter(p_value_adjust < 0.05) %>%
extract_variable_info()

head(diff_metabolites)

加载KEGG human pathway数据库

可选数据库:kegg_hsa_pathway,keggMS1database,query_id_kegg,hmdb_pathway,hmdbMS1Database,query_id_hmdb。

??这些数据库的区别是什么?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
data("kegg_hsa_pathway", package = "metpath")

kegg_hsa_pathway

get_pathway_class(kegg_hsa_pathway)

# 移除疾病相关通路!根据课题选择是否移除?
pathway_class = metpath::pathway_class(kegg_hsa_pathway)
head(pathway_class)

remain_idx = pathway_class %>% unlist() %>% stringr::str_detect("Disease") %>% `!`() %>% which()

pathway_database = kegg_hsa_pathway[remain_idx]
pathway_database

kegg_id <- diff_metabolites$KEGG.ID
kegg_id <- kegg_id[!is.na(kegg_id)]

result <- enrich_kegg(
query_id = kegg_id,
query_type = "compound",
id_type = "KEGG",
pathway_database = pathway_database,
p_cutoff = 0.05,
p_adjust_method = "BH",
threads = 10)

result

save(result, file = "pathway_enrichment/result")

绘制结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
#  bar plot
pdf(file="pathway_enrichment/barplot.pdf")

enrich_bar_plot(object = result, x_axis = "p_value", cutoff = 0.05)

dev.off()

# scatter plot
pdf(file="pathway_enrichment/scatter_plot.pdf")

enrich_scatter_plot(object = result, y_axis = "p_value", y_axis_cutoff = 0.05)

dev.off()

# network
tiff(file="pathway_enrichment/network.tiff")

enrich_network(object = result, point_size = "p_value", p_cutoff = 0.05, only_significant_pathway = TRUE)

dev.off()

KEGG富集bar plot

KEGG富集scatter plot

KEGG富集network

Correlation network analysis

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
temp_object <- object_final %>% 
activate_mass_dataset(what = "sample_info") %>%
filter(sample_id %in% c(control_sample_id)) %>%
activate_mass_dataset(what = "variable_info") %>%
filter(variable_id %in% diff_metabolites$variable_id) %>%
`+`(1) %>%
log(2) %>%
scale()

library(ggraph)
library(tidygraph)
graph_data <- convert_mass_dataset2graph(
object = temp_object,
margin = "variable",
cor_method = "spearman",
p_adjust_cutoff = 0.05,
p_value_cutoff = 0.05,
pos_cor_cutoff = 0.7,
neg_cor_cutoff = -0.7
) %>%
mutate(Degree = centrality_degree(mode = 'all'))

library(extrafont)
loadfonts()
library(showtext)
showtext_auto()
plot <-
ggraph(graph = graph_data, layout = "kk") +
geom_edge_fan(aes(color = correlation,
#width = -log((p_adjust+0.1), 10)),
show.legend = TRUE)) +
geom_node_point(aes(size = Degree)) +
shadowtext::geom_shadowtext(aes(x = x, y = y,
label = Compound.name),
bg.colour = "white",
colour = "black")+
theme_graph() +
scale_edge_color_gradient2(low = "darkblue", mid = "white", high = "red")

ggsave(plot, filename = "pathway_enrichment/cor_network.pdf", width = 12, height = 7)

代谢物Correlation network

术语

  • HILIC(亲水相互作用色谱)& RPLC(反相色谱):RPLC 色谱柱主要使用非极性固定相(C18、C8 等),而 HILIC 色谱柱则使用极性固定相(二氧化硅、酰胺等)。两种技术采用的流动相通常由乙腈和水组成,这使得两种液相色谱模式之间可以实现轻松切换。HILIC 和 RPLC 流动相的主要区别在于溶剂洗脱强度。对于 RPLC,乙腈是强洗脱溶剂。但对于HILIC,水是强洗脱溶剂。对于 RPLC,得到的色谱图通常是极性分析物到非极性分析物,而 HILIC 则相反。这种相反的洗脱顺序使 HILIC 成为更常用的 RPLC 的一个很好的补充技术。对于极性分析物和离子化分析物尤其如此,它们在 HILIC 模式下的保留时间更长。

参考

加关注

关注公众号“生信之巅”。

生信之巅微信公众号 生信之巅小程序码

敬告:使用文中脚本请引用本文网址,请尊重本人的劳动成果,谢谢!Notice: When you use the scripts in this article, please cite the link of this webpage. Thank you!

上一篇:
批量下载某研究方向重要文献
下一篇:
用metid构建代谢组数据库