The first step of data cleaning is to flag noise based on the missing value rate of each metabolite across the samples. MetMiner offers several ways of flagging noise...
Principle
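- As mentioned earlier, the QC samples in theory contain every metabolite. The experimental procedure also shows that the QC samples injected at different time points all come from one pooled sample, so in theory the missing rate of a metabolite should be fairly consistent across all QC samples, and low;
- Empirically, we consider metabolites with a missing rate below 20% in the QC samples to be stable: they are detected frequently and are more trustworthy. For most experimental designs this criterion is fairly robust;
- For experimental designs without sample heterogeneity, the missing rate in the test samples can be taken into account as well. For example, features that pass the QC missing-rate filter can be held to a stricter condition, such as being detected in at least half of the test samples;
- For experimental designs with sample heterogeneity, only the missing rate in the QC samples needs to be considered;
- Note that the QC missing threshold and the sample missing threshold are combined with AND, not OR! The sample threshold is an additional, stricter condition applied on top of the QC threshold.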
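The AND rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not MetMiner's own code: the function names `missing_rate` and `is_noise`, the use of `None` for a missing value, and the boundary handling (strictly below the QC cutoff, at or below the sample cutoff) are all assumptions made for the example.

```python
from typing import Optional, Sequence


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of samples in which the feature is missing (None)."""
    if not values:
        raise ValueError("need at least one sample")
    return sum(v is None for v in values) / len(values)


def is_noise(
    qc_values: Sequence[Optional[float]],
    sample_values: Sequence[Optional[float]],
    qc_cutoff: float = 0.2,      # "below 20% missing in QC" from the text
    sample_cutoff: float = 1.0,  # 1.0 means the test samples are not filtered
) -> bool:
    """Flag a feature as noise unless it passes BOTH thresholds (AND, not OR)."""
    qc_ok = missing_rate(qc_values) < qc_cutoff            # assumed strict bound
    sample_ok = missing_rate(sample_values) <= sample_cutoff  # assumed inclusive bound
    return not (qc_ok and sample_ok)


# Hypothetical intensities for one feature; None marks a missing value.
qc = [1.0] * 10 + [None]            # 1 of 11 QC injections missing (about 9%)
subjects = [1.0] * 8 + [None] * 12  # 12 of 20 test samples missing (60%)

print(is_noise(qc, subjects))                     # QC-only filtering
print(is_noise(qc, subjects, sample_cutoff=0.5))  # plus "at least half detected"
```

With the QC threshold alone the feature is kept (`False`); adding the 50% sample threshold flags it as noise (`True`), because a feature has to satisfy both conditions to survive.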
Interface

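First, the parameters in the sidebar:
- Group by: in the sample information table below, `class` splits the samples into QC and Subject, while `group` gives the experimental group. `group` combines two variables, genotype (col0 or mutant) and treatment (water or drought), which divides the samples into 4 groups. To control QC missingness and also require a feature to be present in at least half of the test samples, set Group by to `class` and missing value frequency to 50%. To be stricter and require detection in at least 3 of 5 biological replicates, set Group by to `group` and missing value frequency to 40%;
- Missing value frequency: the missing-rate threshold for the test samples. If you do not want to filter on the test samples, drag this straight to 100%;
- Missing value frequency in QC: the missing-rate threshold for the QC samples.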
sample_id | injection.order | class | group | batch | group | treatment | genotype |
---|---|---|---|---|---|---|---|
QC_01 | 1 | QC | QC | 1 | QC | ||
QC_02 | 2 | QC | QC | 1 | QC | ||
QC_03 | 3 | QC | QC | 1 | QC | ||
QC_04 | 4 | QC | QC | 1 | QC | ||
QC_05 | 5 | QC | QC | 1 | QC | ||
QC_06 | 6 | QC | QC | 1 | QC | ||
S_0001 | 7 | Subject | WT-Water | 1 | WT-Water-1 | Water | col0 |
S_0002 | 8 | Subject | WT-Water | 1 | WT-Water-2 | Water | col0 |
S_0003 | 9 | Subject | WT-Water | 1 | WT-Water-3 | Water | col0 |
S_0004 | 10 | Subject | WT-Water | 1 | WT-Water-4 | Water | col0 |
S_0005 | 11 | Subject | WT-Water | 1 | WT-Water-5 | Water | col0 |
S_0006 | 12 | Subject | WT-drought | 1 | WT-drought-1 | drought | col0 |
S_0007 | 13 | Subject | WT-drought | 1 | WT-drought-2 | drought | col0 |
S_0008 | 14 | Subject | WT-drought | 1 | WT-drought-3 | drought | col0 |
S_0009 | 15 | Subject | WT-drought | 1 | WT-drought-4 | drought | col0 |
S_0010 | 16 | Subject | WT-drought | 1 | WT-drought-5 | drought | col0 |
QC_07 | 17 | QC | QC | 1 | QC | ||
S_0011 | 18 | Subject | MT-Water | 1 | MT-Water-1 | Water | mutant |
S_0012 | 19 | Subject | MT-Water | 1 | MT-Water-2 | Water | mutant |
S_0013 | 20 | Subject | MT-Water | 1 | MT-Water-3 | Water | mutant |
S_0014 | 21 | Subject | MT-Water | 1 | MT-Water-4 | Water | mutant |
S_0015 | 22 | Subject | MT-Water | 1 | MT-Water-5 | Water | mutant |
S_0016 | 23 | Subject | MT-drought | 1 | MT-drought-1 | drought | mutant |
S_0017 | 24 | Subject | MT-drought | 1 | MT-drought-2 | drought | mutant |
S_0018 | 25 | Subject | MT-drought | 1 | MT-drought-3 | drought | mutant |
S_0019 | 26 | Subject | MT-drought | 1 | MT-drought-4 | drought | mutant |
S_0020 | 27 | Subject | MT-drought | 1 | MT-drought-5 | drought | mutant |
QC_08 | 28 | QC | QC | 1 | QC | ||
QC_09 | 29 | QC | QC | 1 | QC | ||
QC_10 | 30 | QC | QC | 1 | QC | ||
QC_11 | 31 | QC | QC | 1 | QC | ||
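To make the two Group by settings concrete, here is a small Python sketch of the arithmetic on the 20 Subject samples from the table. It is not MetMiner's implementation. The helper names are hypothetical, and so is the `require_all` flag: the text does not say whether the threshold has to hold in every group or in at least one, so the sketch exposes both readings and defaults to "every group" purely as an assumption.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def group_missing_rates(
    values: Sequence[Optional[float]], labels: Sequence[str]
) -> Dict[str, float]:
    """Missing rate (fraction of None values) within each label."""
    total: Dict[str, int] = defaultdict(int)
    missing: Dict[str, int] = defaultdict(int)
    for value, label in zip(values, labels):
        total[label] += 1
        missing[label] += value is None  # True counts as 1
    return {label: missing[label] / total[label] for label in total}


def passes_sample_filter(
    values: Sequence[Optional[float]],
    labels: Sequence[str],
    cutoff: float,
    require_all: bool = True,  # hypothetical flag, see the assumption above
) -> bool:
    """Check the per-group missing rates against an inclusive cutoff."""
    rates = group_missing_rates(values, labels).values()
    check = all if require_all else any
    return check(rate <= cutoff for rate in rates)


# Labels for the 20 Subject samples, in table order.
by_group = (["WT-Water"] * 5 + ["WT-drought"] * 5
            + ["MT-Water"] * 5 + ["MT-drought"] * 5)
by_class = ["Subject"] * 20

# Hypothetical feature: present in every WT sample, in 1 of 5 per MT group.
feature = [1.0] * 10 + [1.0, None, None, None, None] * 2

# 3 of 5 replicates detected means 2 of 5 missing, hence the 40% setting.
print(group_missing_rates([1.0, 1.0, 1.0, None, None], ["WT-Water"] * 5))

print(passes_sample_filter(feature, by_class, cutoff=0.5))  # Group by: class
print(passes_sample_filter(feature, by_group, cutoff=0.4))  # Group by: group
```

Grouped by `class`, the feature is missing in 8 of 20 samples (40%) and passes the 50% threshold. Grouped by `group` with the "every group" reading, it fails, because both mutant groups are missing it in 4 of 5 replicates. Passing `require_all=False` switches to the "at least one group" reading, under which the same feature passes.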
---The end---
Jul 27, 2024 by Shawn Wang, HENU, Kaifeng, Henan, China.