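「MetMiner」Data Cleaning - Noise Removal

The first step in data cleaning is to flag noise according to each metabolite's missing value rate across the samples. MetMiner offers several ways to flag noise...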

Principle

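As mentioned earlier, QC samples in theory contain the information for all metabolites. The experimental procedure also shows that the QC samples injected at different time points all come from the same pooled sample, so in theory the missing rate of a metabolite should be fairly consistent across all QC samples, and low.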

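Empirically, we consider metabolites with a missing rate below 20% in QC samples to be fairly stable: they are detected frequently and are more trustworthy. For most experimental designs, this criterion is fairly robust.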
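The QC criterion above can be sketched in a few lines of Python. This is only an illustration of the rule, not MetMiner's implementation: the function names, the use of `None` for a missing value, and the intensity values are all assumptions made for this example.

```python
def missing_rate(values):
    """Fraction of entries that are missing (None) in a list of intensities."""
    return sum(v is None for v in values) / len(values)


def stable_in_qc(qc_values, qc_threshold=0.20):
    """True if the feature's missing rate in QC samples is below the threshold."""
    return missing_rate(qc_values) < qc_threshold


# Hypothetical intensities of two features across 11 QC injections.
feature_a = [1200.0, 1180.0, None, 1250.0, 1190.0, 1210.0,
             1230.0, 1175.0, 1205.0, 1220.0, 1195.0]   # 1 of 11 missing
feature_b = [None, 300.0, None, None, 310.0, 295.0,
             None, 305.0, 300.0, 298.0, 302.0]         # 4 of 11 missing

print(stable_in_qc(feature_a))  # True: about 9% missing
print(stable_in_qc(feature_b))  # False: about 36% missing
```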

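In addition, for experimental designs without sample heterogeneity, we can also take the missing rate in the test samples into account. For example, on the features that pass the QC missing-rate filter we can impose a stricter condition, such as requiring detection in at least half of the test samples, and filter on that as well.

For experimental designs with sample heterogeneity, only the missing rate in QC samples needs to be considered.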

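Note that the QC missing-rate threshold and the sample missing-rate threshold are combined with AND, not OR! In other words, the sample threshold is a stricter condition added on top of the QC threshold.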
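To make the AND relationship concrete, here is a minimal sketch. It is not MetMiner's code: `keep_feature`, its parameters, and the example data are hypothetical, and treating the sample threshold as inclusive (so that 100% disables it) is an assumption.

```python
def missing_rate(values):
    """Fraction of entries that are missing (None) in a list of intensities."""
    return sum(v is None for v in values) / len(values)


def keep_feature(qc_values, sample_values, qc_threshold=0.20, sample_threshold=1.0):
    """Keep a feature only if it passes BOTH the QC and the sample criteria."""
    qc_ok = missing_rate(qc_values) < qc_threshold
    sample_ok = missing_rate(sample_values) <= sample_threshold
    return qc_ok and sample_ok  # AND, not OR


# Hypothetical feature X: always detected in QC, missing in 15 of 20 test samples.
x_qc = [500.0] * 11
x_samples = [480.0] * 5 + [None] * 15

# Hypothetical feature Y: missing in 5 of 11 QC samples, detected in every test sample.
y_qc = [None] * 5 + [200.0] * 6
y_samples = [210.0] * 20

print(keep_feature(x_qc, x_samples))                        # True: sample filter disabled
print(keep_feature(x_qc, x_samples, sample_threshold=0.5))  # False: 75% missing in samples
print(keep_feature(y_qc, y_samples, sample_threshold=0.5))  # False: fails the QC criterion
```

Feature Y shows why OR would be wrong here: perfect detection in the test samples does not rescue a feature that is unstable in QC.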

Interface

noisy remove

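First, the parameters in the sidebar:

  • Group by: In the sample information table below, for example, class divides the samples into QC and Subject, and group gives the experimental group of each sample. The grouping combines two variables, genotype (col0 or mutant) and treatment (water or drought), which divides the samples into 4 groups. If we want to control QC missingness and also require a feature to be present in at least half of the test samples, set Group by to class and missing value frequency to 50%. If we want to be stricter and require detection in at least 3 of 5 biological replicates, set Group by to group and missing value frequency to 40% (at most 2 of 5 replicates missing).

  • Missing value frequency: the missing-rate threshold for test samples. If you do not want to filter on missingness in test samples, set this to 100%.

  • Missing value frequency in QC: the missing-rate threshold for QC samples.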

sample_id  injection.order  class    group       batch  group         treatment  genotype
QC_01      1                QC       QC          1      QC
QC_02      2                QC       QC          1      QC
QC_03      3                QC       QC          1      QC
QC_04      4                QC       QC          1      QC
QC_05      5                QC       QC          1      QC
QC_06      6                QC       QC          1      QC
S_0001     7                Subject  WT-Water    1      WT-Water-1    Water      col0
S_0002     8                Subject  WT-Water    1      WT-Water-2    Water      col0
S_0003     9                Subject  WT-Water    1      WT-Water-3    Water      col0
S_0004     10               Subject  WT-Water    1      WT-Water-4    Water      col0
S_0005     11               Subject  WT-Water    1      WT-Water-5    Water      col0
S_0006     12               Subject  WT-drought  1      WT-drought-1  drought    col0
S_0007     13               Subject  WT-drought  1      WT-drought-2  drought    col0
S_0008     14               Subject  WT-drought  1      WT-drought-3  drought    col0
S_0009     15               Subject  WT-drought  1      WT-drought-4  drought    col0
S_0010     16               Subject  WT-drought  1      WT-drought-5  drought    col0
QC_07      17               QC       QC          1      QC
S_0011     18               Subject  MT-Water    1      MT-Water-1    Water      mutant
S_0012     19               Subject  MT-Water    1      MT-Water-2    Water      mutant
S_0013     20               Subject  MT-Water    1      MT-Water-3    Water      mutant
S_0014     21               Subject  MT-Water    1      MT-Water-4    Water      mutant
S_0015     22               Subject  MT-Water    1      MT-Water-5    Water      mutant
S_0016     23               Subject  MT-drought  1      MT-drought-1  drought    mutant
S_0017     24               Subject  MT-drought  1      MT-drought-2  drought    mutant
S_0018     25               Subject  MT-drought  1      MT-drought-3  drought    mutant
S_0019     26               Subject  MT-drought  1      MT-drought-4  drought    mutant
S_0020     27               Subject  MT-drought  1      MT-drought-5  drought    mutant
QC_08      28               QC       QC          1      QC
QC_09      29               QC       QC          1      QC
QC_10      30               QC       QC          1      QC
QC_11      31               QC       QC          1      QC
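
The effect of Group by on the test samples in this table can be sketched as follows. This is an illustration under stated assumptions, not MetMiner's implementation: the helper names and the feature intensities are hypothetical, and the text does not say whether a feature must meet the threshold in every group or in at least one group, so the sketch exposes that choice as the `require` argument (`all` or `any`).

```python
from collections import defaultdict


def missing_rate(values):
    """Fraction of entries that are missing (None) in a list of intensities."""
    return sum(v is None for v in values) / len(values)


def group_missing_rates(intensities, labels):
    """Missing rate per label; intensities and labels are parallel lists."""
    grouped = defaultdict(list)
    for value, label in zip(intensities, labels):
        grouped[label].append(value)
    return {label: missing_rate(vals) for label, vals in grouped.items()}


def passes_sample_filter(intensities, labels, threshold, require=all):
    """Apply the missing-rate threshold per group (assumed rule, see lead-in)."""
    rates = group_missing_rates(intensities, labels)
    return require(rate <= threshold for rate in rates.values())


# The 20 Subject samples S_0001..S_0020, labelled by the class and group columns.
classes = ["Subject"] * 20
groups = (["WT-Water"] * 5 + ["WT-drought"] * 5
          + ["MT-Water"] * 5 + ["MT-drought"] * 5)

# Hypothetical feature: detected in all WT samples, 2 of 5 MT-Water, 0 of 5 MT-drought.
feature = [100.0] * 5 + [110.0] * 5 + [90.0, 95.0, None, None, None] + [None] * 5

by_class = passes_sample_filter(feature, classes, threshold=0.50)
by_group_all = passes_sample_filter(feature, groups, threshold=0.40, require=all)
by_group_any = passes_sample_filter(feature, groups, threshold=0.40, require=any)
print(by_class, by_group_all, by_group_any)  # True False True
```

With Group by set to class, all 20 test samples form one group, so 8 missing values out of 20 (40%) pass a 50% threshold. With Group by set to group, the same feature is evaluated within each set of 5 replicates, and the outcome depends on whether every group or just one group has to meet the 40% threshold.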

Jul 27, 2024 by Shawn Wang, HENU, Kaifeng, Henan, China.