Feature Detection#
One very common task in mass spectrometry is the detection of 2-dimensional
patterns in m/z and time (RT) dimension from a series of MS1 scans. These
patterns are called Features
and they exhibit a chromatographic elution
profile in the time dimension and an isotopic pattern in the m/z dimension (see
previous section for the 1-dimensional problem).
OpenMS has multiple tools that can identify these features in 2-dimensional
data, these tools are called FeatureFinder
. Currently the following
FeatureFinders are available in pyOpenMS:
FeatureFinderMultiplexAlgorithm
(e.g., SILAC, Dimethyl labeling, (and label-free), identification free feature detection of peptides)
FeatureFinderAlgorithmPicked
(Label-free, identification free feature detection of peptides)
FeatureFinderIdentificationAlgorithm
(Label-free identification-guided feature detection of peptides)
FeatureFinderAlgorithmIsotopeWavelet
(old instruments)
FeatureFindingMetabo
(Label-free, identification free feature detection of metabolites)
FeatureFinderAlgorithmMetaboIdent
(Label-free, identification guided feature detection of metabolites)
All of the algorithms above are for proteomics data with the exception of FeatureFindingMetabo
and FeatureFinderMetaboIdentCompound
for metabolomics data and small molecules in general.
Proteomics#
Two of the most commonly used feature finders for proteomics in OpenMS are the FeatureFinder
and FeatureFinderIdentificationAlgorithm
which both work on (high
resolution) centroided data. We can use the following code to find features in MS data:
from urllib.request import urlretrieve
gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
urlretrieve(
gh + "/src/data/FeatureFinderCentroided_1_input.mzML", "feature_test.mzML"
)
import pyopenms as oms
# Prepare data loading (save memory by only
# loading MS1 spectra into memory)
options = oms.PeakFileOptions()
options.setMSLevels([1])
fh = oms.MzMLFile()
fh.setOptions(options)
# Load data
input_map = oms.MSExperiment()
fh.load("feature_test.mzML", input_map)
input_map.updateRanges()
ff = oms.FeatureFinder()
ff.setLogType(oms.LogType.CMD)
# Run the feature finder
name = "centroided"
features = oms.FeatureMap()
seeds = oms.FeatureMap()
params = oms.FeatureFinder().getParameters(name)
ff.run(name, input_map, features, params, seeds)
features.setUniqueIds()
fh = oms.FeatureXMLFile()
fh.store("output.featureXML", features)
print("Found", features.size(), "features")
With a few lines of Python, we are able to run powerful algorithms available in
OpenMS. The resulting data is held in memory (a FeatureMap
object) and can be
inspected directly using the help(features)
comment. It reveals that the
object supports iteration (through the __iter__
function) as well as direct
access (through the __getitem__
function). This means we write code that uses direct access and iteration in
Python as follows:
f0 = features[0]
for f in features:
print(f.getRT(), f.getMZ())
Each entry in the FeatureMap
is a so-called Feature
and allows direct
access to the m/z and RT value from Python. Again, we can learn this by
inspecting help(f)
or by consulting the manual.
Note: the output file that we have written (output.featureXML
) is an
OpenMS-internal XML format for storing features. You can learn more about file
formats in the Reading MS data formats section.
Metabolomics - Untargeted#
For the untargeted detection of small molecule features we can use the FeatureFindingMetabo
with prior MassTraceDetection
and ElutionPeakDetection
.
import pyopenms as oms
from urllib.request import urlretrieve
gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
mzML_path = gh + "/src/data/FeatureFinderMetaboIdent_1_input.mzML"
urlretrieve(mzML_path, "ms_data.mzML")
exp = oms.MSExperiment()
oms.MzMLFile().load("ms_data.mzML", exp)
exp.sortSpectra(True)
mass_traces = []
mtd = oms.MassTraceDetection()
mtd_params = mtd.getDefaults()
mtd_params.setValue(
"mass_error_ppm", 5.0
) # set according to your instrument mass error
mtd_params.setValue(
"noise_threshold_int", 3000.0
) # adjust to noise level in your data
mtd.setParameters(mtd_params)
mtd.run(exp, mass_traces, 0)
mass_traces_split = []
mass_traces_final = []
epd = oms.ElutionPeakDetection()
epd_params = epd.getDefaults()
epd_params.setValue("width_filtering", "fixed")
epd.setParameters(epd_params)
epd.detectPeaks(mass_traces, mass_traces_split)
if epd.getParameters().getValue("width_filtering") == "auto":
epd.filterByPeakWidth(mass_traces_split, mass_traces_final)
else:
mass_traces_final = mass_traces_split
fm = oms.FeatureMap()
feat_chrom = []
ffm = oms.FeatureFindingMetabo()
ffm_params = ffm.getDefaults()
ffm_params.setValue("isotope_filtering_model", "none")
ffm_params.setValue(
"remove_single_traces", "true"
) # set false to keep features with only one mass trace
ffm_params.setValue("mz_scoring_by_elements", "false")
ffm_params.setValue("report_convex_hulls", "true")
ffm.setParameters(ffm_params)
ffm.run(mass_traces_final, fm, feat_chrom)
fm.setUniqueIds()
fm.setPrimaryMSRunPath(["ms_data.mzML".encode()])
Metabolomics - Targeted#
FeatureFinderAlgorithmMetaboIdent
performs MS1-based targeted feature extraction based on user provided compounds, which are
specified in an assay library (a tab-separated text file). Detected features are stored in a FeatureMap
which can be
stored in a FeatureXMLFile
. This tool is useful for the targeted extraction of features for a well-defined set of compounds
with known sum formulas and retention times.
For more information on the format of the assay library and available parameters visit the FeatureFinderMetaboIdent documentation.
The pyOpenMS FeatureFinderAlgorithmMetaboIdent
needs a list of FeatureFinderMetaboIdentCompound
objects as an assay libray for it’s
run()
function. We could create that list ourselves or use the following function to read an assay library as .tsv
file:
CompoundName |
SumFormula |
Mass |
Charge |
RetentionTime |
RetentionTimeRange |
IsoDistribution |
---|---|---|---|---|---|---|
2’-O-methylcytidine |
C10H15N3O5 |
0 |
1 |
207.6 |
0 |
0 |
5-formylcytidine |
C10O6N3H13 |
0 |
1 |
269.4 |
0 |
0 |
5-methyluridine |
C10H14N2O6 |
0 |
1 |
291.6 |
0 |
0 |
adenosine |
C10H13N5O4 |
0 |
1 |
220.8 |
0 |
0 |
deoxyadenosine |
C10H13N5O3 |
0 |
1 |
243.0 |
0 |
0 |
inosine |
C10H12N4O5 |
0 |
1 |
264.0 |
0 |
0 |
import csv
# read tsv file and create list of FeatureFinderMetaboIdentCompound
def metaboTableFromFile(path_to_library_file):
metaboTable = []
with open(path_to_library_file, "r") as tsv_file:
tsv_reader = csv.reader(tsv_file, delimiter="\t")
next(tsv_reader) # skip header
for row in tsv_reader:
metaboTable.append(
oms.FeatureFinderMetaboIdentCompound(
row[0], # name
row[1], # sum formula
float(row[2]), # mass
[int(charge) for charge in row[3].split(",")], # charges
[float(rt) for rt in row[4].split(",")], # RTs
[
float(rt_range) for rt_range in row[5].split(",")
], # RT ranges
[
float(iso_distrib) for iso_distrib in row[6].split(",")
], # isotope distributions
)
)
return metaboTable
Now we can use the following code to detect features with FeatureFinderAlgorithmMetaboIdent
and store them in a FeatureXMLFile
:
from urllib.request import urlretrieve
import pyopenms as oms
gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
mzML_path = gh + "/src/data/FeatureFinderMetaboIdent_1_input.mzML"
urlretrieve(mzML_path, "ms_data.mzML")
urlretrieve(
gh + "/src/data/FeatureFinderMetaboIdent_1_input.tsv", "library.tsv"
)
# load ms data from mzML file into MSExperiment
spectra = oms.MSExperiment()
oms.MzMLFile().load("ms_data.mzML", spectra)
# create FeatureFinderAlgorithmMetaboIdent and assign ms data
ff = oms.FeatureFinderAlgorithmMetaboIdent()
ff.setMSData(spectra)
# read library generate a metabo table with compounds
metabo_table = metaboTableFromFile("library.tsv")
# FeatureMap to store results
fm = oms.FeatureMap()
# edit some parameters
params = ff.getParameters()
params[b"extract:mz_window"] = 5.0 # 5 ppm
params[b"extract:rt_window"] = 20.0 # 20 seconds
params[b"detect:peak_width"] = 3.0 # 3 seconds
ff.setParameters(params)
# run the FeatureFinderMetaboIdent with the metabo_table and mzML file path -> store results in fm
ff.run(metabo_table, fm, mzML_path)
# save FeatureMap to file
oms.FeatureXMLFile().store("detected_features.featureXML", fm)
Note: the output file that we have written (output.featureXML
) is an
OpenMS-internal XML format for storing features. You can learn more about file
formats in the Reading MS data formats section.
We can get a quick overview on the detected features by plotting them using the following function:
1import matplotlib.pyplot as plt
2
3def plotDetectedFeatures3D(path_to_featureXML):
4 fm = oms.FeatureMap()
5 fh = oms.FeatureXMLFile()
6 fh.load(path_to_featureXML, fm)
7
8 fig = plt.figure()
9 ax = fig.add_subplot(111, projection="3d")
10
11 for feature in fm:
12 color = next(ax._get_lines.prop_cycler)["color"]
13 # chromatogram data is stored in the subordinates of the feature
14 for i, sub in enumerate(feature.getSubordinates()):
15 retention_times = [
16 x[0] for x in sub.getConvexHulls()[0].getHullPoints()
17 ]
18 intensities = [
19 int(y[1]) for y in sub.getConvexHulls()[0].getHullPoints()
20 ]
21 mz = sub.getMetaValue("MZ")
22 ax.plot(retention_times, intensities, zs=mz, zdir="x", color=color)
23 if i == 0:
24 ax.text(
25 mz,
26 retention_times[0],
27 max(intensities) * 1.02,
28 feature.getMetaValue("label"),
29 color=color,
30 )
31
32 ax.set_ylabel("time (s)")
33 ax.set_xlabel("m/z")
34 ax.set_zlabel("intensity (cps)")
35 plt.show()
