Feature DetectionΒΆ
One very common task in mass spectrometry is the detection of 2-dimensional
patterns in m/z and time (RT) dimension from a series of MS1 scans. These
patterns are called Features
and they exhibit a chromatographic elution
profile in the time dimension and an isotopic pattern in the m/z dimension (see
previous section for the 1-dimensional problem).
OpenMS has multiple tools that can identify these features in 2-dimensional
data, these tools are called FeatureFinder. Currently the following
FeatureFinders are available in OpenMS:
- FeatureFinderMultiplex
- FeatureFinderMRM
- FeatureFinderCentroided
- FeatureFinderIdentification
- FeatureFinderIsotopeWavelet
- FeatureFinderMetabo
- FeatureFinderSuperHirn
All of the algorithms above are for proteomics data with the exception of
FeatureFinderMetabo which works on metabolomics data. One of the most commonly
used FeatureFinders is the FeatureFinderCentroided which works on (high
resolution) centroided data. We can use the following code to find Features
in MS data:
from urllib.request import urlretrieve
# from urllib import urlretrieve # use this code for Python 2.x
gh = "https://raw.githubusercontent.com/OpenMS/OpenMS/develop"
urlretrieve (gh +"/src/tests/topp/FeatureFinderCentroided_1_input.mzML", "feature_test.mzML")
from pyopenms import *
# Prepare data loading (save memory by only
# loading MS1 spectra into memory)
options = PeakFileOptions()
options.setMSLevels([1])
fh = MzMLFile()
fh.setOptions(options)
# Load data
input_map = MSExperiment()
fh.load("feature_test.mzML", input_map)
input_map.updateRanges()
ff = FeatureFinder()
ff.setLogType(LogType.CMD)
# Run the feature finder
name = "centroided"
features = FeatureMap()
seeds = FeatureMap()
params = FeatureFinder().getParameters(name)
ff.run(name, input_map, features, params, seeds)
features.setUniqueIds()
fh = FeatureXMLFile()
fh.store("output.featureXML", features)
print("Found", features.size(), "features")
With a few lines of Python, we are able to run powerful algorithms available in
OpenMS. The resulting data is held in memory (a FeatureMap
object) and can be
inspected directly using the help(features)
comment. It reveals that the
object supports iteration (through the __iter__
function) as well as direct
access (through the __getitem__
function). We can also inspect the entry
for FeatureMap
in the pyOpenMS manual and learn about the same
functions. This means we write code that uses direct access and iteration in
Python as follows:
f0 = features[0]
for f in features:
print (f.getRT(), f.getMZ())
Each entry in the FeatureMap
is a so-called Feature
and allows direct
access to the m/z and RT value from Python. Again, we can lear this by
inspecting help(f)
or by consulting the Manual.
Note: the output file that we have written (output.featureXML
) is an
OpenMS-internal XML format for storing features. You can learn more about file
formats in the Reading MS data formats section.