MS Data


The most important container for raw data and peaks is MSSpectrum which we have already worked with in the Getting Started tutorial. MSSpectrum is a container for 1-dimensional peak data (a container of Peak1D). You can access these objects directly, however it is faster to use the get_peaks() and set_peaks functions which use Python numpy arrays for raw data access. Meta-data is accessible through inheritance of the SpectrumSettings objects which handles meta data of a spectrum.

In the following example program, a MSSpectrum is filled with peaks, sorted according to mass-to-charge ratio and a selection of peak positions is displayed.

First we create a spectrum and insert peaks with descending mass-to-charge ratios:

 1from pyopenms import *
 2spectrum = MSSpectrum()
 3mz = range(1500, 500, -100)
 4i = [0 for mass in mz]
 5spectrum.set_peaks([mz, i])
 7# Sort the peaks according to ascending mass-to-charge ratio
10# Iterate over spectrum of those peaks
11for p in spectrum:
12  print(p.getMZ(), p.getIntensity())
14# More efficient peak access with get_peaks()
15for mz, i in zip(*spectrum.get_peaks()):
16  print(mz, i)
18# Access a peak by index
19print(spectrum[2].getMZ(), spectrum[2].getIntensity())

Note how lines 11-12 (as well as line 19) use the direct access to the Peak1D objects (explicit iteration through the MSSpectrum object, which is convenient but slow since a new Peak1D object needs to be created each time) while lines 15-16 use the faster access through numpy arrays. Direct iteration is only shown for demonstration purposes and should not be used in production code.

To discover the full set of functionality of MSSpectrum, we use the help() function. In particular, we find several important sets of meta information attached to the spectrum including retention time, the ms level (MS1, MS2, …), precursor ion, ion mobility drift time and extra data arrays.


We now set several of these properties in a current MSSpectrum:

 1# create spectrum and set properties
 2spectrum = MSSpectrum()
 3spectrum.setDriftTime(25) # 25 ms
 4spectrum.setRT(205.2) # 205.2 s
 5spectrum.setMSLevel(3) # MS3
 7# Add peak(s) to spectrum
 8spectrum.set_peaks( ([401.5], [900]) )
10# create precursor information
11p = Precursor()
12p.setMZ(600) # isolation at 600 +/- 1.5 Th
15p.setActivationEnergy(40) # 40 eV
16p.setCharge(4) # 4+ ion
18# and store precursor in spectrum
19spectrum.setPrecursors( [p] )
21# set additional instrument settings (e.g. scan polarity)
22IS = InstrumentSettings()
26# get and check scan polarity
27polarity = spectrum.getInstrumentSettings().getPolarity()
28if (polarity == IonSource.Polarity.POSITIVE):
29  print("scan polarity: positive")
30elif (polarity == IonSource.Polarity.NEGATIVE):
31  print("scan polarity: negative")
33# Optional: additional data arrays / peak annotations
34fda = FloatDataArray()
35fda.setName("Signal to Noise Array")
37sda = StringDataArray()
38sda.setName("Peak annotation")
40spectrum.setFloatDataArrays( [fda] )
41spectrum.setStringDataArrays( [sda] )
43# Add spectrum to MSExperiment
44exp = MSExperiment()
47# Add second spectrum to the MSExperiment
48spectrum2 = MSSpectrum()
49spectrum2.set_peaks( ([1, 2], [1, 2]) )
52# store spectra in mzML file
53MzMLFile().store("testfile.mzML", exp)

We have created a single spectrum and set basic spectrum properties (drift time, retention time, MS level, precursor charge, isolation window and activation energy). Additional instrument settings allow to set e.g. the polarity of the Ion source). We next add actual peaks into the spectrum (a single peak at 401.5 m/z and 900 intensity). Additional metadata can be stored in data arrays for each peak (e.g. use cases care peak annotations or “Signal to Noise” values for each peak. Finally, we add the spectrum to an MSExperiment container to save it using the MzMLFile class in a file called “testfile.mzML”.

You can now open the resulting spectrum in a spectrum viewer. We use the OpenMS viewer TOPPView (which you will get when you install OpenMS from the official website) and look at our spectrum:


TOPPView displays our spectrum with its single peak at 401.5 m/z and it also correctly displays its retention time at 205.2 seconds and precursor isolation target of 600.0 m/z. Notice how TOPPView displays the information about the S/N for the peak (S/N = 15) and its annotation as y15++ in the status bar below when the user clicks on the peak at 401.5 m/z as shown in the screenshot.

We can also visualize our spectrum with matplotlib using the following function:

 import matplotlib.pyplot as plt

 def plot_spectrum(spectrum):
     # plot every peak in spectrum and annotate with it's m/z
     for mz, i in zip(*spectrum.get_peaks()):
         plt.plot([mz, mz], [0, i], color = 'black')
         plt.text(mz, i, str(mz))

     # for the title add RT and Precursor m/z if available
     title = ''
     if spectrum.getRT() >= 0:
         title += 'RT: ' + str(spectrum.getRT())
     if len(spectrum.getPrecursors()) >= 1:
         title += '   Precursor m/z: ' + str(spectrum.getPrecursors()[0].getMZ())


# plotting out spectrum that was defined earlier


An additional container for raw data is the MSChromatogram container, which is highly analogous to the MSSpectrum container, but contains an array of ChromatogramPeak and is derived from ChromatogramSettings:

 1import numpy as np
 3def gaussian(x, mu, sig):
 4    return np.exp(-np.power(x - mu, 2.) / (2 * np.power(sig, 2.)))
 6# Create new chromatogram
 7chromatogram = MSChromatogram()
 9# Set raw data (RT and intensity)
10rt = range(1500, 500, -100)
11i = [gaussian(rtime, 1000, 150) for rtime in rt]
12chromatogram.set_peaks([rt, i])
14# Sort the peaks according to ascending retention time
17# Iterate over chromatogram of those peaks
18for p in chromatogram:
19    print(p.getRT(), p.getIntensity())
21# More efficient peak access with get_peaks()
22for rt, i in zip(*chromatogram.get_peaks()):
23    print(rt, i)
25# Access a peak by index
26print(chromatogram[2].getRT(), chromatogram[2].getIntensity())
28# Add meta information to the chromatogram
29chromatogram.setNativeID("Trace XIC@405.2")
31# Store a precursor ion for the chromatogram
32p = Precursor()
35p.setMZ(405.2) # isolation at 405.2 +/- 1.5 Th
36p.setActivationEnergy(40) # 40 eV
37p.setCharge(2) # 2+ ion
38p.setMetaValue("description", chromatogram.getNativeID())
39p.setMetaValue("peptide_sequence", chromatogram.getNativeID())
42# Also store a product ion for the chromatogram (e.g. for SRM)
43p = Product()
44p.setMZ(603.4) # transition from 405.2 -> 603.4
47# Store as mzML
48exp = MSExperiment()
50MzMLFile().store("testfile3.mzML", exp)
52# Visualize the resulting data using matplotlib
53import matplotlib.pyplot as plt
55for chrom in exp.getChromatograms():
56    retention_times, intensities = chrom.get_peaks()
57    plt.plot(retention_times, intensities, label = chrom.getNativeID())
59plt.xlabel('time (s)')
60plt.ylabel('intensity (cps)')

This shows how the MSExperiment class can hold spectra as well as chromatograms.

Again we can visualize the resulting data using TOPPView using its chromatographic viewer capability, which shows the peak over retention time:


Note how the annotation using precursor and production mass of our XIC chromatogram is displayed in the viewer.

We can also visualize the resulting data using matplotlib. Here we can plot every chromatogram in our MSExperiment and label it with it’s native ID.


LC-MS/MS Experiment

In OpenMS, LC-MS/MS injections are represented as so-called peak maps (using the MSExperiment class), which we have already encountered above. The MSExperiment class can hold a list of MSSpectrum object (as well as a list of MSChromatogram objects, see below). The MSExperiment object holds such peak maps as well as meta-data about the injection. Access to individual spectra is performed through MSExperiment.getSpectrum and MSExperiment.getChromatogram.

In the following code, we create an MSExperiment and populate it with several spectra:

 1# The following examples creates an MSExperiment which holds six
 2# MSSpectrum instances.
 3exp = MSExperiment()
 4for i in range(6):
 5    spectrum = MSSpectrum()
 6    spectrum.setRT(i)
 7    spectrum.setMSLevel(1)
 8    for mz in range(500, 900, 100):
 9      peak = Peak1D()
10      peak.setMZ(mz + i)
11      peak.setIntensity(100 - 25*abs(i-2.5) )
12      spectrum.push_back(peak)
13    exp.addSpectrum(spectrum)
15# Iterate over spectra
16for spectrum in exp:
17    for peak in spectrum:
18        print (spectrum.getRT(), peak.getMZ(), peak.getIntensity())

In the above code, we create six instances of MSSpectrum (line 4), populate it with three peaks at 500, 900 and 100 m/z and append them to the MSExperiment object (line 13). We can easily iterate over the spectra in the whole experiment by using the intuitive iteration on lines 16-18 or we can use list comprehensions to sum up intensities of all spectra that fulfill certain conditions:

# Sum intensity of all spectra between RT 2.0 and 3.0
print(sum([p.getIntensity() for s in exp if s.getRT() >= 2.0 and s.getRT() <= 3.0 for p in s]))
87.5 * 8

We could store the resulting experiment containing the six spectra as mzML using the MzMLFile object:

# Store as mzML
MzMLFile().store("testfile2.mzML", exp)

Again we can visualize the resulting data using TOPPView using its 3D viewer capability, which shows the six scans over retention time where the traces first increase and then decrease in intensity:


Alternatively we can visualize our data directly with Python. For smaller data sets we can use matplotlib to generate a 2D scatter plot with the peak intensities represented by a colorbar. With this plot we can zoom in and inspect our data in more detail.

The following example figures were generated using a mzML file provided by OpenMS.

 import numpy as np
 import matplotlib.pyplot as plt
 import matplotlib.colors as colors

 def plot_spectra_2D(exp, ms_level=1, marker_size = 5):
     print('collecting peak data...')
     for spec in exp:
         if spec.getMSLevel() == ms_level:
             mz, intensity = spec.get_peaks()
             p = intensity.argsort() # sort by intensity to plot highest on top
             rt = np.full([mz.shape[0]], spec.getRT(), float)
             plt.scatter(rt, mz[p], c = intensity[p], cmap = 'afmhot_r', s=marker_size,
                         norm=colors.LogNorm(exp.getMinIntensity()+1, exp.getMaxIntensity()))
     plt.clim(exp.getMinIntensity()+1, exp.getMaxIntensity())
     plt.xlabel('time (s)')
     print('showing plot...') # slow for larger data sets

from urllib.request import urlretrieve

gh = ""
urlretrieve (gh + "/src/data/FeatureFinderMetaboIdent_1_input.mzML", "test.mzML")

exp = MSExperiment()
MzMLFile().load('test.mzML', exp)

_images/Spectra2D.png _images/Spectra2DDetails.png

For larger data sets this will be too slow since every individual peak gets displayed. However, we can use BilinearInterpolation which produces an overview image of our spectra. This can be useful for a brief visual inspection of your sample in quality control.

 import numpy as np
 import matplotlib.pyplot as plt

 def plot_spectra_2D_overview(experiment):
     rows = 200.0
     cols = 200.0

     bilip = BilinearInterpolation()
     tmp = bilip.getData()
     tmp.resize(int(rows), int(cols), float())
     bilip.setMapping_0(0.0, exp.getMinRT(), rows-1, exp.getMaxRT())
     bilip.setMapping_1(0.0, exp.getMinMZ(), cols-1, exp.getMaxMZ())
     print('collecting peak data...')
     for spec in exp:
         if spec.getMSLevel() == 1:
             mzs, ints = spec.get_peaks()
             rt = spec.getRT()
             for i in range(0, len(mzs)):
                 bilip.addValue(rt, mzs[i], ints[i])

     data = np.ndarray(shape=(int(cols), int(rows)), dtype=np.float64)
     for i in range(int(rows)):
         for j in range(int(cols)):
             data[i][j] = bilip.getData().getValue(i,j)

     plt.imshow(np.rot90(data), cmap='gist_heat_r')
     plt.xlabel('retention time (s)')
     plt.xticks(np.linspace(0,int(rows),20, dtype=int),
             np.linspace(exp.getMinRT(),exp.getMaxRT(),20, dtype=int))
     plt.yticks(np.linspace(0,int(cols),20, dtype=int),
             np.linspace(exp.getMinMZ(),exp.getMaxMZ(),20, dtype=int)[::-1])
     print('showing plot...')


Example: Precursor Purity

When an MS/MS spectrum is generated, the precursor from the MS1 spectrum is gathered, fragmented and measured. In practice, the instrument gathers the ions in a user-defined window around the precursor m/z - the so-called precursor isolation window.


In some cases, the precursor isolation window contains contaminant peaks from other analytes. Depending on the analysis requirements, this can lead to issues in quantification for example, for isobaric experiments.

Here, we can assess the purity of the precursor to filter spectra with a score below our expectation.

from urllib.request import urlretrieve

gh = ""
urlretrieve (gh + "/src/data/PrecursorPurity_input.mzML", "PrecursorPurity_input.mzML")

exp = MSExperiment()
MzMLFile().load("PrecursorPurity_input.mzML", exp)

# for this example, we check which are MS2 spectra and choose one of them
for element in exp:

# get the precursor information from the MS2 spectrum at index 3
ms2_precursor = Precursor()
ms2_precursor = exp[3].getPrecursors()[0];

# get the previous recorded MS1 spectrum
isMS1 = False;
i = 3 # start at the index of the MS2 spectrum
while isMS1 == False:
    if exp[i].getMSLevel() == 1:
        isMS1 = True
        i -= 1

ms1_spectrum = exp[i]

# calculate the precursor purity in a 10 ppm precursor isolation window
purity_score = PrecursorPurity().computePrecursorPurity(ms1_spectrum, ms2_precursor, 10, True)

print(purity_score.total_intensity) # 9098343.890625
print(purity_score.target_intensity) # 7057944.0
print(purity_score.signal_proportion) # 0.7757394186070014
print(purity_score.target_peak_count) # 1
print(purity_score.residual_peak_count) # 4

We could assess that we have four other non-isotopic peaks apart from our precursor and its isotope peaks within our precursor isolation window. The signal of the isotopic peaks correspond to roughly 78% of all intensities in the precursor isolation window.

Example: Filtering Spectra

Here we will look at some code snippets that might come in handy when dealing with spectra data.

But first, we will load some test data:

gh = ""
urlretrieve (gh + "/src/data/tiny.mzML", "test.mzML")

inp = MSExperiment()
MzMLFile().load("test.mzML", inp)

Filtering Spectra by MS level

We will filter the data from “test.mzML” file by only retaining only spectra that are not MS1 spectra (e.g.MS2, MS3 or MSn spectra):

1filtered = MSExperiment()
2for s in inp:
3    if s.getMSLevel() > 1:
4        filtered.addSpectrum(s)
6# filtered now only contains spectra with MS level > 2

Filtering by scan number

We could also use a list of scan numbers as filter criterium to only retain a list of MS scans we are interested in:

1scan_nrs = [0, 2, 5, 7]
3filtered = MSExperiment()
4for k, s in enumerate(inp):
5  if k in scan_nrs:
6    filtered.addSpectrum(s)

Filtering Spectra and Peaks

Suppose we are interested in only in a small m/z window of our fragment ion spectra. We can easily filter our data accordingly:

 1mz_start = 6.0
 2mz_end = 12.0
 3filtered = MSExperiment()
 4for s in inp:
 5  if s.getMSLevel() > 1:
 6    filtered_mz = []
 7    filtered_int = []
 8    for mz, i in zip(*s.get_peaks()):
 9      if mz > mz_start and mz < mz_end:
10        filtered_mz.append(mz)
11        filtered_int.append(i)
12    s.set_peaks((filtered_mz, filtered_int))
13    filtered.addSpectrum(s)
15  # filtered only contains only fragment spectra with peaks in range [mz_start, mz_end]

Note that in a real-world application, we would set the mz_start and mz_end parameter to an actual area of interest, for example the area between 125 and 132 which contains quantitative ions for a TMT experiment.

Similarly we could only retain peaks above a certain intensity or keep only the top N peaks in each spectrum.

For more advanced filtering tasks pyOpenMS provides special algorithm classes. We will take a closer look at some of them in the algorithm section.