MS Data#

Spectrum#

The most important container for raw data and peaks is MSSpectrum which we have already worked with in the Getting Started tutorial. MSSpectrum is a container for 1-dimensional peak data (a container of Peak1D). You can access these objects directly, by using an iterator or indexing. Meta-data is accessible through inheritance of the SpectrumSettings objects which handles meta data of a mass spectrum.

In the following example program, a MSSpectrum is filled with peaks, sorted according to mass-to-charge ratio and a selection of peak positions is displayed.

First we create a mass spectrum and insert peaks with descending mass-to-charge ratios:

 1import pyopenms as oms
 2
 3spectrum = oms.MSSpectrum()
 4mz = range(500, 1500, 100)
 5i = [1 for mass in mz]
 6spectrum.set_peaks([mz, i])
 7
 8# Sort the peaks according to ascending mass-to-charge ratio
 9spectrum.sortByPosition()
10
11# Iterate over spectrum of those peaks
12for p in spectrum:
13    print(p.getMZ(), p.getIntensity())
14
15# Access a peak by index
16print("\nFirst peak: ", spectrum[0].getMZ(), spectrum[0].getIntensity())
500.0 1.0
600.0 1.0
700.0 1.0
800.0 1.0
900.0 1.0
1000.0 1.0
1100.0 1.0
1200.0 1.0
1300.0 1.0
1400.0 1.0

First peak: 500.0 1.0

Note how lines 11-12 (as well as line 19) use the direct access to the Peak1D objects (explicit iteration through the MSSpectrum object, which is convenient but slow since a new Peak1D object needs to be created each time). The following example uses the faster access through numpy arrays with get_peaks() or set_peaks(). Direct iteration is only shown for demonstration purposes and should not be used in production code.

1# More efficient peak access with get_peaks()
2for mz, i in zip(*spectrum.get_peaks()):
3    print(mz, i)
500.0 1.0
600.0 1.0
700.0 1.0
800.0 1.0
900.0 1.0
1000.0 1.0
1100.0 1.0
1200.0 1.0
1300.0 1.0
1400.0 1.0

To discover the full set of functionality of MSSpectrum, we use the Python help() function. In particular, we find several important sets of meta information attached to the mass spectrum including retention time, the MS level (MS1, MS2, …), precursor ion, ion mobility drift time and extra data arrays.

1help(oms.MSSpectrum)

We now set several of these properties in a current MSSpectrum:

 1# create spectrum and set properties
 2spectrum = oms.MSSpectrum()
 3spectrum.setDriftTime(25)  # 25 ms
 4spectrum.setRT(205.2)  # 205.2 s
 5spectrum.setMSLevel(3)  # MS3
 6
 7# Add peak(s) to spectrum
 8spectrum.set_peaks(([401.5], [900]))
 9
10# create precursor information
11p = oms.Precursor()
12p.setMZ(600)  # isolation at 600 +/- 1.5 Th
13p.setIsolationWindowLowerOffset(1.5)
14p.setIsolationWindowUpperOffset(1.5)
15p.setActivationEnergy(40)  # 40 eV
16p.setCharge(4)  # 4+ ion
17
18# and store precursor in spectrum
19spectrum.setPrecursors([p])
20
21# set additional instrument settings (e.g. scan polarity)
22IS = oms.InstrumentSettings()
23IS.setPolarity(oms.IonSource.Polarity.POSITIVE)
24spectrum.setInstrumentSettings(IS)
25
26# get and check scan polarity
27polarity = spectrum.getInstrumentSettings().getPolarity()
28if polarity == oms.IonSource.Polarity.POSITIVE:
29    print("scan polarity: positive")
30elif polarity == oms.IonSource.Polarity.NEGATIVE:
31    print("scan polarity: negative")
32
33# Optional: additional data arrays / peak annotations
34fda = oms.FloatDataArray()
35fda.setName("Signal to Noise Array")
36fda.push_back(15)
37sda = oms.StringDataArray()
38sda.setName("Peak annotation")
39sda.push_back("y15++")
40spectrum.setFloatDataArrays([fda])
41spectrum.setStringDataArrays([sda])
42
43# Add spectrum to MSExperiment
44exp = oms.MSExperiment()
45exp.addSpectrum(spectrum)
46
47# Add second spectrum to the MSExperiment
48spectrum2 = oms.MSSpectrum()
49spectrum2.set_peaks(([1, 2], [1, 2]))
50exp.addSpectrum(spectrum2)
51
52# store spectra in mzML file
53oms.MzMLFile().store("testfile.mzML", exp)
scan polarity: positive

We have created a single mass spectrum and set basic mass spectrum properties (drift time, retention time, MS level, precursor charge, isolation window and activation energy). Additional instrument settings allow to set e.g. the polarity of the Ion source). We next add actual peaks into the spectrum (a single peak at Lmath:401.5 m/z and \(900\ intensity\)). Additional metadata can be stored in data arrays for each peak (e.g. use cases care peak annotations or “Signal to Noise” values for each peak. Finally, we add the spectrum to an MSExperiment container to save it using the MzMLFile class in a file called testfile.mzML.

You can now open the resulting mass spectrum in a mass spectrum viewer. We use the OpenMS viewer TOPPView (which you will get when you install OpenMS from the official website) and look at our mass spectrum:

../_images/spectrum1.png

TOPPView displays our mass spectrum with its single peak at \(401.5\ m/z\) and it also correctly displays its retention time at \(205.2\ seconds\) and precursor isolation target of \(600.0/ m/z\). Notice how TOPPView displays the information about the S/N for the peak (S/N = 15) and its annotation as \(\ce{y15++}\) in the status bar below when the user clicks on the peak at \(401.5\ m/z\) as shown in the screenshot.

We can also visualize our mass spectrum from before using the plot_spectrum() function from the spectrum_utils visualization library:

1import matplotlib.pyplot as plt
2from pyopenms.plotting import plot_spectrum
3
4plot_spectrum(spectrum)
5plt.show()
../_images/SpectrumPlot.png

Chromatogram#

An additional container for raw data is the MSChromatogram container, which is highly analogous to the MSSpectrum container, but contains an array of ChromatogramPeak and is derived from ChromatogramSettings:

 1import numpy as np
 2
 3
 4def gaussian(x, mu, sig):
 5    return np.exp(-np.power(x - mu, 2.0) / (2 * np.power(sig, 2.0)))
 6
 7
 8# Create new chromatogram
 9chromatogram = oms.MSChromatogram()
10
11# Set raw data (RT and intensity)
12rt = range(1500, 500, -100)
13i = [gaussian(rtime, 1000, 150) for rtime in rt]
14chromatogram.set_peaks([rt, i])
15
16# Sort the peaks according to ascending retention time
17chromatogram.sortByPosition()
18
19print("Iterate over peaks with getRT() and getIntensity()")
20for p in chromatogram:
21    print(p.getRT(), p.getIntensity())
22
23print("\nIterate more efficiently over peaks with get_peaks()")
24for rt, i in zip(*chromatogram.get_peaks()):
25    print(rt, i)
26
27print("\nAccess an individual peak by index")
28print(chromatogram[2].getRT(), chromatogram[2].getIntensity())
29
30# Add meta information to the chromatogram
31chromatogram.setNativeID("Trace XIC@405.2")
32
33# Store a precursor ion for the chromatogram
34p = oms.Precursor()
35p.setIsolationWindowLowerOffset(1.5)
36p.setIsolationWindowUpperOffset(1.5)
37p.setMZ(405.2)  # isolation at 405.2 +/- 1.5 Th
38p.setActivationEnergy(40)  # 40 eV
39p.setCharge(2)  # 2+ ion
40p.setMetaValue("description", chromatogram.getNativeID())
41p.setMetaValue("peptide_sequence", chromatogram.getNativeID())
42chromatogram.setPrecursor(p)
43
44# Also store a product ion for the chromatogram (e.g. for SRM)
45p = oms.Product()
46p.setMZ(603.4)  # transition from 405.2 -> 603.4
47chromatogram.setProduct(p)
48
49# Store as mzML
50exp = oms.MSExperiment()
51exp.addChromatogram(chromatogram)
52oms.MzMLFile().store("testfile3.mzML", exp)
53
54# Visualize the resulting data using matplotlib
55import matplotlib.pyplot as plt
56
57for chrom in exp.getChromatograms():
58    retention_times, intensities = chrom.get_peaks()
59    plt.plot(retention_times, intensities, label=chrom.getNativeID())
60
61plt.xlabel("time (s)")
62plt.ylabel("intensity (cps)")
63plt.legend()
64plt.show()
Iterate over peaks with getRT() and getIntensity()
600.0 0.028565499931573868
700.0 0.1353352814912796
800.0 0.4111122786998749
900.0 0.8007373809814453
1000.0 1.0
1100.0 0.8007373809814453
1200.0 0.4111122786998749
1300.0 0.1353352814912796
1400.0 0.028565499931573868
1500.0 0.003865920240059495

Iterate more efficiently over peaks with get_peaks()
600.0 0.0285655
700.0 0.13533528
800.0 0.41111228
900.0 0.8007374
1000.0 1.0
1100.0 0.8007374
1200.0 0.41111228
1300.0 0.13533528
1400.0 0.0285655
1500.0 0.0038659202

Access an individual peak by index
800.0 0.4111122786998749

This shows how the MSExperiment class can hold mass spectra as well as chromatograms .

Again we can visualize the resulting data using TOPPView using its chromatographic viewer capability, which shows the peak over retention time:

../_images/chromatogram1.png

Note how the annotation using precursor and production mass of our XIC chromatogram is displayed in the viewer.

We can also visualize the resulting data using matplotlib. Here we can plot every chromatogram in our MSExperiment and label it with it’s native ID.

../_images/ChromPlot.png

LC-MS/MS Experiment#

In OpenMS, LC-MS/MS injections are represented as so-called peak maps (using the MSExperiment class), which we have already encountered above. The MSExperiment class can hold a list of MSSpectrum object (as well as a list of MSChromatogram objects, see below). The MSExperiment object holds such peak maps as well as meta-data about the injection. Access to individual mass spectra is performed through getSpectrum() and getChromatogram().

In the following code, we create an MSExperiment and populate it with several mass spectra:

 1# The following examples creates an MSExperiment which holds six
 2# MSSpectrum instances.
 3exp = oms.MSExperiment()
 4for i in range(6):
 5    spectrum = oms.MSSpectrum()
 6    spectrum.setRT(i)
 7    spectrum.setMSLevel(1)
 8    for mz in range(500, 900, 100):
 9        peak = oms.Peak1D()
10        peak.setMZ(mz + i)
11        peak.setIntensity(100 - 25 * abs(i - 2.5))
12        spectrum.push_back(peak)
13    exp.addSpectrum(spectrum)
14
15# Iterate over spectra
16for i_spectrum, spectrum in enumerate(exp, start=1):
17    print("Spectrum {i:d}:".format(i=i_spectrum))
18    for peak in spectrum:
19        print(spectrum.getRT(), peak.getMZ(), peak.getIntensity())
Spectrum 1:
0.0 500.0 37.5
0.0 600.0 37.5
0.0 700.0 37.5
0.0 800.0 37.5
Spectrum 2:
1.0 501.0 62.5
1.0 601.0 62.5
1.0 701.0 62.5
1.0 801.0 62.5
Spectrum 3:
2.0 502.0 87.5
2.0 602.0 87.5
2.0 702.0 87.5
2.0 802.0 87.5
Spectrum 4:
3.0 503.0 87.5
3.0 603.0 87.5
3.0 703.0 87.5
3.0 803.0 87.5
Spectrum 5:
4.0 504.0 62.5
4.0 604.0 62.5
4.0 704.0 62.5
4.0 804.0 62.5
Spectrum 6:
5.0 505.0 37.5
5.0 605.0 37.5
5.0 705.0 37.5
5.0 805.0 37.5

In the above code, we create six instances of MSSpectrum (line 4), populate it with three peaks at \(500\), \(900\) and \(100\) m/z and append them to the MSExperiment object (line 13). We can easily iterate over the mass spectra in the whole experiment by using the intuitive iteration on lines 16-19 or we can use list comprehensions to sum up intensities of all mass spectra that fulfill certain conditions:

 1# Sum intensity of all spectra between RT 2.0 and 3.0
 2print(
 3    sum(
 4        [
 5            p.getIntensity()
 6            for s in exp
 7            if s.getRT() >= 2.0 and s.getRT() <= 3.0
 8            for p in s
 9        ]
10    )
11)
700.0

We could store the resulting experiment containing the six mass spectra as mzML using the MzMLFile object:

1# Store as mzML
2oms.MzMLFile().store("testfile2.mzML", exp)

Again we can visualize the resulting data using TOPPView using its 3D viewer capability, which shows the six scans over retention time where the traces first increase and then decrease in intensity:

../_images/spectrum2.png

Alternatively we can visualize our data directly with Python. For smaller data sets we can use matplotlib to generate a 2D scatter plot with the peak intensities represented by a colorbar. With this plot we can zoom in and inspect our data in more detail.

The following example figures were generated using a mzML file provided by OpenMS.

 1import numpy as np
 2import matplotlib.pyplot as plt
 3import matplotlib.colors as colors
 4
 5
 6def plot_spectra_2D(exp, ms_level=1, marker_size=5):
 7    exp.updateRanges()
 8    for spec in exp:
 9        if spec.getMSLevel() == ms_level:
10            mz, intensity = spec.get_peaks()
11            p = intensity.argsort()  # sort by intensity to plot highest on top
12            rt = np.full([mz.shape[0]], spec.getRT(), float)
13            plt.scatter(
14                rt,
15                mz[p],
16                c=intensity[p],
17                cmap="afmhot_r",
18                s=marker_size,
19                norm=colors.LogNorm(
20                    exp.getMinIntensity() + 1, exp.getMaxIntensity()
21                ),
22            )
23    plt.clim(exp.getMinIntensity() + 1, exp.getMaxIntensity())
24    plt.xlabel("time (s)")
25    plt.ylabel("m/z")
26    plt.colorbar()
27    plt.show()  # slow for larger data sets
28
29
30from urllib.request import urlretrieve
31
32gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
33urlretrieve(gh + "/src/data/FeatureFinderMetaboIdent_1_input.mzML", "test.mzML")
34
35exp = oms.MSExperiment()
36oms.MzMLFile().load("test.mzML", exp)
37
38plot_spectra_2D(exp)
../_images/Spectra2D.png ../_images/Spectra2DDetails.png

For larger data sets this will be too slow since every individual peak gets displayed. However, we can use BilinearInterpolation which produces an overview image of our mass spectra. This can be useful for a brief visual inspection of your sample in quality control.

 1import numpy as np
 2import matplotlib.pyplot as plt
 3
 4
 5def plot_spectra_2D_overview(experiment):
 6    rows = 200.0
 7    cols = 200.0
 8    exp.updateRanges()
 9
10    bilip = oms.BilinearInterpolation()
11    tmp = bilip.getData()
12    tmp.resize(int(rows), int(cols), float())
13    bilip.setData(tmp)
14    bilip.setMapping_0(0.0, exp.getMinRT(), rows - 1, exp.getMaxRT())
15    bilip.setMapping_1(0.0, exp.getMinMZ(), cols - 1, exp.getMaxMZ())
16    for spec in exp:
17        if spec.getMSLevel() == 1:
18            mzs, ints = spec.get_peaks()
19            rt = spec.getRT()
20            for i in range(0, len(mzs)):
21                bilip.addValue(rt, mzs[i], ints[i])
22
23    data = np.ndarray(shape=(int(cols), int(rows)), dtype=np.float64)
24    for i in range(int(rows)):
25        for j in range(int(cols)):
26            data[i][j] = bilip.getData().getValue(i, j)
27
28    plt.imshow(np.rot90(data), cmap="gist_heat_r")
29    plt.xlabel("retention time (s)")
30    plt.ylabel("m/z")
31    plt.xticks(
32        np.linspace(0, int(rows), 20, dtype=int),
33        np.linspace(exp.getMinRT(), exp.getMaxRT(), 20, dtype=int),
34    )
35    plt.yticks(
36        np.linspace(0, int(cols), 20, dtype=int),
37        np.linspace(exp.getMinMZ(), exp.getMaxMZ(), 20, dtype=int)[::-1],
38    )
39    plt.show()
40
41
42plot_spectra_2D_overview(exp)
../_images/Spectra2DOverview.png

Example: Precursor Purity#

When an MS2 spectrum is generated, the precursor from the MS1 spectrum is gathered, fragmented and measured. In practice, the instrument gathers the ions in a user-defined window around the precursor m/z - the so-called precursor isolation window.

../_images/precursor_isolation_window.png

In some cases, the precursor isolation window contains contaminant peaks from other analytes. Depending on the analysis requirements, this can lead to issues in quantification for example, for isobaric experiments.

Here, we can assess the purity of the precursor to filter spectra with a score below our expectation.

 1from urllib.request import urlretrieve
 2
 3gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
 4urlretrieve(
 5    gh + "/src/data/PrecursorPurity_input.mzML", "PrecursorPurity_input.mzML"
 6)
 7
 8exp = oms.MSExperiment()
 9oms.MzMLFile().load("PrecursorPurity_input.mzML", exp)
10
11# for this example, we check which are MS2 spectra and choose one of them
12for i, element in enumerate(exp):
13    print(str(i) + ": MS" + str(element.getMSLevel()))
14
15# get the precursor information from the MS2 spectrum at index 3
16ms2_precursor = exp[3].getPrecursors()[0]
17
18# get the previous recorded MS1 spectrum
19isMS1 = False
20i = 3  # start at the index of the MS2 spectrum
21while isMS1 == False:
22    if exp[i].getMSLevel() == 1:
23        isMS1 = True
24    else:
25        i -= 1
26
27ms1_spectrum = exp[i]
28
29# calculate the precursor purity in a 10 ppm precursor isolation window
30purity_score = oms.PrecursorPurity().computePrecursorPurity(
31    ms1_spectrum, ms2_precursor, 10, True
32)
33
34print("\nPurity scores")
35print("total:", purity_score.total_intensity)  # 9098343.890625
36print("target:", purity_score.target_intensity)  # 7057944.0
37print(
38    "signal proportion:", purity_score.signal_proportion
39)  # 0.7757394186070014
40print("target peak count:", purity_score.target_peak_count)  # 1
41print("residual peak count:", purity_score.residual_peak_count)  # 4
0: MS1
1: MS2
2: MS2
3: MS2
4: MS2
5: MS2
6: MS1

Purity scores
total: 9098343.890625
target: 7057944.0
signal proportion: 0.7757394186070014
target peak count: 1
residual peak count: 4

We could assess that we have four other non-isotopic peaks apart from our precursor and its isotope peaks within our precursor isolation window. The signal of the isotopic peaks correspond to roughly 78% of all intensities in the precursor isolation window.

Example: Filtering Mass Spectra#

Here we will look at some code snippets that might come in handy when dealing with mass spectra data.

But first, we will load some test data:

1gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
2urlretrieve(gh + "/src/data/tiny.mzML", "test.mzML")
3
4inp = oms.MSExperiment()
5oms.MzMLFile().load("test.mzML", inp)

Filtering Mass Spectra by MS Level#

We will filter the data from test.mzML file by only retaining mass spectra that are not MS1 spectra (e.g. MS2, MS3 or MSn spectra):

1filtered = oms.MSExperiment()
2for s in inp:
3    if s.getMSLevel() > 1:
4        filtered.addSpectrum(s)
5
6# filtered now only contains spectra with MS level > 2

Filtering by Scan Number#

We could also use a list of scan numbers as filter criterion to only retain a list of MS scans we are interested in:

1scan_nrs = [0, 2, 5, 7]
2
3filtered = oms.MSExperiment()
4for k, s in enumerate(inp):
5    if k in scan_nrs:
6        filtered.addSpectrum(s)

Filtering Mass Spectra and Peaks#

Suppose we are interested in only in a small m/z window of our fragment ion mass spectra. We can easily filter our data accordingly:

 1mz_start = 6.0
 2mz_end = 12.0
 3filtered = oms.MSExperiment()
 4for s in inp:
 5    if s.getMSLevel() > 1:
 6        filtered_mz = []
 7        filtered_int = []
 8        for mz, i in zip(*s.get_peaks()):
 9            if mz > mz_start and mz < mz_end:
10                filtered_mz.append(mz)
11                filtered_int.append(i)
12        s.set_peaks((filtered_mz, filtered_int))
13        filtered.addSpectrum(s)
14
15# filtered only contains only fragment spectra with peaks in range [mz_start, mz_end]

Note that in a real-world application, we would set the mz_start and mz_end parameter to an actual area of interest, for example the area between 125 and 132 which contains quantitative ions for a TMT experiment.

Similarly we could only retain peaks above a certain intensity or keep only the top N peaks in each mass spectrum.

For more advanced filtering tasks pyOpenMS provides special algorithm classes. We will take a closer look at some of them in the next section.

Filtering Mass Spectra with TOPP Tools#

We can also use predefined TOPP tools to filter our data. First we need to load in the data:

 1import matplotlib.pyplot as plt
 2from pyopenms.plotting import plot_spectrum, mirror_plot_spectrum
 3
 4gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
 5urlretrieve(
 6    gh + "/src/data/YIC(Carbamidomethyl)DNQDTISSK.mzML", "observed.mzML"
 7)
 8
 9exp = oms.MSExperiment()
10# Load mzML file and obtain spectrum for peptide YIC(Carbamidomethyl)DNQDTISSK
11oms.MzMLFile().load("observed.mzML", exp)
12
13# Get first spectrum
14spectra = exp.getSpectra()
15observed_spectrum = spectra[0]

The WindowMower tool can be used to remove peaks in a sliding or jumping window. The window size, number of highest peaks to keep and move type can be set with a Param object

 1from copy import deepcopy
 2
 3window_mower_filter = oms.WindowMower()
 4
 5# Copy the original spectrum
 6mowed_spectrum = deepcopy(observed_spectrum)
 7
 8# Set parameters
 9params = oms.Param()
10# Defines the m/z range of the sliding window
11params.setValue("windowsize", 100.0, "")
12# Defines the number of highest peaks to keep in the sliding window
13params.setValue("peakcount", 1, "")
14# Defines the type of window movement: jump (window size steps) or slide (one peak steps)
15params.setValue("movetype", "jump", "")
16
17# Apply window mowing
18window_mower_filter.setParameters(params)
19window_mower_filter.filterPeakSpectrum(mowed_spectrum)
20
21# Visualize the resulting data together with the original spectrum
22mirror_plot_spectrum(observed_spectrum, mowed_spectrum)
23plt.show()
../_images/window_mower.png

Noise can be easily removed with ThresholdMower by setting a threshold value for the intensity of peaks and cutting off everything below.

 1# Copy spectrum
 2threshold_mower_spectrum = deepcopy(observed_spectrum)
 3
 4threshold_mower_filter = oms.ThresholdMower()
 5
 6# Set parameters
 7params = oms.Param()
 8params.setValue("threshold", 20.0, "")
 9
10# Apply threshold mowing
11threshold_mower_filter.setParameters(params)
12threshold_mower_filter.filterPeakSpectrum(threshold_mower_spectrum)
13
14mirror_plot_spectrum(observed_spectrum, threshold_mower_spectrum)
15plt.show()
../_images/threshold_mower.png

We can also use e.g. NLargest to keep only the N highest peaks in a spectrum.

 1# Copy spectrum
 2nlargest_spectrum = deepcopy(observed_spectrum)
 3
 4nlargest_filter = oms.NLargest()
 5
 6# Set parameters
 7params = oms.Param()
 8params.setValue("n", 4, "")
 9
10# Apply N-Largest filter
11nlargest_filter.setParameters(params)
12nlargest_filter.filterPeakSpectrum(nlargest_spectrum)
13
14mirror_plot_spectrum(observed_spectrum, nlargest_spectrum)
15plt.show()
16# Two peaks are overlapping, so only three peaks are really visible in the plot
../_images/nlargest.png