The most important container for raw data and peaks is
MSSpectrum which we
have already worked with in the Getting Started
MSSpectrum is a container for 1-dimensional peak data (a
Peak1D). You can access these objects directly, however it is
faster to use the
set_peaks functions which use Python
numpy arrays for raw data access. Meta-data is accessible through inheritance
SpectrumSettings objects which handles meta data of a spectrum.
In the following example program, a MSSpectrum is filled with peaks, sorted according to mass-to-charge ratio and a selection of peak positions is displayed.
First we create a spectrum and insert peaks with descending mass-to-charge ratios:
1from pyopenms import * 2spectrum = MSSpectrum() 3mz = range(1500, 500, -100) 4i = [0 for mass in mz] 5spectrum.set_peaks([mz, i]) 6 7# Sort the peaks according to ascending mass-to-charge ratio 8spectrum.sortByPosition() 9 10# Iterate over spectrum of those peaks 11for p in spectrum: 12 print(p.getMZ(), p.getIntensity()) 13 14# More efficient peak access with get_peaks() 15for mz, i in zip(*spectrum.get_peaks()): 16 print(mz, i) 17 18# Access a peak by index 19print(spectrum.getMZ(), spectrum.getIntensity())
Note how lines 11-12 (as well as line 19) use the direct access to the
Peak1D objects (explicit iteration through the
MSSpectrum object, which
is convenient but slow since a new
Peak1D object needs to be created each
time) while lines 15-16 use the faster access through numpy arrays. Direct
iteration is only shown for demonstration purposes and should not be used in
To discover the full set of functionality of
MSSpectrum, we use the
help() function. In particular, we find several important sets of meta
information attached to the spectrum including retention time, the ms level
(MS1, MS2, …), precursor ion, ion mobility drift time and extra data arrays.
We now set several of these properties in a current MSSpectrum:
1# create spectrum and set properties 2spectrum = MSSpectrum() 3spectrum.setDriftTime(25) # 25 ms 4spectrum.setRT(205.2) # 205.2 s 5spectrum.setMSLevel(3) # MS3 6 7# Add peak(s) to spectrum 8spectrum.set_peaks( ([401.5], ) ) 9 10# create precursor information 11p = Precursor() 12p.setMZ(600) # isolation at 600 +/- 1.5 Th 13p.setIsolationWindowLowerOffset(1.5) 14p.setIsolationWindowUpperOffset(1.5) 15p.setActivationEnergy(40) # 40 eV 16p.setCharge(4) # 4+ ion 17 18# and store precursor in spectrum 19spectrum.setPrecursors( [p] ) 20 21# set additional instrument settings (e.g. scan polarity) 22IS = InstrumentSettings() 23IS.setPolarity(IonSource.Polarity.POSITIVE) 24spectrum.setInstrumentSettings(IS) 25 26# get and check scan polarity 27polarity = spectrum.getInstrumentSettings().getPolarity() 28if (polarity == IonSource.Polarity.POSITIVE): 29 print("scan polarity: positive") 30elif (polarity == IonSource.Polarity.NEGATIVE): 31 print("scan polarity: negative") 32 33# Optional: additional data arrays / peak annotations 34fda = FloatDataArray() 35fda.setName("Signal to Noise Array") 36fda.push_back(15) 37sda = StringDataArray() 38sda.setName("Peak annotation") 39sda.push_back("y15++") 40spectrum.setFloatDataArrays( [fda] ) 41spectrum.setStringDataArrays( [sda] ) 42 43# Add spectrum to MSExperiment 44exp = MSExperiment() 45exp.addSpectrum(spectrum) 46 47# Add second spectrum to the MSExperiment 48spectrum2 = MSSpectrum() 49spectrum2.set_peaks( ([1, 2], [1, 2]) ) 50exp.addSpectrum(spectrum2) 51 52# store spectra in mzML file 53MzMLFile().store("testfile.mzML", exp)
We have created a single spectrum and set basic spectrum properties (drift
time, retention time, MS level, precursor charge, isolation window and
activation energy). Additional instrument settings allow to set e.g. the polarity of the Ion source).
We next add actual peaks into the spectrum (a single peak at 401.5 m/z and 900 intensity).
Additional metadata can be stored in data arrays for each peak
(e.g. use cases care peak annotations or “Signal to Noise” values for each
peak. Finally, we add the spectrum to an
MSExperiment container to save it using the
MzMLFile class in a file called “testfile.mzML”.
You can now open the resulting spectrum in a spectrum viewer. We use the OpenMS
TOPPView (which you will get when you install OpenMS from the
official website) and look at our spectrum:
TOPPView displays our spectrum with its single peak at 401.5 m/z and it
also correctly displays its retention time at 205.2 seconds and precursor
isolation target of 600.0 m/z. Notice how TOPPView displays the information
about the S/N for the peak (S/N = 15) and its annotation as
y15++ in the status
bar below when the user clicks on the peak at 401.5 m/z as shown in the
We can also visualize our spectrum with
matplotlib using the following function:
import matplotlib.pyplot as plt def plot_spectrum(spectrum): # plot every peak in spectrum and annotate with it's m/z for mz, i in zip(*spectrum.get_peaks()): plt.plot([mz, mz], [0, i], color = 'black') plt.text(mz, i, str(mz)) # for the title add RT and Precursor m/z if available title = '' if spectrum.getRT() >= 0: title += 'RT: ' + str(spectrum.getRT()) if len(spectrum.getPrecursors()) >= 1: title += ' Precursor m/z: ' + str(spectrum.getPrecursors().getMZ()) plt.title(title) plt.ylabel('intensity') plt.xlabel('m/z') plt.ylim(bottom=0) plt.show() # plotting out spectrum that was defined earlier plot_spectrum(spectrum)
An additional container for raw data is the
MSChromatogram container, which
is highly analogous to the
MSSpectrum container, but contains an array of
ChromatogramPeak and is derived from
1import numpy as np 2 3def gaussian(x, mu, sig): 4 return np.exp(-np.power(x - mu, 2.) / (2 * np.power(sig, 2.))) 5 6# Create new chromatogram 7chromatogram = MSChromatogram() 8 9# Set raw data (RT and intensity) 10rt = range(1500, 500, -100) 11i = [gaussian(rtime, 1000, 150) for rtime in rt] 12chromatogram.set_peaks([rt, i]) 13 14# Sort the peaks according to ascending retention time 15chromatogram.sortByPosition() 16 17# Iterate over chromatogram of those peaks 18for p in chromatogram: 19 print(p.getRT(), p.getIntensity()) 20 21# More efficient peak access with get_peaks() 22for rt, i in zip(*chromatogram.get_peaks()): 23 print(rt, i) 24 25# Access a peak by index 26print(chromatogram.getRT(), chromatogram.getIntensity()) 27 28# Add meta information to the chromatogram 29chromatogram.setNativeID("Trace XIC@405.2") 30 31# Store a precursor ion for the chromatogram 32p = Precursor() 33p.setIsolationWindowLowerOffset(1.5) 34p.setIsolationWindowUpperOffset(1.5) 35p.setMZ(405.2) # isolation at 405.2 +/- 1.5 Th 36p.setActivationEnergy(40) # 40 eV 37p.setCharge(2) # 2+ ion 38p.setMetaValue("description", chromatogram.getNativeID()) 39p.setMetaValue("peptide_sequence", chromatogram.getNativeID()) 40chromatogram.setPrecursor(p) 41 42# Also store a product ion for the chromatogram (e.g. for SRM) 43p = Product() 44p.setMZ(603.4) # transition from 405.2 -> 603.4 45chromatogram.setProduct(p) 46 47# Store as mzML 48exp = MSExperiment() 49exp.addChromatogram(chromatogram) 50MzMLFile().store("testfile3.mzML", exp) 51 52# Visualize the resulting data using matplotlib 53import matplotlib.pyplot as plt 54 55for chrom in exp.getChromatograms(): 56 retention_times, intensities = chrom.get_peaks() 57 plt.plot(retention_times, intensities, label = chrom.getNativeID()) 58 59plt.xlabel('time (s)') 60plt.ylabel('intensity (cps)') 61plt.legend() 62plt.show()
This shows how the
MSExperiment class can hold spectra as well as chromatograms.
Again we can visualize the resulting data using
TOPPView using its chromatographic viewer
capability, which shows the peak over retention time:
Note how the annotation using precursor and production mass of our XIC chromatogram is displayed in the viewer.
We can also visualize the resulting data using
matplotlib. Here we can plot every
chromatogram in our
MSExperiment and label it with it’s native ID.
In OpenMS, LC-MS/MS injections are represented as so-called peak maps (using
MSExperiment class), which we have already encountered above. The
MSExperiment class can hold a list of
MSSpectrum object (as well as a
MSChromatogram objects, see below). The
holds such peak maps as well as meta-data about the injection. Access to
individual spectra is performed through
In the following code, we create an
MSExperiment and populate it with
1# The following examples creates an MSExperiment which holds six 2# MSSpectrum instances. 3exp = MSExperiment() 4for i in range(6): 5 spectrum = MSSpectrum() 6 spectrum.setRT(i) 7 spectrum.setMSLevel(1) 8 for mz in range(500, 900, 100): 9 peak = Peak1D() 10 peak.setMZ(mz + i) 11 peak.setIntensity(100 - 25*abs(i-2.5) ) 12 spectrum.push_back(peak) 13 exp.addSpectrum(spectrum) 14 15# Iterate over spectra 16for spectrum in exp: 17 for peak in spectrum: 18 print (spectrum.getRT(), peak.getMZ(), peak.getIntensity())
In the above code, we create six instances of
MSSpectrum (line 4), populate
it with three peaks at 500, 900 and 100 m/z and append them to the
MSExperiment object (line 13). We can easily iterate over the spectra in
the whole experiment by using the intuitive iteration on lines 16-18 or we can
use list comprehensions to sum up intensities of all spectra that fulfill
# Sum intensity of all spectra between RT 2.0 and 3.0 print(sum([p.getIntensity() for s in exp if s.getRT() >= 2.0 and s.getRT() <= 3.0 for p in s]))
700.0 87.5 * 8 700.0
We could store the resulting experiment containing the six spectra as mzML
# Store as mzML MzMLFile().store("testfile2.mzML", exp)
Again we can visualize the resulting data using
TOPPView using its 3D
viewer capability, which shows the six scans over retention time where the
traces first increase and then decrease in intensity:
Alternatively we can visualize our data directly with Python. For smaller data sets
we can use
matplotlib to generate a 2D scatter plot with the peak intensities
represented by a colorbar. With this plot we can zoom in and inspect our data in more detail.
The following example figures were generated using a mzML file provided by OpenMS.
import numpy as np import matplotlib.pyplot as plt import matplotlib.colors as colors def plot_spectra_2D(exp, ms_level=1, marker_size = 5): exp.updateRanges() print('collecting peak data...') for spec in exp: if spec.getMSLevel() == ms_level: mz, intensity = spec.get_peaks() p = intensity.argsort() # sort by intensity to plot highest on top rt = np.full([mz.shape], spec.getRT(), float) plt.scatter(rt, mz[p], c = intensity[p], cmap = 'afmhot_r', s=marker_size, norm=colors.LogNorm(exp.getMinIntensity()+1, exp.getMaxIntensity())) plt.clim(exp.getMinIntensity()+1, exp.getMaxIntensity()) plt.xlabel('time (s)') plt.ylabel('m/z') plt.colorbar() print('showing plot...') plt.show() # slow for larger data sets from urllib.request import urlretrieve gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master" urlretrieve (gh + "/src/data/FeatureFinderMetaboIdent_1_input.mzML", "test.mzML") exp = MSExperiment() MzMLFile().load('test.mzML', exp) plot_spectra_2D(exp)
For larger data sets this will be too slow since every individual peak gets displayed.
However, we can use
BilinearInterpolation which produces an overview image of our spectra.
This can be useful for a brief visual inspection of your sample in quality control.
import numpy as np import matplotlib.pyplot as plt def plot_spectra_2D_overview(experiment): rows = 200.0 cols = 200.0 exp.updateRanges() bilip = BilinearInterpolation() tmp = bilip.getData() tmp.resize(int(rows), int(cols), float()) bilip.setData(tmp) bilip.setMapping_0(0.0, exp.getMinRT(), rows-1, exp.getMaxRT()) bilip.setMapping_1(0.0, exp.getMinMZ(), cols-1, exp.getMaxMZ()) print('collecting peak data...') for spec in exp: if spec.getMSLevel() == 1: mzs, ints = spec.get_peaks() rt = spec.getRT() for i in range(0, len(mzs)): bilip.addValue(rt, mzs[i], ints[i]) data = np.ndarray(shape=(int(cols), int(rows)), dtype=np.float64) for i in range(int(rows)): for j in range(int(cols)): data[i][j] = bilip.getData().getValue(i,j) plt.imshow(np.rot90(data), cmap='gist_heat_r') plt.xlabel('retention time (s)') plt.ylabel('m/z') plt.xticks(np.linspace(0,int(rows),20, dtype=int), np.linspace(exp.getMinRT(),exp.getMaxRT(),20, dtype=int)) plt.yticks(np.linspace(0,int(cols),20, dtype=int), np.linspace(exp.getMinMZ(),exp.getMaxMZ(),20, dtype=int)[::-1]) print('showing plot...') plt.show() plot_spectra_2D_overview(exp)
Example: Precursor Purity
When an MS/MS spectrum is generated, the precursor from the MS1 spectrum is gathered, fragmented and measured. In practice, the instrument gathers the ions in a user-defined window around the precursor m/z - the so-called precursor isolation window.
In some cases, the precursor isolation window contains contaminant peaks from other analytes. Depending on the analysis requirements, this can lead to issues in quantification for example, for isobaric experiments.
Here, we can assess the purity of the precursor to filter spectra with a score below our expectation.
from urllib.request import urlretrieve gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master" urlretrieve (gh + "/src/data/PrecursorPurity_input.mzML", "PrecursorPurity_input.mzML") exp = MSExperiment() MzMLFile().load("PrecursorPurity_input.mzML", exp) # for this example, we check which are MS2 spectra and choose one of them for element in exp: print(element.getMSLevel()) # get the precursor information from the MS2 spectrum at index 3 ms2_precursor = Precursor() ms2_precursor = exp.getPrecursors(); # get the previous recorded MS1 spectrum isMS1 = False; i = 3 # start at the index of the MS2 spectrum while isMS1 == False: if exp[i].getMSLevel() == 1: isMS1 = True else: i -= 1 ms1_spectrum = exp[i] # calculate the precursor purity in a 10 ppm precursor isolation window purity_score = PrecursorPurity().computePrecursorPurity(ms1_spectrum, ms2_precursor, 10, True) print(purity_score.total_intensity) # 9098343.890625 print(purity_score.target_intensity) # 7057944.0 print(purity_score.signal_proportion) # 0.7757394186070014 print(purity_score.target_peak_count) # 1 print(purity_score.residual_peak_count) # 4
We could assess that we have four other non-isotopic peaks apart from our precursor and its isotope peaks within our precursor isolation window. The signal of the isotopic peaks correspond to roughly 78% of all intensities in the precursor isolation window.
Example: Filtering Spectra
Here we will look at some code snippets that might come in handy when dealing with spectra data.
But first, we will load some test data:
gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master" urlretrieve (gh + "/src/data/tiny.mzML", "test.mzML") inp = MSExperiment() MzMLFile().load("test.mzML", inp)
Filtering Spectra by MS level
We will filter the data from “test.mzML” file by only retaining only spectra that are not MS1 spectra (e.g.MS2, MS3 or MSn spectra):
1filtered = MSExperiment() 2for s in inp: 3 if s.getMSLevel() > 1: 4 filtered.addSpectrum(s) 5 6# filtered now only contains spectra with MS level > 2
Filtering by scan number
We could also use a list of scan numbers as filter criterium to only retain a list of MS scans we are interested in:
1scan_nrs = [0, 2, 5, 7] 2 3filtered = MSExperiment() 4for k, s in enumerate(inp): 5 if k in scan_nrs: 6 filtered.addSpectrum(s)
Filtering Spectra and Peaks
Suppose we are interested in only in a small m/z window of our fragment ion spectra. We can easily filter our data accordingly:
1mz_start = 6.0 2mz_end = 12.0 3filtered = MSExperiment() 4for s in inp: 5 if s.getMSLevel() > 1: 6 filtered_mz =  7 filtered_int =  8 for mz, i in zip(*s.get_peaks()): 9 if mz > mz_start and mz < mz_end: 10 filtered_mz.append(mz) 11 filtered_int.append(i) 12 s.set_peaks((filtered_mz, filtered_int)) 13 filtered.addSpectrum(s) 14 15 # filtered only contains only fragment spectra with peaks in range [mz_start, mz_end]
Note that in a real-world application, we would set the
mz_end parameter to an actual area of interest, for example the area
between 125 and 132 which contains quantitative ions for a TMT experiment.
Similarly we could only retain peaks above a certain intensity or keep only the top N peaks in each spectrum.
For more advanced filtering tasks pyOpenMS provides special algorithm classes. We will take a closer look at some of them in the algorithm section.