Charge and Isotope Deconvolution

A single mass spectrum contains measurements of one or more analytes, recorded as m/z values. Most analytes produce more than one signal in the mass spectrometer: because of the natural abundance of carbon \(13\) (ca. \(1\%\)) and the large number of carbon atoms in most organic molecules, an analyte typically produces a so-called isotopic pattern consisting of a monoisotopic peak (all carbon atoms are \(\ce{^{12}C}\)), a first isotopic peak (exactly one carbon atom is \(\ce{^{13}C}\)), a second isotopic peak (exactly two carbon atoms are \(\ce{^{13}C}\)), and so on. Other elements can also contribute to the isotope pattern; see the chemistry section for further details.
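To make the carbon contribution concrete, the following minimal sketch models the isotopic pattern as a binomial distribution over the carbon atoms only. The abundance value of \(1.07\%\), the helper name `carbon_isotope_pattern`, and the two molecule sizes are assumptions chosen for illustration; real patterns also include contributions from other elements.

```python
from math import comb

P_C13 = 0.0107  # assumed natural abundance of carbon-13 (approximate)


def carbon_isotope_pattern(n_carbons, n_peaks=4):
    """Probability that exactly k of n_carbons atoms are 13C, for k = 0..n_peaks-1."""
    return [
        comb(n_carbons, k) * P_C13**k * (1 - P_C13) ** (n_carbons - k)
        for k in range(n_peaks)
    ]


# Hypothetical molecule sizes, used only to illustrate the trend
small = carbon_isotope_pattern(50)
large = carbon_isotope_pattern(200)
for k, (a, b) in enumerate(zip(small, large)):
    print(f"{k} x 13C: {a:.3f} (50 C)   {b:.3f} (200 C)")
```

In this simplified model the monoisotopic peak dominates for the smaller molecule, while for the larger one the peak with two \(\ce{^{13}C}\) atoms becomes the most intense.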

In addition, each analyte may appear in more than one charge state and adduct state: a singly charged analyte \(\ce{[M +H]+}\) may be accompanied by a doubly charged analyte \(\ce{[M +2H]^{2+}}\) or a sodium adduct \(\ce{[M +Na]+}\). In the case of a multiply charged peptide, the isotopic traces are spaced by the mass difference between \(\ce{^{13}C}\) and \(\ce{^{12}C}\) (ca. \(1.00335\) Da) divided by the charge state, which is close to \(0.5\ m/z\) for doubly charged analytes, \(0.33\ m/z\) for triply charged analytes, etc. Note: tryptic peptides often appear at least doubly charged, while small molecules often carry a single charge but can have adducts other than a proton.
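The relation between charge state and isotopic spacing can be sketched in a few lines. The constant and the helper names `isotope_spacing` and `charge_from_spacing` are illustrative assumptions and not part of pyOpenMS.

```python
C13_C12_DIFF = 1.0033548  # mass difference between 13C and 12C in Da


def isotope_spacing(charge):
    """Expected m/z distance between neighbouring isotopic peaks."""
    return C13_C12_DIFF / charge


def charge_from_spacing(spacing_mz):
    """Estimate the charge state from an observed isotopic spacing."""
    return round(C13_C12_DIFF / spacing_mz)


for z in (1, 2, 3):
    print(z, round(isotope_spacing(z), 4))

# A spacing of roughly 0.5 m/z points to a doubly charged analyte
print(charge_from_spacing(0.5017))
```

Reading the charge from the peak spacing in this way is the basic idea that charge deconvolution relies on.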

Single Peak Example

import pyopenms as oms
charge = 2
seq = oms.AASequence.fromString("DFPIANGER")
seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6))
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
# Append isotopic distribution to spectrum
s = oms.MSSpectrum()
for iso in isotopes.getContainer():
    iso.setMZ(iso.getMZ() / charge)
    s.push_back(iso)
    print("Isotope", iso.getMZ(), ":", iso.getIntensity())
# Deisotope with a tolerance of 10 ppm
oms.Deisotoper.deisotopeAndSingleChargeDefault(s, 10, True)
for p in s:
    print(p.getMZ(), p.getIntensity())

Note that the algorithm presented here has some heuristics built into it, such as the assumption that peak intensities decrease after the first isotopic peak. This heuristic can be tuned through the parameters use_decreasing_model and start_intensity_check. In the following example, the second isotopic peak is the highest in intensity, so the start_intensity_check parameter needs to be set to 3.

charge = 4
seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(8))
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))
# Append isotopic distribution to spectrum
s = oms.MSSpectrum()
for iso in isotopes.getContainer():
    iso.setMZ(iso.getMZ() / charge)
    s.push_back(iso)
    print("Isotope", iso.getMZ(), ":", iso.getIntensity())
# Deisotoping parameters
min_charge = 1
min_isotopes = 2
max_isotopes = 10
use_decreasing_model = True
start_intensity_check = 3
oms.Deisotoper.deisotopeAndSingleCharge(
    s,
    10,  # fragment tolerance
    True,  # tolerance is given in ppm
    min_charge,
    charge,  # maximum charge
    True,  # keep only deisotoped peaks
    min_isotopes,
    max_isotopes,
    True,  # make single charged
    True,  # annotate charge
    True,  # annotate isotopic peak count
    use_decreasing_model,
    start_intensity_check,
    False,  # add up intensity
)
for p in s:
    print(p.getMZ(), p.getIntensity())
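The effect of start_intensity_check can be illustrated with a simplified stand-alone check. This is a sketch of the idea only and not the OpenMS implementation: the function `passes_decreasing_model` is hypothetical, the intensities are made up, and the sketch assumes that the parameter names the isotopic peak (counting the monoisotopic peak as 1) after which intensities must no longer increase.

```python
def passes_decreasing_model(intensities, start_intensity_check=1):
    """Return True if every peak after peak number `start_intensity_check`
    is no more intense than its predecessor (peak 1 is the monoisotopic peak)."""
    first = max(start_intensity_check, 1)
    return all(
        intensities[i] <= intensities[i - 1]
        for i in range(first, len(intensities))
    )


# Hypothetical pattern of a larger molecule: the third peak is the most intense
pattern = [0.12, 0.25, 0.27, 0.19, 0.10]

print(passes_decreasing_model(pattern, 1))  # False: peak 2 exceeds peak 1
print(passes_decreasing_model(pattern, 3))  # True: decrease enforced only after peak 3
```

Under this assumption, a strict setting rejects patterns of larger molecules whose monoisotopic peak is not the most intense, which is why the check has to start later for such analytes.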

Full Spectral De-Isotoping

In the following code segment, we will use a sample measurement of BSA (Bovine Serum Albumin) and apply a simple algorithm in OpenMS for “deisotoping” a mass spectrum, which means grouping peaks that belong to the same isotopic pattern and charge state:

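from urllib.request import urlretrieve
gh = ""  # base URL of the repository that hosts the example data
urlretrieve(gh + "/src/data/BSA1.mzML", "BSA1.mzML")
e = oms.MSExperiment()
oms.MzMLFile().load("BSA1.mzML", e)
s = e[214]
# min_isotopes, max_isotopes, use_decreasing_model and start_intensity_check
# are reused from the previous example
oms.Deisotoper.deisotopeAndSingleCharge(
    s,
    0.1,  # fragment tolerance
    False,  # tolerance is given in Da, not ppm
    1,  # minimum charge
    3,  # maximum charge
    True,
    min_isotopes,
    max_isotopes,
    True,
    True,
    True,
    use_decreasing_model,
    start_intensity_check,
    False,
)
# Store the original spectrum
e2 = oms.MSExperiment()
e2.addSpectrum(e[214])
oms.MzMLFile().store("BSA1_scan214_full.mzML", e2)
# Store the deisotoped spectrum
e2 = oms.MSExperiment()
e2.addSpectrum(s)
oms.MzMLFile().store("BSA1_scan214_deisotoped.mzML", e2)
# Print all peaks above 25% of the base peak intensity
maxvalue = max([p.getIntensity() for p in s])
for p in s:
    if p.getIntensity() > 0.25 * maxvalue:
        print(p.getMZ(), p.getIntensity())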

which produces the following output:


974.4572680576728 6200571.5
974.4589691256419 3215808.75

As we can see, the algorithm has reduced \(140\) peaks to \(41\) deisotoped peaks. It has also identified a molecule at \(974.45\ m/z\) as the most intense peak in the data (base peak).


The reason we see two peaks so close together becomes apparent once we look at the data in TOPPView: the peak at \(974.4572680576728\) is derived from a \(\ce{2+}\) peak at m/z \(487.73\), and the peak at \(974.4589691256419\) is derived from a \(\ce{3+}\) peak at m/z \(325.49\). The algorithm has identified a single analyte in two charge states and deconvoluted both signals to the m/z of the corresponding \(\ce{[M +H]+}\) ion, which produces two peaks very close together (one from the \(\ce{2+}\) and one from the \(\ce{3+}\) peak).
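The relation between the two deconvoluted peaks and their original m/z values can be checked with simple arithmetic. The sketch below assumes protonation as the only charge carrier; the helper `mz_from_singly_charged` is a hypothetical name and not a pyOpenMS function.

```python
PROTON_MASS = 1.007276  # Da


def mz_from_singly_charged(mz_1, charge):
    """Convert the m/z of an [M+H]+ ion to the m/z of the [M+zH]z+ ion."""
    neutral_mass = mz_1 - PROTON_MASS
    return (neutral_mass + charge * PROTON_MASS) / charge


print(round(mz_from_singly_charged(974.4572680576728, 2), 2))  # 487.73
print(round(mz_from_singly_charged(974.4589691256419, 3), 2))  # 325.49
```

Both values agree with the \(\ce{2+}\) and \(\ce{3+}\) peak positions quoted in the text.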


Looking at the full mass spectrum and comparing it to the original mass spectrum, we can see the original (centroided) mass spectrum on the top and the deisotoped mass spectrum on the bottom in blue. Note how hovering over a peak in the deisotoped mass spectrum indicates the charge state:


In the next section (Feature Detection), we will look at 2-dimensional deisotoping where, instead of a single mass spectrum, multiple mass spectra from an LC-MS experiment are analyzed together. These algorithms analyze the full 2-dimensional (m/z and RT) signal and are generally more powerful than the 1-dimensional algorithm discussed here. However, not all data is 2-dimensional, and the algorithm discussed here has many applications in practice (e.g. single mass spectra, fragment ion mass spectra in DDA etc.).