Charge and Isotope Deconvolution
A single mass spectrum contains measurements of one or more analytes, recorded as signal intensities at their respective m/z values. Most analytes produce multiple signals in the mass spectrometer due to the natural abundance of heavy isotopes. The most important heavy isotope in proteins is carbon-13 (naturally occurring at ca. \(1.1\%\) abundance). Other elements such as hydrogen also have heavy isotopes, but they contribute to a much lesser extent because these isotopes are very rare; hydrogen-2 (deuterium), for example, occurs at a frequency of only \(0.0156\%\).
All analytes produce a so-called isotopic pattern, consisting of a monoisotopic peak, a first isotopic peak (exactly one extra neutron in total, located in one of the atoms of the molecule), a second isotopic peak (exactly two extra neutrons), etc. With increasing mass, the monoisotopic peak becomes vanishingly small, up to the point where it is no longer detectable (this is rarely the case for peptides, but becomes a problem for whole proteins in top-down approaches).
By definition, the monoisotopic peak is the peak that contains only the most abundant isotope of each element. For peptides and proteins, the constituent elements are C, H, N, O, P and S, for which, incidentally, the most abundant isotope is also the lightest one. Hence, for peptides and proteins, the monoisotopic peak is always the lightest peak in an isotopic distribution.
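To make the shrinking monoisotopic peak concrete, here is a deliberately simplified sketch, not how OpenMS computes isotope patterns: it assumes that only carbon has a heavy isotope (13C at an assumed abundance of 1.1%) and treats all other elements as monoisotopic. Under that assumption, the number of 13C atoms in a molecule with n carbon atoms follows a binomial distribution, and entry k of the pattern is the probability of exactly k heavy atoms (k = 0 being the monoisotopic peak). The helper name `carbon_isotope_pattern` and the carbon counts used below are illustrative choices.

```python
from math import comb
from typing import List

P_13C = 0.011  # assumed natural abundance of 13C (ca. 1.1%)


def carbon_isotope_pattern(n_carbons: int, n_peaks: int, p: float = P_13C) -> List[float]:
    """Binomial, carbon-only isotope pattern (hypothetical helper).

    Entry k is the probability that exactly k of the n_carbons atoms are 13C,
    i.e. the relative abundance of the k-th isotopic peak (k = 0: monoisotopic).
    """
    return [comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k) for k in range(n_peaks)]


for n in (50, 200):
    pattern = carbon_isotope_pattern(n, 5)
    most_abundant = max(range(len(pattern)), key=pattern.__getitem__)
    print(n, "carbons:", [round(x, 3) for x in pattern], "-> most abundant peak:", most_abundant)
```

In this toy model, the monoisotopic peak is the most abundant one for 50 carbon atoms, whereas for 200 carbon atoms the second isotopic peak (k = 2) already exceeds it.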
See the chemistry section for further details on isotope abundances and how to compute isotope patterns.
In addition, each analyte may appear in more than one charge state and adduct state: a singly charged analyte \(\ce{[M +H]+}\) may be accompanied by a doubly charged analyte \(\ce{[M +2H]++}\) or a sodium adduct \(\ce{[M +Na]+}\). In the case of a multiply charged peptide, the isotopic traces are spaced by approximately 1.00335 Da / charge_state (the mass difference between carbon-13 and carbon-12, which is slightly smaller than the neutron mass), i.e. close to \(0.5\ m/z\) for doubly charged analytes, \(0.33\ m/z\) for triply charged analytes, etc. Note: tryptic peptides often appear either singly charged (when ionized with MALDI) or doubly charged (when ionized with ESI). Higher charges are also possible, but are usually connected to incomplete tryptic digestion with missed cleavages.
Small molecules in metabolomics often carry a single charge but can form adducts other than with hydrogen.
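The m/z values of the different ion species and the isotope spacing follow directly from the charge. The sketch below uses a hypothetical neutral monoisotopic mass of 1000 Da and rounded reference values (assumed here) for the proton mass, the Na+ ion mass and the mass difference between 13C and 12C; the function names `ion_mz` and `isotope_spacing` are made up for illustration.

```python
PROTON_MASS = 1.007276467      # Da (assumed value)
NA_ION_MASS = 22.9892207       # Da, Na atom (22.9897693) minus one electron (assumed value)
C13_C12_MASS_DIFF = 1.0033548  # Da, spacing between neighbouring isotopic peaks (assumed value)


def ion_mz(neutral_mass: float, charge: int, n_na: int = 0) -> float:
    """m/z of [M + (charge - n_na) H + n_na Na] with the given positive charge (hypothetical helper)."""
    n_h = charge - n_na  # remaining charges are carried by protons
    return (neutral_mass + n_h * PROTON_MASS + n_na * NA_ION_MASS) / charge


def isotope_spacing(charge: int) -> float:
    """Distance in m/z between neighbouring isotopic peaks at the given charge."""
    return C13_C12_MASS_DIFF / charge


M = 1000.0  # hypothetical neutral monoisotopic mass in Da
print("[M+H]+  :", ion_mz(M, 1))
print("[M+2H]2+:", ion_mz(M, 2))
print("[M+Na]+ :", ion_mz(M, 1, n_na=1))
for z in (1, 2, 3):
    print(f"isotope spacing at charge {z}: {isotope_spacing(z):.4f}")
```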
Single Peak Example
Let’s compute the isotope distribution of the peptide DFPIANGER using the classes AASequence and
EmpiricalFormula. Then we use the Deisotoper to find the monoisotopic peak:
import pyopenms as oms

seq = oms.AASequence.fromString("DFPIANGER")
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))

## get isotopic distribution for two additional hydrogens (which carry the charge)
## note: these are neutral H atoms, so the m/z values below are about one electron mass (~0.00055) too high
charge = 2
seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6))

# Append isotopic distribution to spectrum
s = oms.MSSpectrum()
for iso in isotopes.getContainer():  # the container contains masses, not m/z!
    iso.setMZ(iso.getMZ() / charge)  # ... even though it's called '.getMZ()'
    s.push_back(iso)
    print("Isotope", iso.getMZ(), ":", iso.getIntensity())

# deisotope with 10 ppm mass tolerance
oms.Deisotoper.deisotopeAndSingleChargeDefault(s, 10, True)

for p in s:
    print("Mono peaks:", p.getMZ(), p.getIntensity())
which will print:
[M+H]+ weight: 1018.495240604071
Isotope 509.75180710055 : 0.5680345296859741
Isotope 510.25348451945 : 0.3053518533706665
Isotope 510.75516193835 : 0.09806874394416809
Isotope 511.25683935725004 : 0.023309258744120598
Isotope 511.75851677615003 : 0.0044969217851758
Isotope 512.2601941950501 : 0.000738693168386817
Mono peaks: 1018.496337734329 0.5680345296859741
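The single "Mono peaks" entry is reported as a singly charged m/z: converting a peak of charge z to charge 1 removes the z - 1 surplus protons, i.e. \(m/z_{1+} = z \cdot m/z_{z+} - (z - 1)\, m_{\mathrm{H}^+}\). The short check below applies this formula, with an assumed proton mass of 1.007276467 Da, to the monoisotopic m/z printed above; `to_single_charge` is a hypothetical helper, not a pyOpenMS function.

```python
PROTON_MASS = 1.007276467  # Da (assumed value)


def to_single_charge(mz: float, charge: int) -> float:
    """Convert the m/z of a peak with the given charge to the m/z of the [M+H]+ ion."""
    # charge * mz is the mass of [M+zH]; remove the (charge - 1) surplus protons
    return mz * charge - (charge - 1) * PROTON_MASS


mono = to_single_charge(509.75180710055, 2)  # monoisotopic 2+ m/z from the output above
print(mono)
```

The result reproduces the deisotoped peak. The remaining difference of about 0.0011 to the [M+H]+ weight in the first output line corresponds to two electron masses (2 × 0.00054858 = 0.0010972), because the example adds two neutral hydrogen atoms, rather than two protons, to the formula.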
Note that the algorithm presented here has some heuristics built into it, such as assuming that the intensities of the isotopic peaks decrease after the first isotopic peak. This heuristic can be disabled by setting the parameter use_decreasing_model to False.
For more fine-grained control, leave use_decreasing_model = True and adjust start_intensity_check, which sets the isotopic peak from which on decreasing intensities are expected (see the Deisotoper C++ documentation).
Let’s look at a much heavier peptide (four concatenated copies of DFPIANGER), whose isotopic distribution is dominated by the first and second isotopic peaks:
seq = oms.AASequence.fromString("DFPIANGERDFPIANGERDFPIANGERDFPIANGER")
print("[M+H]+ weight:", seq.getMonoWeight(oms.Residue.ResidueType.Full, 1))

charge = 4
seq_formula = seq.getFormula() + oms.EmpiricalFormula("H" + str(charge))
isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(8))

# Append isotopic distribution to spectrum
s = oms.MSSpectrum()
for iso in isotopes.getContainer():
    iso.setMZ(iso.getMZ() / charge)
    s.push_back(iso)
    print("Isotope", iso.getMZ(), ":", iso.getIntensity())

min_charge = 1
min_isotopes = 2
max_isotopes = 10
use_decreasing_model = True  # expect decreasing intensities, but only from start_intensity_check on
start_intensity_check = 3  # allow the first two peaks to be less intense than the third (as is the case here)
oms.Deisotoper.deisotopeAndSingleCharge(  ## a function with all parameters exposed
    s,
    10,  # fragment_tolerance
    True,  # fragment_unit_ppm
    min_charge,
    charge,  # max_charge
    True,  # keep_only_deisotoped
    min_isotopes,  # min_isopeaks
    max_isotopes,  # max_isopeaks
    True,  # make_single_charged
    True,  # annotate_charge
    True,  # annotate_iso_peak_count
    use_decreasing_model,
    start_intensity_check,
    False,  # add_up_intensity
    True  # annotate_features
)
for p in s:
    print("Mono peaks:", p.getMZ(), p.getIntensity())
which prints:
[M+H]+ weight: 4016.927437824572
Isotope 1004.9878653713499 : 0.10543462634086609
Isotope 1005.2387040808 : 0.22646738588809967
Isotope 1005.48954279025 : 0.25444599986076355
Isotope 1005.7403814996999 : 0.19825772941112518
Isotope 1005.9912202091499 : 0.12000058591365814
Isotope 1006.2420589185999 : 0.05997777357697487
Isotope 1006.49289762805 : 0.025713207200169563
Isotope 1006.7437363375 : 0.009702674113214016
Mono peaks: 4016.9296320850867 0.10543462634086609
This successfully recovers the monoisotopic peak, even though it is not the most abundant peak.
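Why start_intensity_check = 3 suits this peptide can be seen with a simplified model of the intensity heuristic. The hypothetical function below is a sketch under stated assumptions, not the OpenMS implementation: peaks are indexed from 0 (monoisotopic), from index start_intensity_check on every peak must be at most as intense as its predecessor, and the pattern is cut at the first violation. The intensities are the rounded values printed above.

```python
from typing import List


def consistent_prefix_length(intensities: List[float], start_intensity_check: int) -> int:
    """Number of leading isotopic peaks accepted by a toy decreasing-intensity model.

    Hypothetical helper: from index start_intensity_check on, each peak may not be
    more intense than the previous one; the pattern ends at the first violation.
    """
    for i in range(1, len(intensities)):
        if i >= start_intensity_check and intensities[i] > intensities[i - 1]:
            return i
    return len(intensities)


# rounded isotope intensities of the four-fold DFPIANGER peptide (4+) from the output above
heavy = [0.105, 0.226, 0.254, 0.198, 0.120, 0.060, 0.026, 0.010]
for start in (1, 2, 3):
    print(f"start_intensity_check={start}: {consistent_prefix_length(heavy, start)} peaks accepted")
```

In this toy model, only start_intensity_check = 3 keeps the full pattern; smaller values cut it after one or two peaks. Setting use_decreasing_model to False corresponds to skipping the check altogether.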
Full Spectral De-Isotoping
In the following code segment, we will use a sample measurement of BSA (bovine serum albumin) and apply a simple algorithm in OpenMS for “deisotoping” a mass spectrum, which means grouping peaks that belong to the same isotopic pattern and charge state (reusing the parameters defined in the previous example):
from urllib.request import urlretrieve
import matplotlib.pyplot as plt

gh = "https://raw.githubusercontent.com/OpenMS/OpenMS/develop/doc/pyopenms"
urlretrieve(gh + "/src/data/BSA1.mzML", "BSA1.mzML")

e = oms.MSExperiment()
oms.MzMLFile().load("BSA1.mzML", e)
s = e[214]  # a copy of spectrum 214, so e[214] itself stays unchanged
s.setFloatDataArrays([])  # remove any float data arrays attached to the spectrum
oms.Deisotoper.deisotopeAndSingleCharge(
    s,
    0.1,  # fragment_tolerance
    False,  # fragment_unit_ppm: the tolerance is given in Da, not ppm
    1,  # min_charge
    3,  # max_charge
    True,  # keep_only_deisotoped
    min_isotopes,  # parameters reused from the previous example
    max_isotopes,
    True,  # make_single_charged
    True,  # annotate_charge
    True,  # annotate_iso_peak_count
    use_decreasing_model,
    start_intensity_check,
    False,  # add_up_intensity
    True  # annotate_features
)

print(e[214].size())  # number of peaks in the original spectrum
print(s.size())  # number of peaks after deisotoping

# store the original and the deisotoped spectrum, e.g. for inspection in TOPPView
e2 = oms.MSExperiment()
e2.addSpectrum(e[214])
oms.MzMLFile().store("BSA1_scan214_full.mzML", e2)
e2 = oms.MSExperiment()
e2.addSpectrum(s)
oms.MzMLFile().store("BSA1_scan214_deisotoped.mzML", e2)

# print all deisotoped peaks with more than 25% of the base peak intensity
maxvalue = max([p.getIntensity() for p in s])
for p in s:
    if p.getIntensity() > 0.25 * maxvalue:
        print(p.getMZ(), p.getIntensity())

unpicked_peak_data = e[214].get_peaks()
plt.bar(unpicked_peak_data[0], unpicked_peak_data[1], snap=False)
plt.show()

picked_peak_data = s.get_peaks()
plt.bar(picked_peak_data[0], picked_peak_data[1], snap=False)
plt.show()
which produces the following output:
140
41
974.4572680576728 6200571.5
974.4589691256419 3215808.75
As we can see, the algorithm has reduced \(140\) peaks to \(41\) deisotoped peaks. It has also identified an analyte with a singly charged (\(\ce{[M +H]+}\)) mass of \(974.46\ Da\) as the most intense peak in the data (base peak).
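The grouping step itself can be illustrated with a toy greedy deisotoper in plain Python. This is a simplified sketch, not the OpenMS algorithm: for each unassigned peak it tries charges from high to low, follows peaks spaced by 1.0033548/z within a fixed tolerance, and accepts the first charge that yields at least min_isopeaks peaks, reporting the monoisotopic peak converted to a singly charged m/z. The input is synthetic (one hypothetical analyte as a 2+ pattern starting at m/z 487.73 and a 3+ pattern starting at m/z 325.49), and the mass constants are assumed values.

```python
from typing import List, Optional, Tuple

C13_C12_MASS_DIFF = 1.0033548  # Da, spacing of neighbouring isotopic peaks (assumed value)
PROTON_MASS = 1.007276467      # Da (assumed value)

Peak = Tuple[float, float]  # (m/z, intensity)


def toy_deisotope(peaks: List[Peak], tolerance: float = 0.01, min_charge: int = 1,
                  max_charge: int = 3, min_isopeaks: int = 2) -> List[Tuple[float, float, int]]:
    """Greedy toy deisotoper (hypothetical, for illustration only).

    peaks must be sorted by ascending m/z. Returns one (singly charged m/z,
    monoisotopic intensity, charge) tuple per detected isotopic pattern;
    peaks that do not belong to any pattern are dropped.
    """
    used = [False] * len(peaks)
    features = []
    for i, (mono_mz, mono_int) in enumerate(peaks):
        if used[i]:
            continue
        # try high charges first: every second peak of a 2+ pattern is also ~1.0034 apart (like 1+)
        for z in range(max_charge, min_charge - 1, -1):
            chain = [i]
            while True:
                expected = mono_mz + len(chain) * C13_C12_MASS_DIFF / z
                match: Optional[int] = next(
                    (j for j in range(chain[-1] + 1, len(peaks))
                     if not used[j] and abs(peaks[j][0] - expected) <= tolerance),
                    None,
                )
                if match is None:
                    break
                chain.append(match)
            if len(chain) >= min_isopeaks:
                for j in chain:
                    used[j] = True
                features.append((mono_mz * z - (z - 1) * PROTON_MASS, mono_int, z))
                break
    return features


# synthetic data: one analyte observed as a 2+ and as a 3+ isotopic pattern
two_plus = [(487.73 + k * C13_C12_MASS_DIFF / 2, inten) for k, inten in enumerate([100.0, 55.0, 20.0, 5.0])]
three_plus = [(325.49 + k * C13_C12_MASS_DIFF / 3, inten) for k, inten in enumerate([60.0, 33.0, 12.0])]
features = toy_deisotope(sorted(two_plus + three_plus))
for mz, intensity, z in features:
    print(f"[M+H]+ m/z {mz:.4f} (from charge {z}+), intensity {intensity}")
```

Both patterns end up at nearly the same singly charged m/z, so a single analyte observed in two charge states yields two deisotoped peaks very close together.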
Visualization
The reason we see two peaks very close together becomes apparent once we look at the data in TOPPView: the \(974.4572680576728\) peak is derived from a \(\ce{2+}\) peak at m/z \(487.73\), and the \(974.4589691256419\) peak is derived from a \(\ce{3+}\) peak at m/z \(325.49\). The algorithm has thus identified a single analyte in two charge states and converted both isotopic patterns to the m/z of the corresponding \(\ce{[M +H]+}\) ion, which produces two peaks very close together (one from the \(\ce{2+}\) and one from the \(\ce{3+}\) pattern).
Comparing the full deisotoped mass spectrum to the original one, we can see the original (centroided) mass spectrum on the top and the deisotoped mass spectrum on the bottom in blue. Note how hovering over a peak in the deisotoped mass spectrum shows its charge state.
In the next section (Feature Detection), we will look at 2-dimensional deisotoping, where multiple mass spectra from an LC-MS experiment are analyzed together instead of a single mass spectrum. These algorithms analyze the full 2-dimensional (m/z and RT) signal and are generally more powerful than the 1-dimensional algorithm discussed here. However, not all data is 2-dimensional, and the algorithm discussed here has many applications in practice (e.g. single mass spectra, fragment ion mass spectra in DDA, etc.).