Mass Decomposition
Fragment Mass to Amino Acid Composition
A common challenge in mass spectrometry is determining the composition of a fragment when only its mass is known. For example, the internal fragment mass \(262.0953584466\) has three different interpretations within a narrow mass window of \(0.05\ Th\):
from pyopenms import *
print(
AASequence.fromString("MM").getMonoWeight(Residue.ResidueType.Internal, 0)
)
print(
AASequence.fromString("VY").getMonoWeight(Residue.ResidueType.Internal, 0)
)
print(
AASequence.fromString("DF").getMonoWeight(Residue.ResidueType.Internal, 0)
)
262.08097003420005
262.1317435742
262.0953584466
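An internal fragment carries no terminal groups, so its mass is simply the sum of its residue masses. The plain-Python sketch below reproduces the three values from a small hard-coded table of standard monoisotopic residue masses (rounded to five decimals; an assumption for illustration, since pyOpenMS takes these values from its own residue database) and reports how far each candidate lies from the target. The helper name `internal_mass` is hypothetical.

```python
# Standard monoisotopic residue masses in Da, rounded to 5 decimals.
# Hard-coded for illustration only; pyOpenMS reads these from its ResidueDB.
RESIDUE_MASS = {
    "D": 115.02694,
    "F": 147.06841,
    "M": 131.04049,
    "V": 99.06841,
    "Y": 163.06333,
}


def internal_mass(seq):
    """Hypothetical helper: internal fragment mass as the plain sum of residues."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


target = 262.0953584466
for seq in ("MM", "VY", "DF"):
    mass = internal_mass(seq)
    delta = mass - target        # absolute deviation from the target mass
    ppm = delta / target * 1e6   # the same deviation in parts per million
    print(f"{seq}: {mass:.5f}  delta {delta:+.5f}  ({ppm:+.1f} ppm)")
```

All three deviations are smaller than 0.05 in absolute value, which is why a 0.05 window cannot tell these candidates apart.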
As you can see, multiple explanations may exist even for simple combinations of two amino acids. OpenMS provides an algorithm that computes all amino acid combinations explaining a given mass in the MassDecompositionAlgorithm class:
md_alg = MassDecompositionAlgorithm()
param = md_alg.getParameters()
param.setValue("tolerance", 0.05)
param.setValue("residue_set", b"Natural19WithoutI")
md_alg.setParameters(param)
decomps = []
md_alg.getDecompositions(decomps, 262.0953584466)
for d in decomps:
    print(d.toExpandedString())
This prints the three potential compositions for the mass \(262.0953584466\).
Note that each combination of amino acids is printed only once: for example,
only DF is reported, while the isobaric FD is not. Because the algorithm works
with compositions rather than ordered sequences, this also makes it more
efficient.
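Reporting each combination once amounts to searching over compositions (multisets of residues) instead of ordered sequences. The brute-force sketch below illustrates that idea; it is not the algorithm OpenMS uses internally. It enumerates multisets with `itertools.combinations_with_replacement` over a hard-coded table of standard monoisotopic residue masses for the 19 natural amino acids without I (rounded to five decimals, an assumption for illustration). The helper name `decompose` is hypothetical, and the approach is only practical for small masses because the number of multisets grows quickly with length.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses (Da, rounded to 5 decimals) for the
# 19 natural amino acids without I; hard-coded for illustration only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def decompose(target, tol):
    """Hypothetical helper: all residue compositions within tol of target.

    combinations_with_replacement yields every unordered multiset exactly
    once (letters in alphabetical order), so "DF" can appear but "FD" never does.
    """
    letters = sorted(RESIDUE_MASS)
    # No composition can contain more residues than target / lightest residue.
    max_len = int((target + tol) // min(RESIDUE_MASS.values()))
    hits = []
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement(letters, n):
            if abs(sum(RESIDUE_MASS[aa] for aa in combo) - target) <= tol:
                hits.append("".join(combo))
    return sorted(hits)


print(decompose(262.0953584466, 0.05))  # ['DF', 'MM', 'VY']
```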
Naive Algorithm
We can compare this result with a more naive algorithm, which recursively appends amino acid residues to a growing string and reports every string whose summed residue mass lies within the tolerance of the target mass:
mass = 262.0953584466
residues = ResidueDB().getResidues(b"Natural19WithoutI")

def recursive_mass_decomposition(mass_sum, peptide):
    # Report the current residue string if its mass is within 0.05 of the target
    if abs(mass_sum - mass) < 0.05:
        print(peptide + "\t" + str(mass_sum))
    # Extend the string by every residue that does not overshoot the target
    for r in residues:
        new_mass = mass_sum + r.getMonoWeight(Residue.ResidueType.Internal)
        if new_mass < mass + 0.05:
            recursive_mass_decomposition(
                new_mass, peptide + r.getOneLetterCode()
            )
print("Mass explanations by naive algorithm:")
recursive_mass_decomposition(0, "")
Note that this approach is substantially slower than the OpenMS algorithm and
also does not treat DF and FD as equivalent, instead reporting both as separate
solutions.
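The two outputs can be reconciled by collapsing each ordered string to its sorted letters, i.e. its composition. The sketch below reimplements the recursive search in plain Python, collecting results in a list instead of printing them, and then groups them by composition. It uses a hard-coded table of standard monoisotopic residue masses (rounded to five decimals, an assumption for illustration), and the function name `naive_decompositions` is hypothetical.

```python
# Standard monoisotopic residue masses (Da, rounded to 5 decimals) for the
# 19 natural amino acids without I; hard-coded for illustration only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def naive_decompositions(target, tol):
    """Hypothetical helper: every ordered residue string within tol of target."""
    results = []

    def extend(mass_sum, peptide):
        if peptide and abs(mass_sum - target) <= tol:
            results.append(peptide)
        for aa, mass in RESIDUE_MASS.items():
            # Residue masses are positive, so an overshoot can never recover.
            if mass_sum + mass <= target + tol:
                extend(mass_sum + mass, peptide + aa)

    extend(0.0, "")
    return results


raw = naive_decompositions(262.0953584466, 0.05)
compositions = sorted({"".join(sorted(p)) for p in raw})
print(sorted(raw))   # ['DF', 'FD', 'MM', 'VY', 'YV']
print(compositions)  # ['DF', 'MM', 'VY']
```

The naive search visits every ordering of every composition, which is one reason it does more work than a composition-based method.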
Stand-Alone Program
We can use pyOpenMS to write a short program that takes a mass and outputs all possible amino acid combinations for that mass within a given tolerance:
import sys

from pyopenms import MassDecompositionAlgorithm

# Example for mass decomposition (mass explanation)
# Internal residue masses (as observed e.g. as mass shifts in tandem mass spectra)
# are decomposed into possible amino acid strings that match in mass.

if len(sys.argv) != 3:
    sys.exit("usage: python mass_decomposition.py <mass> <tolerance>")

mass = float(sys.argv[1])
tol = float(sys.argv[2])

md_alg = MassDecompositionAlgorithm()
param = md_alg.getParameters()
param.setValue("tolerance", tol)
param.setValue("residue_set", b"Natural19WithoutI")
md_alg.setParameters(param)
decomps = []
md_alg.getDecompositions(decomps, mass)
for d in decomps:
    composition = d.toExpandedString()
    # Depending on the pyOpenMS version, strings may be returned as bytes
    if isinstance(composition, bytes):
        composition = composition.decode()
    print(composition)
If we save the above code as a script, for example mass_decomposition.py,
we obtain a stand-alone program that takes two arguments: first the mass to
be decomposed and second the tolerance to be used (read from sys.argv[1] and
sys.argv[2], respectively). We can call it as follows:
python mass_decomposition.py 999.4773990735001 1.0
python mass_decomposition.py 999.4773990735001 0.001
Try changing the tolerance parameter: it has a very large influence on the reported results. For example, with a tolerance of \(1.0\) the algorithm produces \(80{,}463\) results, while with a tolerance of \(0.001\) only \(911\) results are expected.
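To put the two tolerances into perspective, they can be expressed in parts per million of the decomposed mass. The short sketch below assumes, as the \(0.05\ Th\) example earlier suggests, that the tolerance parameter is an absolute mass difference; the helper name `tolerance_in_ppm` is hypothetical.

```python
def tolerance_in_ppm(tol, mass):
    """Hypothetical helper: express an absolute tolerance relative to mass."""
    return tol / mass * 1e6


mass = 999.4773990735001
for tol in (1.0, 0.001):
    print(f"tolerance {tol} -> {tolerance_in_ppm(tol, mass):.1f} ppm")
```

At this mass the two settings correspond to roughly 1000 ppm and 1 ppm, a thousandfold difference in window width.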
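Spectrum Tagger
The Tagger class searches a spectrum for sequence tags, i.e. short runs of amino acids whose residue masses match the mass differences between peaks. The example below generates a theoretical spectrum for a test peptide and then extracts tags of two to five residues from it, considering charge +1 only: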
tsg = TheoreticalSpectrumGenerator()
param = tsg.getParameters()
param.setValue("add_metainfo", "false")
param.setValue("add_first_prefix_ion", "true")
param.setValue("add_a_ions", "true")
param.setValue("add_losses", "true")
param.setValue("add_precursor_peaks", "true")
tsg.setParameters(param)

# spectrum with charges +1 and +2
test_sequence = AASequence.fromString("PEPTIDETESTTHISTAGGER")
spec = MSSpectrum()
tsg.getSpectrum(spec, test_sequence, 1, 2)

print(spec.size())  # should be 357

# tagger searching only for charge +1
tags = []
tagger = Tagger(2, 10.0, 5, 1, 1, [], [])
tagger.getTag(spec, tags)

print(len(tags))  # should be 890

print(b"EPTID" in tags)  # True
print(b"PTIDE" in tags)  # True
print(b"PTIDEF" in tags)  # False
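The idea behind sequence tagging can be sketched without OpenMS: sort the peaks and read a residue wherever the m/z gap between two peaks matches a residue mass within a ppm tolerance. The plain-Python sketch below is a simplified illustration, not the Tagger implementation. It assumes a clean, singly charged b-ion ladder, a hard-coded table of standard monoisotopic residue masses (rounded to five decimals), and a proton mass of 1.007276 Da; the helpers `b_ion_ladder` and `find_tags` are hypothetical.

```python
PROTON = 1.007276  # proton mass in Da

# Standard monoisotopic residue masses (Da, rounded to 5 decimals);
# I and L are isobaric, so both letters can explain the same gap.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ion_ladder(seq):
    """Hypothetical helper: singly charged b1..b(n-1) m/z values, no noise."""
    mzs, total = [], 0.0
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        mzs.append(total + PROTON)
    return mzs


def find_tags(mzs, min_len, max_len, ppm):
    """Hypothetical helper: tags of min_len..max_len residues read along peak gaps."""
    peaks = sorted(mzs)
    tags = set()

    def walk(i, tag):
        if len(tag) >= min_len:
            tags.add(tag)
        if len(tag) == max_len:
            return
        for j in range(i + 1, len(peaks)):
            gap = peaks[j] - peaks[i]
            tol = peaks[j] * ppm * 1e-6  # ppm tolerance at the heavier peak
            for aa, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    walk(j, tag + aa)

    for start in range(len(peaks)):
        walk(start, "")
    return tags


tags = find_tags(b_ion_ladder("PEPTIDE"), min_len=2, max_len=5, ppm=10.0)
print(sorted(tags))
```

Because the b-ion ladder alone has no peak for the final residue, this toy spectrum yields "EPTID" (and its isobaric twin "EPTLD") but not "PTIDE"; the richer theoretical spectrum used with Tagger also contains other ion series.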