Mass Decomposition

Fragment Mass to Amino Acid Composition

A common challenge in mass spectrometry is determining the composition of a fragment given only its mass. For example, the internal fragment mass \(262.0953584466\) has three different interpretations within a narrow mass band of \(0.05\ Th\):

import pyopenms as oms

print(
    oms.AASequence.fromString("MM").getMonoWeight(oms.Residue.ResidueType.Internal, 0)
)
print(
    oms.AASequence.fromString("VY").getMonoWeight(oms.Residue.ResidueType.Internal, 0)
)
print(
    oms.AASequence.fromString("DF").getMonoWeight(oms.Residue.ResidueType.Internal, 0)
)

262.08097003420005
262.1317435742
262.0953584466
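
The printed masses can be checked against the target by hand. The following sketch uses only the values shown in the output and the Python standard library; the `candidates` dictionary and the `within_tolerance` helper are illustrative names, not part of pyOpenMS.

```python
# Monoisotopic internal-fragment masses copied from the output above.
candidates = {
    "MM": 262.08097003420005,
    "VY": 262.1317435742,
    "DF": 262.0953584466,
}
target = 262.0953584466
tolerance = 0.05


def within_tolerance(mass, target, tolerance):
    """Return True if mass deviates from target by less than tolerance."""
    return abs(mass - target) < tolerance


# Print each candidate, its signed deviation from the target, and the verdict.
for peptide, mass in candidates.items():
    delta = mass - target
    print(f"{peptide}\t{delta:+.4f}\t{within_tolerance(mass, target, tolerance)}")
```

MM deviates from the target by roughly -0.014 and VY by roughly +0.036, so both fall inside the 0.05 band together with the exact match DF.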

As you can see, multiple explanations may exist even for simple combinations of two amino acids. OpenMS provides an algorithm to compute all potential amino acid combinations that explain a certain mass in the MassDecompositionAlgorithm class:

md_alg = oms.MassDecompositionAlgorithm()
param = md_alg.getParameters()
param.setValue("tolerance", 0.05)
param.setValue("residue_set", b"Natural19WithoutI")
md_alg.setParameters(param)
decomps = []
md_alg.getDecompositions(decomps, 262.0953584466)
for d in decomps:
    print(d.toExpandedString())

This outputs the three potential compositions for the mass \(262.0953584466\). Note that each combination of amino acids is printed only once: for example, only DF is reported, while the isobaric FD is not. This makes the algorithm more efficient.
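
To get a feeling for how many sequences collapse into a single composition, one can count the distinct orderings of a given set of residues. The sketch below uses only the Python standard library; `count_orderings` is an illustrative helper and not a pyOpenMS function.

```python
from collections import Counter
from math import factorial


def count_orderings(composition):
    """Number of distinct sequences with the given residue counts.

    This is the multinomial coefficient n! / (c1! * c2! * ...).
    """
    total = factorial(sum(composition.values()))
    for count in composition.values():
        total //= factorial(count)
    return total


print(count_orderings(Counter("DF")))  # 2 orderings: DF and FD
print(count_orderings(Counter("MM")))  # 1 ordering: MM
print(count_orderings(Counter("AAGGS")))  # 30 orderings
```

The number of orderings grows quickly with length, so enumerating compositions instead of sequences leaves far fewer candidates to report.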

Naive Algorithm

We can compare this result with a more naive algorithm, which simply iterates through all combinations of amino acid residues and reports every sequence whose summed residue mass lies within the tolerance of the target mass:

mass = 262.0953584466
residues = oms.ResidueDB().getResidues(b"Natural19WithoutI")


def recursive_mass_decomposition(mass_sum, peptide):
    if abs(mass_sum - mass) < 0.05:
        print(peptide + "\t" + str(mass_sum))
    for r in residues:
        new_mass = mass_sum + r.getMonoWeight(oms.Residue.ResidueType.Internal)
        if new_mass < mass + 0.05:
            recursive_mass_decomposition(
                new_mass, peptide + r.getOneLetterCode()
            )


print("Mass explanations by naive algorithm:")
recursive_mass_decomposition(0, "")

Note that this approach is substantially slower than the OpenMS algorithm and also does not treat DF and FD as equivalent, instead outputting them both as viable solutions.
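
One way to avoid the duplicate orderings is to extend a candidate only with residues that come at or after the last one added, so that each composition is generated exactly once. The following is a minimal pure-Python sketch of that idea and not the algorithm OpenMS uses internally. The five-letter alphabet, the approximate residue masses in `RESIDUE_MASSES`, and the `decompose` function are assumptions made for illustration.

```python
# Approximate monoisotopic residue masses in Da for a small, illustrative
# alphabet (an assumption for this sketch; pyOpenMS reads them from ResidueDB).
RESIDUE_MASSES = {
    "D": 115.02694,
    "F": 147.06841,
    "M": 131.04049,
    "V": 99.06841,
    "Y": 163.06333,
}


def decompose(target, tolerance, masses=RESIDUE_MASSES):
    """Return each residue composition matching target once, as a sorted string."""
    letters = sorted(masses)
    results = []

    def extend(start, mass_sum, composition):
        if composition and abs(mass_sum - target) < tolerance:
            results.append(composition)
        # Only append residues at or after `start`, so letters stay in
        # alphabetical order and permutations such as FD are never generated.
        for i in range(start, len(letters)):
            new_mass = mass_sum + masses[letters[i]]
            if new_mass < target + tolerance:
                extend(i, new_mass, composition + letters[i])

    extend(0, 0.0, "")
    return results


print(decompose(262.0953584466, 0.05))  # ['DF', 'MM', 'VY']
```

With the restricted alphabet the sketch reports DF, MM and VY once each, whereas the naive recursion above also visits FD and YV.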

Stand-Alone Program

We can use pyOpenMS to write a short program that takes a mass and outputs all possible amino acid combinations for that mass within a given tolerance:

 1 import sys
 2 import pyopenms as oms
 3
 4 # Example for mass decomposition (mass explanation)
 5 # Internal residue masses (as observed e.g. as mass shifts in tandem mass spectra)
 6 # are decomposed in possible amino acid strings that match in mass.
 7
 8 mass = float(sys.argv[1])
 9 tol = float(sys.argv[2])
10
11 md_alg = oms.MassDecompositionAlgorithm()
12 param = md_alg.getParameters()
13 param.setValue("tolerance", tol)
14 param.setValue("residue_set", b"Natural19WithoutI")
15 md_alg.setParameters(param)
16 decomps = []
17 md_alg.getDecompositions(decomps, mass)
18 for d in decomps:
19     print(d.toExpandedString().decode())

If we copy the above code into a script, for example mass_decomposition.py, we will have a stand-alone program that takes two arguments: first the mass to be decomposed and second the tolerance to be used (these are read on lines 8 and 9). We can call it as follows:

python mass_decomposition.py 999.4773990735001 1.0
python mass_decomposition.py 999.4773990735001 0.001

Try changing the tolerance parameter. It has a very large influence on the reported results: with a tolerance of \(1.0\), the algorithm produces \(80,463\) results, whereas with a tolerance of \(0.001\), only \(911\) results are expected.
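
The script reads its two arguments directly from `sys.argv`, so a missing argument raises an `IndexError` and a non-numeric one raises a `ValueError`. As one possible refinement, the arguments could be parsed with the standard-library `argparse` module. The sketch below is only a suggestion; the function name `parse_cli` and the check for positive values are assumptions, not part of the original script.

```python
import argparse


def parse_cli(argv):
    """Parse the mass and tolerance from a list of command-line strings."""
    parser = argparse.ArgumentParser(
        description="Decompose a mass into amino acid compositions."
    )
    parser.add_argument("mass", type=float, help="mass to decompose")
    parser.add_argument("tolerance", type=float, help="allowed mass deviation")
    args = parser.parse_args(argv)
    # Reject values that cannot describe a physical mass or a tolerance window.
    if args.mass <= 0 or args.tolerance <= 0:
        parser.error("mass and tolerance must be positive")
    return args.mass, args.tolerance


mass, tol = parse_cli(["999.4773990735001", "0.001"])
print(mass, tol)
```

In the script itself, `parse_cli(sys.argv[1:])` would take the place of the two `float(sys.argv[...])` lines.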

Spectrum Tagger

tsg = oms.TheoreticalSpectrumGenerator()
param = tsg.getParameters()
param.setValue("add_metainfo", "false")
param.setValue("add_first_prefix_ion", "true")
param.setValue("add_a_ions", "true")
param.setValue("add_losses", "true")
param.setValue("add_precursor_peaks", "true")
tsg.setParameters(param)

# spectrum with charges +1 and +2
test_sequence = oms.AASequence.fromString("PEPTIDETESTTHISTAGGER")
spec = oms.MSSpectrum()
tsg.getSpectrum(spec, test_sequence, 1, 2)

print(spec.size())  # should be 357

# tagger searching only for charge +1
tags = []
tagger = oms.Tagger(2, 10.0, 5, 1, 1, [], [])
tagger.getTag(spec, tags)

print(len(tags))  # should be 890

b"EPTID" in tags  # True
b"PTIDE" in tags  # True
b"PTIDEF" in tags  # False
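
A sequence tag is a short stretch of residues read off from the mass differences between peaks in a spectrum. The following pure-Python sketch illustrates that general idea under simplifying assumptions and is not the OpenMS implementation: it uses an absolute tolerance in Da instead of ppm, considers only singly charged peaks, and relies on a three-residue table of approximate masses. The names `RESIDUE_MASSES`, `find_tags` and `ladder` are hypothetical.

```python
# Approximate monoisotopic residue masses in Da (illustrative subset).
RESIDUE_MASSES = {"D": 115.02694, "F": 147.06841, "M": 131.04049}


def find_tags(peaks, min_len, max_len, tol=0.01, masses=RESIDUE_MASSES):
    """Collect residue strings spelled out by mass gaps along chains of peaks."""
    peaks = sorted(peaks)
    tags = set()

    def extend(i, tag):
        # Record the tag once it is long enough, and stop at the maximum length.
        if len(tag) >= min_len:
            tags.add(tag)
        if len(tag) == max_len:
            return
        # Try every later peak whose distance from peak i matches a residue mass.
        for j in range(i + 1, len(peaks)):
            gap = peaks[j] - peaks[i]
            for letter, mass in masses.items():
                if abs(gap - mass) <= tol:
                    extend(j, tag + letter)

    for start in range(len(peaks)):
        extend(start, "")
    return tags


# Hypothetical singly charged ladder whose gaps spell D, F, M.
ladder = [100.0]
for letter in "DFM":
    ladder.append(ladder[-1] + RESIDUE_MASSES[letter])

print(sorted(find_tags(ladder, min_len=2, max_len=3)))  # ['DF', 'DFM', 'FM']
```

The `min_len` and `max_len` arguments play the same role as the minimum and maximum tag lengths passed to the tagger in the example above.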