Chemistry¶
OpenMS has representations for various chemical concepts including molecular formulas, isotopes, ribonucleotide and amino acid sequences as well as common modifications of amino acids or ribonucleotides.
Constants¶
OpenMS has many chemical and physical constants built in:
>>> import pyopenms
>>> help(pyopenms.Constants)
>>> print ("Avogadro's number is", pyopenms.Constants.AVOGADRO)
which provides access to constants such as Avogadro’s number or the electron mass.
Elements¶
In OpenMS, elements are stored in ElementDB
which has entries for dozens of
elements commonly used in mass spectrometry. The database is stored in the
pyopenms/share/OpenMS/CHEMISTRY/Elements.xml
file (where a user can
add new elements if necessary).
from pyopenms import *
edb = ElementDB()
edb.hasElement("O")
edb.hasElement("S")
oxygen = edb.getElement("O")
print(oxygen.getName())
print(oxygen.getSymbol())
print(oxygen.getMonoWeight())
print(oxygen.getAverageWeight())
sulfur = edb.getElement("S")
print(sulfur.getName())
print(sulfur.getSymbol())
print(sulfur.getMonoWeight())
print(sulfur.getAverageWeight())
isotopes = sulfur.getIsotopeDistribution()
print ("One mole of oxygen weighs", 2*oxygen.getAverageWeight(), "grams")
print ("One mole of 16O2 weighs", 2*oxygen.getMonoWeight(), "grams")
As we can see, the OpenMS ElementDB
has entries for common elements like
Oxygen and Sulfur as well as information on their average and monoisotopic
weight. Note that the monoisotopic weight is the weight of the most abundant
isotope while the average weight is the sum across all isotopes, weighted by
their natural abundance. Therefore, one mole of oxygen (O2) weighs slightly
more than a mole of only its monoisotopic isotope since natural oxygen is a
mixture of multiple isotopes.
Oxygen
O
15.994915
15.999405323160001
Sulfur
S
31.97207073
32.066084735289
One mole of oxygen weighs 31.998810646320003 grams
One mole of 16O2 weighs 31.98983 grams
Isotopes¶
We can also inspect the full isotopic distribution of oxygen and sulfur:
from pyopenms import *
edb = ElementDB()
oxygen = edb.getElement("O")
isotopes = oxygen.getIsotopeDistribution()
for iso in isotopes.getContainer():
print ("Oxygen isotope", iso.getMZ(), "has abundance", iso.getIntensity()*100, "%")
sulfur = edb.getElement("S")
isotopes = sulfur.getIsotopeDistribution()
for iso in isotopes.getContainer():
print ("Sulfur isotope", iso.getMZ(), "has abundance", iso.getIntensity()*100, "%")
OpenMS can compute isotopic distributions for individual elements which contain information for all stable elements. The current values in the file are average abundances found in nature, which may differ depending on location. The above code outputs the isotopes of oxygen and sulfur as well as their abundance:
Oxygen isotope 15.994915 has abundance 99.75699782371521 %
Oxygen isotope 16.999132 has abundance 0.03800000122282654 %
Oxygen isotope 17.999169 has abundance 0.20500000100582838 %
Sulfur isotope 31.97207073 has abundance 94.92999911308289 %
Sulfur isotope 32.971458 has abundance 0.7600000128149986 %
Sulfur isotope 33.967867 has abundance 4.2899999767541885 %
Sulfur isotope 35.967081 has abundance 0.019999999494757503 %
Mass Defect¶
Note
While all isotopes are created by adding one or more neutrons to the nucleus, this leads to different observed masses due to the mass defect, which describes the difference between the mass of an atom and the mass of its constituent particles. For example, the mass difference between 12C and 13C is slightly different than the mass difference between 14N and 15N, even though both only differ by a neutron from their monoisotopic element:
from pyopenms import *
edb = ElementDB()
isotopes = edb.getElement("C").getIsotopeDistribution().getContainer()
carbon_isotope_difference = isotopes[1].getMZ() - isotopes[0].getMZ()
isotopes = edb.getElement("N").getIsotopeDistribution().getContainer()
nitrogen_isotope_difference = isotopes[1].getMZ() - isotopes[0].getMZ()
print ("Mass difference between 12C and 13C:", carbon_isotope_difference)
print ("Mass difference between 14N and N15:", nitrogen_isotope_difference)
print ("Relative deviation:", 100*(carbon_isotope_difference -
nitrogen_isotope_difference)/carbon_isotope_difference, "%")
Mass difference between 12C and 13C: 1.003355
Mass difference between 14N and 15N: 0.997035
Relative deviation: 0.6298867300208343 %
This difference can actually be measured by a high resolution mass spectrometric instrument and is used in the tandem mass tag (TMT) labelling strategy.
For the same reason, the helium atom has a slightly lower mass than the mass of its constituent particles (two protons, two neutrons and two electrons):
from pyopenms import *
from pyopenms.Constants import *
helium = ElementDB().getElement("He")
isotopes = helium.getIsotopeDistribution()
mass_sum = 2*PROTON_MASS_U + 2*ELECTRON_MASS_U + 2*NEUTRON_MASS_U
helium4 = isotopes.getContainer()[1].getMZ()
print ("Sum of masses of 2 protons, neutrons and electrons:", mass_sum)
print ("Mass of He4:", helium4)
print ("Difference between the two masses:", 100*(mass_sum - helium4)/mass_sum, "%")
Sum of masses of 2 protons, neutrons and electrons: 4.032979924670597
Mass of He4: 4.00260325415
Difference between the two masses: 0.7532065888743016 %
The difference in mass is the energy released when the atom was formed (or in other words, it is the energy required to dissassemble the nucleus into its particles).
Molecular Formulae¶
Elements can be combined to molecular formulas (EmpiricalFormula
) which can
be used to describe molecules such as metabolites, amino acid sequences or
oligonucleotides. The class supports a large number of operations like
addition and subtraction. A simple example is given in the next few lines of
code.
1 2 3 4 5 6 7 8 | from pyopenms import *
methanol = EmpiricalFormula("CH3OH")
water = EmpiricalFormula("H2O")
ethanol = EmpiricalFormula("CH2") + methanol
print("Ethanol chemical formula:", ethanol.toString())
print("Ethanol composition:", ethanol.getElementalComposition())
print("Ethanol has", ethanol.getElementalComposition()[b"H"], "hydrogen atoms")
|
which produces
Ethanol chemical formula: C2H6O1
Ethanol composition: {b'C': 2, b'H': 6, b'O': 1}
Ethanol has 6 hydrogen atoms
Note how in line 5 we were able to make a new molecule by adding existing
molecules (for example by adding two EmpiricalFormula
objects). In this
case, we illustrated how to make ethanol by adding a CH2
methyl group to an
existing methanol molecule. Note that OpenMS describes sum formulae with the
EmpiricalFormula
object and does store structural information in this class.
Isotopic Distributions¶
OpenMS can also generate theoretical isotopic distributions from analytes
represented as EmpiricalFormula
. Currently there are two algorithms
implemented, CoarseIsotopePatternGenerator which produces unit mass isotope
patterns and FineIsotopePatternGenerator which is based on the IsoSpec
algorithm [1] :
from pyopenms import *
methanol = EmpiricalFormula("CH3OH")
ethanol = EmpiricalFormula("CH2") + methanol
print("Coarse Isotope Distribution:")
isotopes = ethanol.getIsotopeDistribution( CoarseIsotopePatternGenerator(4) )
prob_sum = sum([iso.getIntensity() for iso in isotopes.getContainer()])
print("This covers", prob_sum, "probability")
for iso in isotopes.getContainer():
print ("Isotope", iso.getMZ(), "has abundance", iso.getIntensity()*100, "%")
print("Fine Isotope Distribution:")
isotopes = ethanol.getIsotopeDistribution( FineIsotopePatternGenerator(1e-3) )
prob_sum = sum([iso.getIntensity() for iso in isotopes.getContainer()])
print("This covers", prob_sum, "probability")
for iso in isotopes.getContainer():
print ("Isotope", iso.getMZ(), "has abundance", iso.getIntensity()*100, "%")
which produces
Coarse Isotope Distribution:
This covers 0.9999999753596569 probability
Isotope 46.0418651914 has abundance 97.56630063056946 %
Isotope 47.045220029199996 has abundance 2.21499539911747 %
Isotope 48.048574867 has abundance 0.2142168115824461 %
Isotope 49.0519297048 has abundance 0.004488634294830263 %
Fine Isotope Distribution:
This covers 0.9994461630121805 probability
Isotope 46.0418651914 has abundance 97.5662887096405 %
Isotope 47.0452201914 has abundance 2.110501006245613 %
Isotope 47.0481419395 has abundance 0.06732848123647273 %
Isotope 48.046119191399995 has abundance 0.20049810409545898 %
The result calculated with the FineIsotopePatternGenerator
contains the hyperfine isotope structure with heavy isotopes of Carbon and
Hydrogen clearly distinguished while the coarse (unit resolution)
isotopic distribution contains summed probabilities for each isotopic peak
without the hyperfine resolution.
Please refer to our previous discussion on the mass defect to understand the
results of the hyperfine algorithm and why different elements produce slightly
different masses.
In this example, the hyperfine isotopic distribution will
contain two peaks for the nominal mass of 47: one at 47.045
for the
incorporation of one heavy 13C with a delta mass of 1.003355
and one at 47.048
for the incorporation of one heavy deuterium with a delta mass of 1.006277
.
These two peaks also have two different abundances (the heavy carbon one has
2.1% abundance and the deuterium one has 0.07% abundance). This can be understood given that
there are 2 carbon atoms and the natural abundance of 13C is about
1.1%, while the molecule has six hydrogen atoms and the natural abundance of
deuterium is about 0.02%. The fine isotopic generator will not generate the
peak at nominal mass 49 since we specified our cutoff at 0.1% total abundance
and the four peaks above cover 99.9% of the
isotopic abundance.
We can also decrease our cutoff and ask for more isotopes to be calculated:
from pyopenms import *
methanol = EmpiricalFormula("CH3OH")
ethanol = EmpiricalFormula("CH2") + methanol
print("Fine Isotope Distribution:")
isotopes = ethanol.getIsotopeDistribution( FineIsotopePatternGenerator(1e-6) )
prob_sum = sum([iso.getIntensity() for iso in isotopes.getContainer()])
print("This covers", prob_sum, "probability")
for iso in isotopes.getContainer():
print ("Isotope", iso.getMZ(), "has abundance", iso.getIntensity()*100, "%")
which produces
Fine Isotope Distribution:
This covers 0.9999993089130612 probability
Isotope 46.0418651914 has abundance 97.5662887096405 %
Isotope 47.0452201914 has abundance 2.110501006245613 %
Isotope 47.046082191400004 has abundance 0.03716550418175757 %
Isotope 47.0481419395 has abundance 0.06732848123647273 %
Isotope 48.046119191399995 has abundance 0.20049810409545898 %
Isotope 48.0485751914 has abundance 0.011413302854634821 %
Isotope 48.0494371914 has abundance 0.0008039440217544325 %
Isotope 48.0514969395 has abundance 0.0014564131561201066 %
Isotope 49.049474191399995 has abundance 0.004337066275184043 %
Isotope 49.0523959395 has abundance 0.00013835959862262825 %
Here we can observe more peaks and now also see the heavy oxygen peak at
47.04608
with a delta mass of 1.004217
(difference between 16O and 17O) at an
abundance of 0.04%, which is what we would expect for a single oxygen atom.
Even though the natural abundance of deuterium (0.02%) is lower than 17O
(0.04%), since there are six hydrogen atoms in the molecule and only one
oxygen, it is more likely that we will see a deuterium peak than a heavy oxygen
peak. Also, even for a small molecule like ethanol, the differences in mass
between the hyperfine peaks can reach more than 110 ppm (48.046 vs 48.051).
Note that the FineIsotopePatternGenerator will generate peaks until the total
error has decreased to 1e-6, allowing us to cover 0.999999 of the probability.
OpenMS can also produce isotopic distribution with masses rounded to the nearest integer:
isotopes = ethanol.getIsotopeDistribution( CoarseIsotopePatternGenerator(5, True) )
for iso in isotopes.getContainer():
print ("Isotope", iso.getMZ(), "has abundance", iso.getIntensity()*100, "%")
Isotope 46.0 has abundance 97.56627082824707 %
Isotope 47.0 has abundance 2.214994840323925 %
Isotope 48.0 has abundance 0.214216741733253 %
Isotope 49.0 has abundance 0.0044886332034366205 %
Isotope 50.0 has abundance 2.64924580051229e-05 %
Amino Acids¶
An amino acid residue is represented in OpenMS by the class Residue
. It provides a
container for the amino acids as well as some functionality. The class is able
to provide information such as the isotope distribution of the residue, the
average and monoisotopic weight. The residues can be identified by their full
name, their three letter abbreviation or the single letter abbreviation. The
residue can also be modified, which is implemented in the Modification
class.
Additional less frequently used parameters of a residue like the gas-phase
basicity and pk values are also available.
>>> from pyopenms import *
>>> lys = ResidueDB().getResidue("Lysine")
>>> lys.getName()
'Lysine'
>>> lys.getThreeLetterCode()
'LYS'
>>> lys.getOneLetterCode()
'K'
>>> lys.getAverageWeight()
146.18788276708443
>>> lys.getMonoWeight()
146.1055284466
>>> lys.getPka()
2.16
>>> lys.getFormula().toString()
u'C6H14N2O2'
As we can see, OpenMS knows common amino acids like lysine as well as
some properties of them. These values are stored in Residues.xml
in the
OpenMS share folder and can, in principle, be modified.
Amino Acid Modifications¶
An amino acid residue modification is represented in OpenMS by the class
ResidueModification
. The known modifications are stored in the
ModificationsDB
object, which is capable of retrieving specific
modifications. It contains UniMod as well as PSI modifications.
from pyopenms import *
ox = ModificationsDB().getModification("Oxidation")
print(ox.getUniModAccession())
print(ox.getUniModRecordId())
print(ox.getDiffMonoMass())
print(ox.getId())
print(ox.getFullId())
print(ox.getFullName())
print(ox.getDiffFormula())
UniMod:35
35
15.994915
Oxidation
Oxidation (N)
Oxidation or Hydroxylation
O1
thus providing information about the “Oxidation” modification. As above, we can investigate the isotopic distribution of the modification (which in this case is identical to the one of Oxygen by itself):
isotopes = ox.getDiffFormula().getIsotopeDistribution(CoarseIsotopePatternGenerator(5))
for iso in isotopes.getContainer():
print (iso.getMZ(), ":", iso.getIntensity())
Which will print the isotopic pattern of the modification (Oxygen):
15.994915 : 0.9975699782371521
16.998269837800002 : 0.0003800000122282654
18.0016246756 : 0.002050000010058284
Ribonucleotides¶
A ribonucleotide describes
one of the building blocks of DNA and RNA. In OpenMS, a ribonucleotide in its
modified or unmodified form is represented by the Ribonucleotide
class in
OpenMS. The class is able to provide information such as the isotope
distribution of the residue, the average and monoisotopic weight. The residues
can be identified by their full name, their three letter abbreviation or the
single letter abbreviation. Modified ribonucleotides are represented by the
same class. Currently, support for RNA is implemented.
>>> from pyopenms import *
>>> uridine = RibonucleotideDB().getRibonucleotide(b"U")
>>> uridine.getName()
'uridine'
>>> uridine.getCode()
'U'
>>> uridine.getAvgMass()
244.2043
>>> uridine.getMonoMass()
244.0695
>>> uridine.getFormula().toString()
'C9H12N2O6'
>>> uridine.isModified()
False
>>>
>>> methyladenosine = RibonucleotideDB().getRibonucleotide(b"m1A")
>>> methyladenosine.getName()
'1-methyladenosine'
>>> methyladenosine.isModified()
True
[1] | Łącki MK, Startek M, Valkenborg D, Gambin A. IsoSpec: Hyperfast Fine Structure Calculator. Anal Chem. 2017 Mar 21;89(6):3272-3277. doi: 10.1021/acs.analchem.6b01459. |