Welcome to the documentation for pyOpenMS! Here, you will find a comprehensive guide to using pyOpenMS for mass spectrometry analysis. Whether you are new to mass spectrometry or an experienced user, this documentation will provide you with a wealth of information to help you get the most out of pyOpenMS.
In the side panel you can navigate the different topics. You will find a basic introduction to mass spectrometry, including key concepts and terminology as well as detailed instructions for installing pyOpenMS on your system, so you can start using it right away.
We will then cover some basic mass spectrometry concepts and how they are represented in pyOpenMS. From there, we will dive into a selection of algorithms that are commonly used in mass spectrometry analysis, including peak picking, alignment, and quantification.
For more advanced users, we have included sections on some of the more complex topics in mass spectrometry analysis. And for those who want to get into the nitty-gritty, we have included full API documentation and a glossary.
We hope this documentation will serve as a valuable resource for you as you explore the world of mass spectrometry analysis with pyOpenMS.
Proteomics and metabolomics focus on complex interactions within biological systems; the former is centered on proteins while the latter is based on metabolites. To understand these interactions, we need to accurately identify the different biological components involved. Liquid chromatography (LC) and mass spectrometry (MS) are the analytical techniques used to isolate and identify biological components in proteomics and metabolomics. LC-MS data can be difficult to analyze manually given its amount and complexity. Therefore, we need specialized software that can analyze high-throughput LC-MS data quickly and accurately.
Why use OpenMS#
OpenMS is an open-source C++ framework for analyzing large volumes of mass spectrometry data. It was designed for high-performance LC-MS data and has since been extended to support a wide variety of mass spectrometry experiments, including data generated by other techniques.
To design your analysis solution, contact the OpenMS team.
To use OpenMS effectively, an understanding of liquid chromatography (LC) and mass spectrometry (MS) is required, as many of the algorithms are based on these techniques. This section explains LC and MS in detail, and how they are combined to identify and quantify substances.
Liquid chromatography (LC)#
Liquid chromatography is a technique used by life scientists to separate molecules based on a specific physical or chemical property.
For more information on LC, view this video.
There are many types of chromatography, but this section focuses on LC as it is widely used in proteomics and metabolomics.
In LC, a sample containing the molecules of interest (otherwise known as analytes) is dissolved in a liquid solution and passed over a solid material. The analytes are separated according to how strongly they interact with the liquid and with the solid.
Key components of LC#
An LC setup is made up of the following components:
A liquid solution, known as the mobile phase, containing the analytes.
A pump which transports the liquid solution.
A stationary phase which is a solid, homogeneous substance.
A column that contains the stationary phase.
A detector that plots the time it takes for the analyte to escape the column (retention time) against the analyte’s concentration. This plot is called a chromatogram.
Refer to the image below for a diagrammatic representation of an LC setup.
How does LC work?#
The liquid solution containing the analytes is pumped through a column that contains the stationary phase. Analytes are separated based on how strongly they interact with each phase. Some analytes will interact strongly with the mobile phase while others will be strongly attracted to the stationary phase, depending on their physical or chemical properties. The stronger an analyte’s attraction is to the mobile phase, the faster it will leave the column. The time it takes for an analyte to escape from the column is called the analyte’s retention time (RT). As a result of their differing attractions to the mobile and stationary phases, different analytes will have different RTs, which is how separation occurs.
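The link between affinity and retention time can be sketched in a few lines of Python. This is only an illustration: it assumes the common textbook relation t_R = t_0 (1 + k), where t_0 is the time an unretained compound needs to pass through the column and k is the retention factor (larger k means stronger attraction to the stationary phase). The analyte names and all numbers are invented.

```python
def retention_time(dead_time_min: float, retention_factor: float) -> float:
    """Return t_R = t_0 * (1 + k) in minutes (textbook relation, assumed here)."""
    return dead_time_min * (1.0 + retention_factor)


# Hypothetical analytes with made-up retention factors k.
retention_factors = {"analyte_A": 0.5, "analyte_B": 2.0, "analyte_C": 4.5}
dead_time = 1.2  # minutes, assumed time for an unretained compound

retention_times = {
    name: retention_time(dead_time, k) for name, k in retention_factors.items()
}

# Analytes with weaker attraction to the stationary phase leave the column first.
elution_order = sorted(retention_times, key=retention_times.get)
for name in elution_order:
    print(f"{name}: RT = {retention_times[name]:.2f} min")
```

Sorting by retention time gives the order in which the analytes would reach the detector.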
The RTs for each analyte are recorded by a detector. The most common detector used is the mass spectrometer, which we discuss later. However, other detection methods exist, such as:
Light absorption (photometric detector)
Change in refractive index
High Performance Liquid Chromatography (HPLC)#
HPLC is the most commonly used technique for separating proteins and metabolites. In HPLC, a high-pressured pump is used to transport a liquid (solvent) containing the molecules of interest through a thin capillary column. The stationary phase is ‘packed’ into the column.
For more information on HPLC, view this video.
Several variations of HPLC exist, such as:
Reversed-phase (RP) chromatography
Strong cation/anion exchange (SCX/SAX) chromatography
Affinity chromatography
Size exclusion chromatography
Special Case of HPLC: Reversed-Phase (RP) Chromatography#
RP chromatography is the most common type of HPLC used with biological samples. In reversed-phase liquid chromatography, the solid phase, which is originally hydrophilic, is modified to become hydrophobic, hence the term ‘reversed-phase’. The liquid phase is a mixture of water and an organic solvent. Separation is based on the following behavior: hydrophilic analytes have a high affinity for the mobile phase and escape the column quickly, while hydrophobic analytes have a high affinity for the hydrophobic stationary phase and therefore take longer to escape the column.
For more information on RP chromatography, view this video.
Mass Spectrometry (MS)#
Mass spectrometry is an analytical technique used to determine the abundance of molecules in a sample.
Key components of MS#
There are three key components in a mass spectrometer:
- An ion source, which generates ions from the incoming sample. All MS techniques rely on ionized molecules to control their movement in an electric field.
- A mass analyzer, which separates the ions according to their mass-to-charge (m/z) ratio. There are several types, such as time-of-flight (TOF), orbitrap and quadrupole mass analyzers. Depending on the mass analyzer, OpenMS offers calibration tools, so that highly accurate results can be achieved.
- A detector, which scans ions at a given time point, producing a mass spectrum in which intensity is plotted against m/z.
Refer to the image below for a diagrammatic representation of the key components in MS.
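To make the mass-to-charge ratio concrete, the sketch below computes the m/z at which a molecule would appear for different charge states. It assumes positive-mode ionization by protonation, i.e. ions of the form [M + zH]^z+, which is an assumption of this example and not something every ion source guarantees. The neutral mass is a made-up value and the function name is hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (rounded)


def mz_of_protonated_ion(neutral_mass: float, charge: int) -> float:
    """m/z of [M + zH]^z+ for a neutral monoisotopic mass M (assumed ion form)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


neutral_mass = 1500.0  # Da, hypothetical analyte
mz_values = {z: mz_of_protonated_ion(neutral_mass, z) for z in (1, 2, 3)}

for z, mz in mz_values.items():
    print(f"charge {z}+: m/z = {mz:.4f}")
```

The same molecule shows up at a lower m/z for every additional charge, which is why the charge state has to be known before a mass can be derived from a peak.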
We want the analytes to move through the electrostatic and electromagnetic fields in the mass analyzer. To achieve this objective, we need to convert them to ions by charging them. There are a number of ways to charge our analytes including:
Electrospray Ionization (ESI)
Matrix Assisted Laser Desorption/Ionization (MALDI)
Electron Impact Ionization (EI)
In proteomics and metabolomics, ESI and MALDI are used because they are soft ionization techniques. A soft ionization technique is one which charges analytes while keeping the molecules of interest largely intact, so that they can be characterized easily at a later stage. Hard ionization techniques such as EI shatter analytes into smaller fragments, making it difficult to characterize large molecules.
Given that OpenMS focuses on proteomics and metabolomics applications, we will describe ESI and MALDI in further detail.
Electrospray Ionization (ESI)#
ESI can be broken down into the following steps.
The sample is dissolved in a polar, volatile buffer.
The sample - dissolved in the buffer - is pumped through a thin, stainless steel capillary.
The sample is converted to small, charged, stable droplets (aerosolized) by applying high voltage.
The aerosol is directed through regions of high vacuum, where the droplets evaporate until only the charged molecules are left.
The particles are fed to the mass analyzer.
Refer to the image below for a diagrammatic representation of the steps in ESI.
For more information on ESI, view this video.
Matrix Assisted Laser Desorption/Ionization (MALDI)#
MALDI can be broken down into the following steps:
1. The analytes are mixed with a small organic molecule known as a matrix.
2. The mixture is irradiated with short pulses of laser light. The matrix absorbs the energy and becomes charged because the wavelength of the laser light matches the absorbance maximum of the matrix.
3. The matrix transfers its charge to the analytes.
4. The charged analytes are fed to the mass analyzer.
Refer to the image below for a diagrammatic representation of the steps in MALDI.
For more information on MALDI, view this video.
Once the analytes have been charged by the ion source, we now want to sort them by their mass-to-charge ratio for easy identification.
A number of mass analyzers exist. These include quadrupole, time-of-flight (TOF) and orbitrap analyzers.
The next sections describe each analyzer type in detail.
In a quadrupole analyzer, you can set the quadrupole voltage so that ions with a specific m/z ratio travel through. The oscillating electrostatic fields stabilize the flight path for the ions so that they can pass through the quadrupole. Other ions will be accelerated out of the quadrupole and will not make it to the end.
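Conceptually, a quadrupole set to a given m/z behaves like a band-pass filter. The sketch below models only that outcome; it does not model the oscillating fields or ion trajectories, and the window width, ion list and function name are assumptions made for illustration.

```python
def quadrupole_filter(ions_mz, target_mz, window=0.5):
    """Return the ions whose m/z lies within target_mz +/- window / 2.

    Idealized band-pass model: real transmission depends on the applied
    voltages, which are not modelled here.
    """
    half_width = window / 2.0
    return [mz for mz in ions_mz if abs(mz - target_mz) <= half_width]


# Hypothetical m/z values of ions entering the quadrupole.
incoming_ions = [400.2, 500.1, 500.3, 612.8]
transmitted = quadrupole_filter(incoming_ions, target_mz=500.2, window=0.5)
print(transmitted)
```

Ions outside the window are simply dropped, mirroring the ions that are accelerated out of the quadrupole and never reach its end.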
Refer to the image below for a diagrammatic representation of the quadrupole analyzer.
For more information on quadrupole analyzers, view this video.
In a time-of-flight (TOF) analyzer, ions are extracted from the ion source in pulses by an electrostatic field and enter a field-free drift zone. An electrostatic mirror called a reflectron reflects the ions onto the next component of the mass spectrometer, the detector. The detector counts the particles and records the time of flight from extraction to the moment the particle hits the detector.
Refer to the image below for a diagrammatic representation of the TOF analyzer.
Lighter ions fly faster than heavier ions of the same charge and will arrive earlier at the detector. Therefore, an ion’s time of flight depends on the ion’s mass. The ion’s time of flight is also dependent on the ion’s charge. This can be demonstrated by using the following equations:
1. Potential energy is transferred to an ion with charge q accelerated by an electrostatic field with voltage U.
$$E_p = qU$$
2. The potential energy is converted to kinetic energy as the ion, with mass m, accelerates to velocity v.
$$E_p = E_k = \frac{1}{2} m v^2$$
3. We know that for a given path, s, from extraction to the detector, the time of flight, t, is equal to:
$$t = \frac{s}{v} = s \sqrt{\frac{m}{2qU}}$$
Therefore, t, for a given instrument’s path length, s, depends on an ion’s charge and mass.
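As a numerical illustration, the sketch below evaluates the time of flight using t = s · sqrt(m / (2qU)), which follows from equating the potential energy qU with the kinetic energy ½mv² and using t = s / v. The instrument parameters (20 kV acceleration voltage, 1 m path) are assumed values, not those of any particular instrument, and the reflectron is ignored.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulomb
DALTON = 1.66053906660e-27           # kilogram


def time_of_flight(mass_da: float, charge: int, voltage: float, path_length: float) -> float:
    """Flight time in seconds: t = s * sqrt(m / (2 * q * U))."""
    mass_kg = mass_da * DALTON
    charge_c = charge * ELEMENTARY_CHARGE
    return path_length * math.sqrt(mass_kg / (2.0 * charge_c * voltage))


# Assumed instrument parameters for illustration only.
VOLTAGE = 20_000.0  # volts
PATH = 1.0          # metres

t_light = time_of_flight(500.0, 1, VOLTAGE, PATH)
t_heavy = time_of_flight(1000.0, 1, VOLTAGE, PATH)
t_heavy_2plus = time_of_flight(1000.0, 2, VOLTAGE, PATH)

print(f"500 Da, 1+:  {t_light * 1e6:.2f} microseconds")
print(f"1000 Da, 1+: {t_heavy * 1e6:.2f} microseconds")
print(f"1000 Da, 2+: {t_heavy_2plus * 1e6:.2f} microseconds")
```

Doubling the mass lengthens the flight time by a factor of sqrt(2), while doubling the charge shortens it by the same factor, so the 1000 Da doubly charged ion arrives together with the 500 Da singly charged ion.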
For more information on TOF analyzers, view this video.
The orbitrap analyzer is the most frequently used analyzer in mass spectrometry for proteomics and metabolomics applications. It consists of two outer electrodes and a central electrode. Ions are captured inside the analyzer because of an applied electrostatic field. The ions in the orbitrap analyzer oscillate around the central electrode along the axis of the electrostatic field at a set frequency, ω. This frequency is used to determine the mass-to-charge ratio using the following formula:
$$\omega = \sqrt{\frac{k}{m/z}}$$
where k is a constant.
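The relation between oscillation frequency and mass-to-charge ratio can be inverted to read m/z from a measured frequency. The sketch below assumes the relation ω = sqrt(k / (m/z)), so that m/z = k / ω². The value of k is a made-up placeholder; on a real instrument it comes from calibration.

```python
import math


def frequency_from_mz(mz: float, k: float) -> float:
    """Axial oscillation frequency for a given m/z: omega = sqrt(k / (m/z))."""
    return math.sqrt(k / mz)


def mz_from_frequency(omega: float, k: float) -> float:
    """Invert the relation above: m/z = k / omega**2."""
    return k / omega ** 2


K_CONSTANT = 1.0e14  # hypothetical calibration constant, arbitrary units

omega_500 = frequency_from_mz(500.0, K_CONSTANT)
omega_2000 = frequency_from_mz(2000.0, K_CONSTANT)
recovered_mz = mz_from_frequency(omega_500, K_CONSTANT)

print(f"omega at m/z 500:  {omega_500:.1f}")
print(f"omega at m/z 2000: {omega_2000:.1f}")
print(f"m/z recovered from omega: {recovered_mz:.4f}")
```

Because the frequency scales with the inverse square root of m/z, an ion with four times the m/z oscillates at half the frequency.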
For more information on orbitrap analyzers, view this video.
The following diagram is a conceptual representation of an orbitrap mass analyzer.
Identifying Molecules with tandem mass spectrometry (MS2)#
To get better results, we can use two mass analyzers sequentially to generate and analyze ions. This technique is called tandem mass spectrometry (MS2). Tandem mass spectrometry is especially useful for linear polymers like proteins, RNA and DNA.
With MS2, ions called precursor ions are isolated and fragmented into ion fragments or product ions. A mass spectrum is recorded for both the precursor and the product ions.
For more information on MS2, view this video.
Different fragmentation techniques to fragment peptides exist:
Collision-induced dissociation (CID)
Pulsed Q Dissociation (PQD)
Electron transfer dissociation (ETD)
Electron capture dissociation (ECD)
Higher energy collision dissociation (HCD)
CID is the most frequently used fragmentation technique and will therefore be discussed in more detail in the following section.
Collision-induced dissociation is a method to fragment peptides using an inert gas such as argon or helium. Selected primary or precursor ions enter a collision cell filled with the inert gas. Collisions with the gas cause the precursor ions that reach the energy threshold to fragment into smaller product ions and/or neutral losses. A mass spectrum is recorded for both the precursor ions and the product ions. The mass spectrum for the precursor ions will give you the mass of the entire peptide, while the product ions will inform you about its amino acid composition.
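To illustrate how product ions carry information about the amino acids in a peptide, the sketch below computes theoretical singly charged fragment m/z values. It uses the standard b/y nomenclature (b ions contain the N-terminus, y ions the C-terminus), which is not introduced above, and assumes unmodified residues and clean backbone cleavage. The function names are hypothetical; pyOpenMS has its own tools for generating theoretical spectra.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # Da
PROTON = 1.007276  # Da


def precursor_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of the intact, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide: str):
    """Singly charged b and y ion m/z values for every backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)           # first i residues
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # last i residues
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("PEPTIDE")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

Consecutive b ions (or y ions) differ by exactly one residue mass, which is what allows residues to be read off a product ion spectrum.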
For more information on CID, view this video.
Liquid chromatography is often coupled with mass spectrometry to reduce complexity in the mass spectra. If complex samples were directly fed to a mass spectrometer, you would not be able to detect the less abundant analyte ions. The separated analytes from the liquid chromatography setup are directly injected into the ion source of the mass spectrometer. Multiple analytes that escape the column at the same time are separated by their mass-to-charge ratio using the mass spectrometer.
Refer to the image below for a diagrammatic representation of the LC-MS setup.
From the LC-MS setup, a series of spectra are ‘stacked’ together to form what is known as a peak map. Each spectrum in a peak map represents the ions detected at a particular retention time and is a collection of data points called peaks. Each peak has a retention time, mass-to-charge and intensity dimension. Analyzing peak maps is difficult as different compounds can elute at the same time, which means that peaks can overlap. Therefore, sophisticated techniques are required for the accurate identification and quantification of molecules.
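A peak map can be pictured as a list of spectra, each tagged with a retention time and holding (m/z, intensity) pairs. The sketch below uses plain Python structures purely for illustration; pyOpenMS ships its own data structures for spectra and peak maps. The `Spectrum` class, the helper functions and all values are hypothetical. The two helpers compute a total ion chromatogram (summed intensity per spectrum) and an extracted ion chromatogram (intensity near one m/z per spectrum), two common ways of slicing a peak map.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Spectrum:
    rt: float                         # retention time in seconds
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


# A tiny, invented peak map: three spectra recorded at consecutive RTs.
peak_map = [
    Spectrum(rt=10.0, peaks=[(400.2, 1200.0), (500.3, 300.0)]),
    Spectrum(rt=11.0, peaks=[(400.2, 2500.0), (500.3, 900.0), (612.8, 150.0)]),
    Spectrum(rt=12.0, peaks=[(400.2, 1100.0), (612.8, 700.0)]),
]


def total_ion_chromatogram(spectra):
    """Summed intensity of every spectrum, as (rt, intensity) pairs."""
    return [(s.rt, sum(i for _, i in s.peaks)) for s in spectra]


def extracted_ion_chromatogram(spectra, target_mz, tol=0.01):
    """Intensity within target_mz +/- tol for every spectrum."""
    return [
        (s.rt, sum(i for mz, i in s.peaks if abs(mz - target_mz) <= tol))
        for s in spectra
    ]


tic = total_ion_chromatogram(peak_map)
xic = extracted_ion_chromatogram(peak_map, target_mz=400.2)
print("TIC:", tic)
print("XIC:", xic)
```

The extracted ion chromatogram traces a single compound over retention time, which is the starting point for finding and quantifying it.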
The image below includes a spectrum at a given retention time (left) and a peak map (right).
For more information on a *specific* application of LC-MS, view this video.
Identification and Quantification of Ions#
While the combination of liquid chromatography and mass spectrometry can ease the process of characterising molecules of interest, further techniques are required to easily identify and quantify these molecules. This section discusses both labeled and label-free quantification techniques.
Relative quantification is one strategy where one sample is chemically treated and compared to another sample without treatment. This section discusses a particular relative quantification technique called labeling or stable isotope labeling which involves the addition of isotopes to one sample. An isotope of an element behaves the same chemically but has a different mass. Stable isotope labeling is used in mass spectrometry so that scientists can easily identify proteins and metabolites.
Two types of stable isotope labeling exist: chemical labeling and metabolic labeling.
During chemical labeling, the label is attached at specific functional groups in a molecule like the N-terminus of a peptide or specific side chains.
Chemical labeling occurs late in the process; therefore, experiments that incorporate this technique are not highly reproducible.
Isobaric labeling is a technique where peptides and proteins are labeled with chemical groups that have an identical mass but differ in the distribution of heavy isotopes in their structure.
OpenMS contains tools that analyze data from isobaric labeling experiments.
During metabolic labeling, the organism is ‘fed’ with labeled metabolites. Metabolites include but are not limited to amino acids, nitrogen sources and glucose. Unlike chemical labeling, metabolic labeling occurs early in the study. Therefore, experiments that incorporate metabolic labeling are highly reproducible.
Stable Isotope Labeling with Amino Acids in Cell Culture (SILAC)#
In SILAC, the labeled amino acids are fed to the cell culture. The labels are integrated into the proteins after a period. The labeled sample is then compared with the unlabeled sample.
OpenMS contains tools that analyze data from SILAC experiments.
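The comparison of a labeled and an unlabeled sample can be sketched as follows. The example assumes one common labeling scheme, heavy lysine (+8.014199 Da) and heavy arginine (+10.008269 Da); other label combinations exist, and the text above does not prescribe one. The intensities and function names are invented. This is not how the OpenMS SILAC tools are implemented, only an illustration of the quantities involved.

```python
# Mass added per labeled residue in Da (assumed Lys-8 / Arg-10 labels).
HEAVY_LABEL_SHIFT = {"K": 8.014199, "R": 10.008269}


def silac_mz_shift(peptide: str, charge: int) -> float:
    """m/z distance between the light and heavy form of a peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    mass_shift = sum(HEAVY_LABEL_SHIFT.get(aa, 0.0) for aa in peptide)
    return mass_shift / charge


def heavy_to_light_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Relative abundance of the labeled versus the unlabeled sample."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity


shift = silac_mz_shift("PEPTIDEK", charge=2)
ratio = heavy_to_light_ratio(light_intensity=1000.0, heavy_intensity=2500.0)
print(f"expected m/z shift at 2+: {shift:.4f}")
print(f"heavy/light ratio: {ratio:.2f}")
```

The light and heavy forms of a peptide co-elute and appear as a pair of signals separated by the computed shift; the intensity ratio of the pair is the relative quantity.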
Label-free quantification (LFQ)#
LFQ is a cheap and natural method of quantifying molecules of interest. As the name suggests, no labeling of molecules is involved.
LFQ includes the following steps:
Conduct replicate experiments.
Generate LC-MS maps for each experiment.
Find features in all LC-MS maps. A feature is a collection of peaks that belong to a chemical compound.
Align maps to address shifts in retention times.
Match corresponding features in different maps. We refer to this as grouping or linking.
Identify feature groups, called consensus features.
Quantify consensus features.
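The alignment, linking and quantification steps can be sketched on two tiny feature maps. This is a deliberately simplified illustration and not the OpenMS algorithms: alignment is modelled as a single constant RT shift (the median offset between features matched by m/z), linking is a greedy match within fixed tolerances, and quantification is a plain intensity ratio. All function names, tolerances and feature values are assumptions.

```python
from statistics import median

# Each feature is (rt_seconds, mz, intensity); the values are invented.
map_a = [(100.0, 400.20, 1.0e5), (250.0, 500.30, 4.0e4), (400.0, 612.80, 8.0e4)]
map_b = [(104.0, 400.20, 1.2e5), (255.0, 500.30, 3.0e4), (405.0, 612.80, 8.0e4)]


def align(reference, other, mz_tol=0.01):
    """Shift the RTs of `other` by the median RT offset to `reference`."""
    offsets = []
    for ref_rt, ref_mz, _ in reference:
        candidates = [f for f in other if abs(f[1] - ref_mz) <= mz_tol]
        if candidates:
            closest = min(candidates, key=lambda f: abs(f[1] - ref_mz))
            offsets.append(closest[0] - ref_rt)
    shift = median(offsets) if offsets else 0.0
    return [(rt - shift, mz, intensity) for rt, mz, intensity in other]


def link(reference, other, rt_tol=2.0, mz_tol=0.01):
    """Group features that agree in RT and m/z into consensus features."""
    consensus = []
    for ref_rt, ref_mz, ref_int in reference:
        for rt, mz, intensity in other:
            if abs(rt - ref_rt) <= rt_tol and abs(mz - ref_mz) <= mz_tol:
                consensus.append({
                    "rt": (ref_rt + rt) / 2.0,
                    "mz": (ref_mz + mz) / 2.0,
                    "intensities": (ref_int, intensity),
                })
                break  # greedy: take the first match only
    return consensus


aligned_b = align(map_a, map_b)
consensus_features = link(map_a, aligned_b)
ratios = [c["intensities"][1] / c["intensities"][0] for c in consensus_features]

for c, r in zip(consensus_features, ratios):
    print(f"m/z {c['mz']:.2f} at RT {c['rt']:.1f} s: B/A ratio = {r:.2f}")
```

Without the alignment step, the retention time shift between the two maps exceeds the linking tolerance and no features are grouped, which is why alignment precedes linking.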
For more information on LFQ, view this video. For more information on the steps involved in LFQ, view this video.
Feature finding is a method for identifying all peaks belonging to a chemical compound. Feature finding involves the following steps:
Extension where we collect all data points we think belong to the peptide.
Refinement where we remove peaks that we think do not belong to the peptide.
Fit an optimal model to the isolated peaks.
The above steps are iterative; we repeat these steps until no improvement can be made to the model.
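The extend, refine and fit loop can be illustrated on a single intensity trace over retention time. This toy version is an assumption-laden sketch and not one of the OpenMS feature finders: extension is done once around the most intense point, the model is a Gaussian estimated from intensity-weighted moments, and refinement drops points further than `n_sigma` standard deviations from the fitted center. Refinement and fitting are repeated until the selected points stop changing. All names, thresholds and data are hypothetical.

```python
import math


def fit_gaussian(points):
    """Intensity-weighted mean and standard deviation of (rt, intensity) points."""
    total = sum(i for _, i in points)
    mean = sum(rt * i for rt, i in points) / total
    variance = sum(i * (rt - mean) ** 2 for rt, i in points) / total
    return mean, math.sqrt(variance)


def find_feature(trace, min_fraction=0.05, n_sigma=3.0, max_iter=20):
    """Return (selected points, fitted center, fitted width) for the main peak."""
    apex = max(range(len(trace)), key=lambda idx: trace[idx][1])
    threshold = min_fraction * trace[apex][1]

    # Extension: grow outwards from the apex while the signal stays above threshold.
    left = apex
    while left > 0 and trace[left - 1][1] >= threshold:
        left -= 1
    right = apex
    while right < len(trace) - 1 and trace[right + 1][1] >= threshold:
        right += 1
    selected = trace[left:right + 1]

    for _ in range(max_iter):
        mean, sigma = fit_gaussian(selected)               # fit
        refined = [(rt, i) for rt, i in selected
                   if abs(rt - mean) <= n_sigma * sigma]   # refinement
        if refined == selected:                            # no further improvement
            break
        selected = refined
    return selected, mean, sigma


# Invented trace: one main elution peak around RT 3 and a separate signal at RT 8.
trace = [(0.0, 0.0), (1.0, 5.0), (2.0, 50.0), (3.0, 100.0), (4.0, 50.0),
         (5.0, 5.0), (6.0, 0.0), (7.0, 0.0), (8.0, 40.0), (9.0, 0.0)]

points, center, width = find_feature(trace)
print("selected RTs:", [rt for rt, _ in points])
print(f"fitted center = {center:.2f}, width = {width:.2f}")
```

A stricter `n_sigma` makes the refinement step trim the low-intensity tails, so the choice of threshold directly affects which peaks end up in the feature.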
OpenMS contains a number of feature finding algorithms.
For more information on feature finding, view this video.