pyOpenMS in R#
Currently, there are no native wrappers for the OpenMS library in R, however we can use the “reticulate” package in order to get access to the full functionality of pyOpenMS in the R programming language.
Install the “reticulate” R Package#
In order to use all pyOpenMS functionalities in R, we suggest to use the “reticulate” R package.
A thorough documentation is available at: https://rstudio.github.io/reticulate/
1install.packages("reticulate")
Installation of pyOpenMS is a requirement as well and it is necessary to make sure that R is using the same Python environment.
In case R is having trouble to find the correct Python environment, you can set it by hand as in this example (using miniconda, you will have to adjust the file path to your system to make this work). You will need to do this before loading the “reticulate” library:
1Sys.setenv(RETICULATE_PYTHON = "/usr/local/miniconda3/envs/py37/bin/python")
Or after loading the “reticulate” library:
1library("reticulate")
2use_python("/usr/local/miniconda3/envs/py37/bin/python")
Import pyOpenMS in R#
After loading the “reticulate” library you should be able to import pyOpenMS into R
1library(reticulate)
2ropenms=import("pyopenms", convert = FALSE)
This should now give you access to all of pyOpenMS in R. Importantly, the convert option has to be set to FALSE, since type conversions such as 64bit integers will cause a problem.
You should now be able to interact with the OpenMS library and, for example, read and write mzML files:
1library(reticulate)
2ropenms=import("pyopenms", convert = FALSE)
3exp = ropenms$MSExperiment()
4ropenms$MzMLFile()$store("testfile.mzML", exp)
which will create an empty mzML file called testfile.mzML.
Getting help#
Using the “reticulate” R package provides a way to access the pyOpenMS information
about the available functions and methods. We can inspect individual pyOpenMS objects
through the py_help
function:
1library(reticulate)
2ropenms=import("pyopenms", convert = FALSE)
3idXML=ropenms$IdXMLFile
4py_help(idXML)
5
6Help on class IdXMLFile in module pyopenms.pyopenms_4:
7
8class IdXMLFile(__builtin__.object)
9| Methods defined here:
10|
11| __init__(...)
12| Cython signature: void IdXMLFile()
13|
14| load(...)
15| Cython signature: void load(String filename, libcpp_vector[ProteinIdentification] & protein_ids, libcpp_vector[PeptideIdentification] & peptide_ids)
16[...]
Alternatively, the autocompletion functionality of RStudio can be used:
In this case, the help function indicates that the idXML$load()
function requires
a filename as string
an empty vector for pyopenms.ProteinIdentification objects
an empty vector for pyopenms.PeptideIdentification objects
In order to read peptide identification data, we can download the idXML example file
Creating an empty R list()
unfortunately is not equal to the empty python list []
.
Therefore in this case we need to use the reticulate::r_to_py()
and reticulate::py_to_r()
functions:
1idXML=ropenms$IdXMLFile()
2
3download.file("https://github.com/OpenMS/OpenMS/raw/master/share/OpenMS/examples/BSA/BSA1_OMSSA.idXML", "BSA1_OMSSA.idXML")
4
5f="BSA1_OMSSA.idXML"
6pepids=r_to_py(list())
7protids=r_to_py(list())
8
9idXML$load(f, protids, pepids)
10
11pepids=py_to_r(pepids)
12
13pephits=pepids[[1]]$getHits()
14
15pepseq=pephits[[1]]$getSequence()
16
17print(paste0("Sequence: ", pepseq))
18
19[1] "Sequence: SHC(Carbamidomethyl)IAEVEK"
An example use case#
Reading an mzML File#
pyOpenMS supports a variety of different files through the implementations in OpenMS. In order to read mass spectrometric data, we can download the mzML example file
1download.file("https://raw.githubusercontent.com/OpenMS/OpenMS/develop/share/OpenMS/examples/BSA/BSA1.mzML", "BSA1.mzML")
2
3library(reticulate)
4ropenms=import("pyopenms", convert = FALSE)
5mzML=ropenms$MzMLFile()
6exp = ropenms$MSExperiment()
7mzML$load("BSA1.mzML", exp)
which will load the content of the “BSA1.mzML” file into the exp
variable of type MSExperiment
.
We can now inspect the properties of this object:
1py_help(exp)
2Help on MSExperiment object:
3
4class MSExperiment(__builtin__.object)
5 | Methods defined here:
6 ...
7 | getNrChromatograms(...)
8 | Cython signature: size_t getNrChromatograms()
9 |
10 | getNrSpectra(...)
11 | Cython signature: size_t getNrSpectra()
12 |
13 ...
which indicates that the variable exp
has (among others) the functions
getNrSpectra()
and getNrChromatograms()
.
We can now try one of these functions:
1exp$getNrSpectra()
21684
and indeed we see that we get information about the underlying MS data. We can iterate through the spectra as follows:
Visualize spectra#
You can easily visualise ms1 level precursor maps:
1library(ggplot2)
2
3spectra = py_to_r(exp$getSpectra())
4
5peaks_df=c()
6for (i in spectra) {
7 if (i$getMSLevel()==1){
8 peaks=do.call("cbind", i$get_peaks())
9 rt=i$getRT()
10 peaks_df=rbind(peaks_df,cbind(peaks,rt))
11 }
12}
13
14peaks_df=data.frame(peaks_df)
15colnames(peaks_df)=c('MZ','Intensity','RT')
16peaks_df$Intensity=log10(peaks_df$Intensity)
17
18ggplot(peaks_df, aes(x=RT, y=MZ) ) +
19geom_point(size=1, aes(colour = Intensity), alpha=0.25) +
20theme_minimal() +
21scale_colour_gradient(low = "blue", high = "yellow")
Or visualize a particular ms2 spectrum:
1library(ggplot2)
2
3spectra = py_to_r(exp$getSpectra())
4
5# Collect all MS2 peak data in a list
6peaks_ms2=list()
7for (i in spectra) {
8 if (i$getMSLevel()==2){
9 peaks=do.call("cbind",i$get_peaks())
10 peaks_ms2[[i$getNativeID()]]=data.frame(peaks)
11 }
12}
13
14ms2_spectrum=peaks_ms2[["spectrum=3529"]]
15colnames(ms2_spectrum)=c("MZ","Intensity")
16
17ggplot(ms2_spectrum, aes(x=MZ, y=Intensity)) +
18geom_segment( aes(x=MZ, xend=MZ, y=0, yend=Intensity)) +
19theme_minimal()
Alternatively, we could also have used apply
to obtain the peak data, which
is more idiomatic way of doing things for the R programming language:
1ms1 = sapply(spectra, function(x) x$getMSLevel()==1)
2peaks = sapply(spectra[ms1], function(x) cbind(do.call("cbind", x$get_peaks()),x$getRT()))
3peaks = data.frame( do.call("rbind", peaks) )
4
5ms2 = spectra[!ms1][[1]]$get_peaks()
6ms2_spectrum = data.frame( do.call("cbind", ms2) )
Iteration#
Iterating over pyOpenMS objects is not equal to iterating over R vectors or
lists. Note that for many applications, there is a more efficient way to access
data (such as get_peaks()
instead of iterating over individual peaks).
Therefore we can not directly apply the usual functions such as apply()
and have to use reticulate::iterate()
instead:
1spectrum = ropenms$MSSpectrum()
2mz = seq(1500, 500, -100)
3i = seq(10, 2000, length.out = length(mz))
4spectrum$set_peaks(list(mz, i))
5
6iterate(spectrum, function(x) {print(paste0("M/z :" , x$getMZ(), " Intensity: ", x$getIntensity()))})
7
8[1] "M/z :1500.0 Intensity: 10.0"
9[1] "M/z :1400.0 Intensity: 209.0"
10[1] "M/z :1300.0 Intensity: 408.0"
11[1] "M/z :1200.0 Intensity: 607.0"
12[1] "M/z :1100.0 Intensity: 806.0"
13[1] "M/z :1000.0 Intensity: 1005.0"
14[1] "M/z :900.0 Intensity: 1204.0"
15[1] "M/z :800.0 Intensity: 1403.0"
16[1] "M/z :700.0 Intensity: 1602.0"
17[1] "M/z :600.0 Intensity: 1801.0"
18[1] "M/z :500.0 Intensity: 2000.0"
or we can use a for-loop (note that we use zero-based indices as custom in Python):
1for (i in seq(0,py_to_r(spectrum$size())-1)) {
2 print(spectrum[i]$getMZ())
3 print(spectrum[i]$getIntensity())
4}