Query MSExperiment with MassQL#
MassQL is a powerful, SQL-like query language for mass spectrometry data. For further information visit the MassQL documentation.
MS data from a MSExperiment
can be exported to MS1 and MS2 dataframes, which can
be queried directly with the massql
module.
- pyopenms.MSExperiment.get_massql_df()
Exports data from
MSExperiment
to pandas DataFrames to be used with MassQL.Both dataframes contain the columns: ‘i’: intensity of a peak ‘i_norm’: intensity normalized by the maximun intensity in the spectrum ‘i_tic_norm’: intensity normalized by the sum of intensities (TIC) in the spectrum ‘mz’: mass to charge of a peak ‘scan’: number of the spectrum ‘rt’: retention time of the spectrum ‘polarity’: ion mode of the spectrum as integer value (positive: 1, negative: 2)
The MS2 dataframe contains additional columns: ‘precmz’: mass to charge of the precursor ion ‘ms1scan’: number of the corresponding MS1 spectrum ‘charge’: charge of the precursor ion
Returns:
ms1_df : pandas.DataFrame
peak data of MS1 spectra
ms2_df : pandas.DataFrame
peak data of MS2 spectra with precursor information
Example:
Load an example file into a MSExperiment
and get the MS1 and MS2 data frames for a MassQL query.
1import pyopenms as oms
2from massql import msql_engine
3
4from urllib.request import urlretrieve
5
6url = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master/src/data/"
7
8urlretrieve(url + "small.mzML", "small.mzML")
9
10# load MSExperiment
11exp = oms.MSExperiment()
12oms.MzMLFile().load("small.mzML", exp)
13
14# get MS1 and MS2 dataframes
15ms1_df, ms2_df = exp.get_massql_df()
16
17ms1_df.head()
i |
i_norm |
i_tic_norm |
mz |
scan |
rt |
polarity |
|
---|---|---|---|---|---|---|---|
0 |
2105.75 |
0.00455405 |
0.000325626 |
360.696 |
1 |
15.0015 |
1 |
1 |
1172.47 |
0.00253567 |
0.000181306 |
361.2 |
1 |
15.0015 |
1 |
2 |
2287.57 |
0.00494729 |
0.000353743 |
361.208 |
1 |
15.0015 |
1 |
3 |
1547.15 |
0.00334599 |
0.000239246 |
361.621 |
1 |
15.0015 |
1 |
4 |
1842.32 |
0.00398435 |
0.00028489 |
362.698 |
1 |
15.0015 |
1 |
Run a query on ms1_df
and ms2_df
. If you don’t pass the data frames massql_engine.process_query
will read data from the given file name.
1# Executing Query
2results_df = msql_engine.process_query(
3 "QUERY scaninfo(MS1DATA) WHERE RTMIN=16",
4 "small.mzML",
5 ms1_df=ms1_df,
6 ms2_df=ms2_df,
7)
8
9results_df.head()
scan |
rt |
mslevel |
i |
i_norm |
|
---|---|---|---|---|---|
0 |
139 |
16.001 |
1 |
6.77786e+06 |
1 |
1 |
140 |
16.0095 |
1 |
9.65984e+06 |
1 |
2 |
141 |
16.0185 |
1 |
7.0933e+06 |
1 |
3 |
143 |
16.0268 |
1 |
7.51255e+06 |
1 |
4 |
144 |
16.0354 |
1 |
1.01007e+07 |
1 |
In the resulting data frame each row represents a scan with the peak intensities summed up.