Memory Management

In order to save memory, we can avoid loading the whole file into memory and use OnDiscMSExperiment to read the data one spectrum at a time.

import pyopenms as oms

# Open the file without loading its spectra into memory
od_exp = oms.OnDiscMSExperiment()
od_exp.openFile("test.mzML")

# Collect the spectra we want to keep
e = oms.MSExperiment()
for k in range(od_exp.getNrSpectra()):
    s = od_exp.getSpectrum(k)  # reads only spectrum k from disk
    if s.getNativeID().startswith("scan="):
        e.addSpectrum(s)

oms.MzMLFile().store("test_filtered.mzML", e)

Note that with this approach the output data e is still held completely in memory and may end up using a substantial amount of it. We can avoid that by writing each selected spectrum to disk immediately with PlainMSDataWritingConsumer:

import pyopenms as oms

od_exp = oms.OnDiscMSExperiment()
od_exp.openFile("test.mzML")

# The consumer writes each spectrum it receives to the output file
consumer = oms.PlainMSDataWritingConsumer("test_filtered.mzML")

for k in range(od_exp.getNrSpectra()):
    s = od_exp.getSpectrum(k)
    if s.getNativeID().startswith("scan="):
        consumer.consumeSpectrum(s)

# Destroy the consumer so that the end of the file is written
del consumer

Make sure you do not forget del consumer. Otherwise the consumer keeps waiting for new data, and the final part of the mzML file may not get written to disk.
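One way to make the cleanup harder to forget is to wrap the loop in a function and release the consumer in a finally clause. The following is only a sketch under stated assumptions: stream_filtered, filter_mzml and the keep callback are hypothetical names introduced here, not part of pyOpenMS, and the sketch uses only the calls shown above. It also assumes that nothing else holds a reference to the consumer, so that del actually destroys it.

```python
def stream_filtered(od_exp, consumer, keep):
    """Pass every spectrum accepted by keep() to the consumer, one at a time.

    Hypothetical helper: od_exp needs getNrSpectra() and getSpectrum(k),
    consumer needs consumeSpectrum(s). Returns the number of spectra written.
    """
    written = 0
    for k in range(od_exp.getNrSpectra()):
        s = od_exp.getSpectrum(k)  # only spectrum k is held in memory
        if keep(s):
            consumer.consumeSpectrum(s)
            written += 1
    return written


def filter_mzml(in_path, out_path, keep):
    """Hypothetical wrapper that streams in_path to out_path via pyOpenMS."""
    import pyopenms as oms  # imported here so the helper above has no dependency

    od_exp = oms.OnDiscMSExperiment()
    od_exp.openFile(in_path)
    consumer = oms.PlainMSDataWritingConsumer(out_path)
    try:
        return stream_filtered(od_exp, consumer, keep)
    finally:
        # Drop the only reference even if keep() or the reader raised,
        # so the consumer can finish writing the file.
        del consumer
```

With these helpers, the example above would reduce to a call such as filter_mzml("test.mzML", "test_filtered.mzML", lambda s: s.getNativeID().startswith("scan=")).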