Digestion ========= Proteolytic Digestion with Trypsin ********************************** OpenMS has classes for proteolytic digestion which can be used as follows: .. code-block:: python import pyopenms as oms from urllib.request import urlretrieve gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master" urlretrieve(gh + "/src/data/P02769.fasta", "bsa.fasta") dig = oms.ProteaseDigestion() dig.getEnzymeName() # Trypsin bsa = "".join([l.strip() for l in open("bsa.fasta").readlines()[1:]]) bsa = oms.AASequence.fromString(bsa) # create all digestion products result = [] dig.digest(bsa, result) print(result[4].toString()) len(result) # 82 peptides Very short peptides or even single amino acid digestion products are often discarded as they usually contain little information (e.g., can't be used to identify proteins). We now only generate digestion products with a length of :math:`7` to :math:`40`. .. code-block:: python # only create peptides of length 7-40 dig.digest(bsa, result, 7, 40) # print the results for s in result: print(s.toString()) Enzymatic digestion is often not perfect and sometimes enzymes miss cutting a peptide. We now allow up to two missed cleavages. .. code-block:: python # Allow two missed cleavages dig.setMissedCleavages(2) # only create peptides of length 7-40 dig.digest(bsa, result, 7, 40) # print the results for s in result: print(s.toString()) Proteolytic Digestion with Lys-C ******************************** We can of course also use different enzymes, these are defined in the ``Enzymes.xml`` file and can be accessed using the :py:class:`~.EnzymesDB` object .. code-block:: python names = [] oms.ProteaseDB().getAllNames(names) len(names) # at least 25 by default e = oms.ProteaseDB().getEnzyme("Lys-C") e.getRegExDescription() e.getRegEx() Now that we have learned about the other enzymes available, we can use it to cut our protein of interest: .. code-block:: python from urllib.request import urlretrieve gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master" urlretrieve(gh + "/src/data/P02769.fasta", "bsa.fasta") dig = oms.ProteaseDigestion() dig.setEnzyme("Lys-C") bsa = "".join([l.strip() for l in open("bsa.fasta").readlines()[1:]]) bsa = oms.AASequence.fromString(bsa) result = [] dig.digest(bsa, result) print(result[4].toString()) len(result) # 57 peptides We now get different digested peptides (:math:`57` vs :math:`82`) and the fourth peptide is now ``GLVLIAFSQYLQQCPFDEHVK`` instead of ``DTHK`` as with Trypsin (see above). Oligonucleotide Digestion ************************** There are multiple cleavage enzymes available for oligonucleotides, these are defined ``Enzymes_RNA.xml`` file and can be accessed using the :py:class:`~.RNaseDB` object .. code-block:: python db = oms.RNaseDB() names = [] db.getAllNames(names) names # Will print out all available enzymes: # ['RNase_U2', 'RNase_T1', 'RNase_H', 'unspecific cleavage', 'no cleavage', 'RNase_MC1', 'RNase_A', 'cusativin'] e = db.getEnzyme("RNase_T1") e.getRegEx() e.getThreePrimeGain() We can now use it to cut an oligo: .. code-block:: python oligo = oms.NASequence.fromString("pAUGUCGCAG") dig = oms.RNaseDigestion() dig.setEnzyme("RNase_T1") result = [] dig.digest(oligo, result) for fragment in result: print(fragment) print("Looking closer at", result[0]) print(" Five Prime modification:", result[0].getFivePrimeMod().getCode()) print(" Three Prime modification:", result[0].getThreePrimeMod().getCode()) for ribo in result[0]: print(ribo.getCode(), ribo.getMonoMass(), ribo.isModified())