ProteaseDigestion

class pyopenms.ProteaseDigestion

Bases: object

Cython implementation of _ProteaseDigestion

Original C++ documentation is available here

– Inherits from [‘EnzymaticDigestion’]

Class for the enzymatic digestion of proteins

Digestion rules can be expressed as simple regular expressions, e.g. [KR] | [^P] for trypsin (cleave after K or R unless the next residue is P). Missed cleavages can also be modeled, i.e. adjacent peptides that remain joined because of enzyme malfunction or access restrictions. If n missed cleavages are allowed, all possible resulting peptides (cleaved and uncleaved) with up to n missed cleavages are returned; no random selection of just n specific missed cleavage sites is performed.
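The rule described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the lookaround pattern is one assumed way to write the "[KR] | [^P]" rule as a zero-width regular expression, and the helper names `cleavage_positions` and `digest` are hypothetical.

```python
import re

# Assumed zero-width form of the trypsin-like rule: cut after K or R unless P follows.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def cleavage_positions(sequence):
    """Return fragment start indices: 0 plus every internal cleavage site."""
    sites = [m.start() for m in TRYPSIN_SITE.finditer(sequence)]
    # A match at the very end of the sequence is not an internal site.
    return [0] + [s for s in sites if 0 < s < len(sequence)]


def digest(sequence, missed_cleavages=0):
    """List every peptide with 0..missed_cleavages missed cleavages."""
    starts = cleavage_positions(sequence)
    ends = starts[1:] + [len(sequence)]
    peptides = []
    for missed in range(missed_cleavages + 1):
        # Joining (missed + 1) adjacent fragments skips `missed` cleavage sites.
        for i in range(len(starts) - missed):
            peptides.append(sequence[starts[i]:ends[i + missed]])
    return peptides


print(digest("MKTAYIAKQR"))
# ['MK', 'TAYIAK', 'QR']
print(digest("MKTAYIAKQR", missed_cleavages=1))
# ['MK', 'TAYIAK', 'QR', 'MKTAYIAK', 'TAYIAKQR']
```

With one missed cleavage allowed, the fully cleaved peptides are still reported alongside the joined ones, which matches the "all possible resulting peptides" behaviour described above.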

Usage:

from pyopenms import *
from urllib.request import urlretrieve
#
urlretrieve ("http://www.uniprot.org/uniprot/P02769.fasta", "bsa.fasta")
#
dig = ProteaseDigestion()
dig.setEnzyme('Lys-C')
with open("bsa.fasta") as fasta_file:  # skip the FASTA header line, join the sequence lines
    bsa_string = "".join(line.strip() for line in fasta_file.readlines()[1:])
bsa_oms_string = String(bsa_string) # convert python string to OpenMS::String for further processing
#
minlen = 6
maxlen = 30
#
# Using AASequence and digest
result_digest = []
result_digest_min_max = []
bsa_aaseq = AASequence.fromString(bsa_oms_string)
dig.digest(bsa_aaseq, result_digest)
dig.digest(bsa_aaseq, result_digest_min_max, minlen, maxlen)
print(result_digest[4].toString()) # GLVLIAFSQYLQQCPFDEHVK
print(len(result_digest)) # 57 peptides
print(result_digest_min_max[4].toString()) # LVNELTEFAK
print(len(result_digest_min_max)) # 42 peptides
#
# Using digestUnmodified without the need for AASequence from the EnzymaticDigestion base class
result_digest_unmodified = []
dig.digestUnmodified(StringView(bsa_oms_string), result_digest_unmodified, minlen, maxlen)
print(result_digest_unmodified[4].getString()) # LVNELTEFAK
print(len(result_digest_unmodified)) # 42 peptides
__init__()

Overload:

__init__(self) -> None

Overload:

__init__(self, in_0: ProteaseDigestion) -> None

Methods

__init__

Default constructor, or copy constructor when given another ProteaseDigestion.

countInternalCleavageSites(self, sequence)

Returns the number of internal cleavage sites for this sequence.

digest

Performs the enzymatic digestion of a protein.

digestUnmodified(self, sequence, output, ...)

Performs the enzymatic digestion of an unmodified sequence

getEnzymeName(self)

Returns the enzyme for the digestion

getMissedCleavages(self)

Returns the max. number of allowed missed cleavages for the digestion

getSpecificity(self)

Returns the specificity for the digestion

getSpecificityByName(self, name)

Returns the specificity by name.

isValidProduct

Checks if a peptide is a valid digestion product of the enzyme.

peptideCount(self, protein)

Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings

setEnzyme

Sets the enzyme for the digestion, by name or as a DigestionEnzyme.

setMissedCleavages(self, missed_cleavages)

Sets the max. number of allowed missed cleavages for the digestion (default is 0).

setSpecificity(self, spec)

Sets the specificity for the digestion (default is SPEC_FULL)

countInternalCleavageSites(self, sequence: bytes | str | String) -> int

Returns the number of internal cleavage sites for this sequence.

digest()

Overload:

digest(self, protein: AASequence, output: List[AASequence]) -> int

Overload:

digest(self, protein: AASequence, output: List[AASequence], min_length: int, max_length: int) -> int

Performs the enzymatic digestion of a protein.

Parameters:
  • protein – Sequence to digest

  • output – Digestion products (peptides)

  • min_length – Minimal length of reported products

  • max_length – Maximal length of reported products (0 = no restriction)

Returns:

Number of discarded digestion products (those that do not match the length restrictions)
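The length parameters and the return value can be pictured with the following pure-Python sketch. It only models the documented contract (output list filled in place, 0 meaning no upper limit, discarded count returned); the function name `filter_by_length` is hypothetical, and clearing the output list first is an assumption rather than documented behaviour.

```python
def filter_by_length(peptides, output, min_length=1, max_length=0):
    """Fill `output` with peptides inside the limits; return how many were discarded."""
    output.clear()  # assumption: previous contents of the output list are replaced
    discarded = 0
    for peptide in peptides:
        too_short = len(peptide) < min_length
        # max_length == 0 means "no restriction" on the upper bound.
        too_long = max_length != 0 and len(peptide) > max_length
        if too_short or too_long:
            discarded += 1
        else:
            output.append(peptide)
    return discarded


kept = []
n_discarded = filter_by_length(["MK", "TAYIAK", "QR"], kept, min_length=3, max_length=0)
print(kept, n_discarded)
# ['TAYIAK'] 2
```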

digestUnmodified(self, sequence: StringView, output: List[StringView], min_length: int, max_length: int) -> int

Performs the enzymatic digestion of an unmodified sequence

This is very fast because only references into the original string are returned

Parameters:
  • sequence – Sequence to digest

  • output – Digestion products

  • min_length – Minimal length of reported products

  • max_length – Maximal length of reported products (0 = no restriction)

Returns:

Number of discarded digestion products (those that do not match the length restrictions)
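The idea of returning references instead of copies can be sketched in plain Python by handing back index pairs into the original string. This is only an analogy for a string view, not how StringView is implemented; `digest_spans` and the fragment start positions passed to it are hypothetical.

```python
def digest_spans(sequence, fragment_starts):
    """Return (start, end) index pairs for each fragment instead of copied substrings."""
    ends = fragment_starts[1:] + [len(sequence)]
    return list(zip(fragment_starts, ends))


protein = "MKTAYIAKQR"
# Fragment starts are supplied by hand here; a real digestion would compute them.
spans = digest_spans(protein, [0, 2, 8])
print(spans)
# [(0, 2), (2, 8), (8, 10)]

# A substring is materialised only when it is actually needed.
print([protein[start:end] for start, end in spans])
# ['MK', 'TAYIAK', 'QR']
```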

getEnzymeName(self) -> bytes | str | String

Returns the name of the enzyme used for the digestion

getMissedCleavages(self) -> int

Returns the max. number of allowed missed cleavages for the digestion

getSpecificity(self) -> int

Returns the specificity for the digestion

getSpecificityByName(self, name: bytes | str | String) -> int

Returns the specificity by name. Returns SPEC_UNKNOWN if name is not valid

isValidProduct()

Overload:

isValidProduct(self, protein: AASequence, pep_pos: int, pep_length: int, ignore_missed_cleavages: bool, methionine_cleavage: bool) -> bool

Variant of EnzymaticDigestion::isValidProduct() with support for n-term protein cleavage and random D|P cleavage

Checks if peptide is a valid digestion product of the enzyme, taking into account specificity and the flags provided here

Parameters:
  • protein – Protein sequence

  • pep_pos – Starting index of potential peptide

  • pep_length – Length of potential peptide

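  • ignore_missed_cleavages – Do not compare the missed cleavages (MCs) of the potential peptide to the maximum allowed number of MCs

  • allow_nterm_protein_cleavage – Regard the peptide as N-terminal of the protein if it starts at pos=1 or 2 and the protein starts with ‘M’ (this flag appears to correspond to methionine_cleavage in the Python signature above)

  • allow_random_asp_pro_cleavage – Allow cleavage at D|P sites to count as n/c-terminal (this flag is not present in the Python signature above)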

Returns:

True if peptide has correct n/c terminals (according to enzyme, specificity and above flags)

Overload:

isValidProduct(self, protein: bytes | str | String, pep_pos: int, pep_length: int, ignore_missed_cleavages: bool, methionine_cleavage: bool) -> bool

Forwards to isValidProduct using protein.toUnmodifiedString()

Overload:

isValidProduct(self, sequence: bytes | str | String, pos: int, length: int, ignore_missed_cleavages: bool) -> bool

Returns true if the peptide fragment starting at position pos with length length within sequence is a product generated by the current enzyme

Checks if peptide is a valid digestion product of the enzyme, taking into account specificity and the MC flag provided here

Parameters:
  • sequence – Protein sequence

  • pos – Starting index of potential peptide

  • length – Length of potential peptide

  • ignore_missed_cleavages – Do not compare the missed cleavages (MCs) of the potential peptide to the maximum allowed number of MCs

Returns:

True if peptide has correct n/c terminals (according to enzyme, specificity and missed cleavages)
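The check described above can be modelled in pure Python for the fully specific case. This is a simplified sketch, not the library's logic: it ignores semi-specific digestion, N-terminal methionine handling and D|P cleavage, and the function name, the `sites` argument and the `max_missed` argument are hypothetical.

```python
def is_valid_product(sequence, pos, length, sites, max_missed=0,
                     ignore_missed_cleavages=True):
    """Check whether sequence[pos:pos + length] starts and ends on cleavage boundaries.

    `sites` holds the internal cleavage positions (index where a new fragment starts).
    """
    end = pos + length
    if length == 0 or end > len(sequence):
        return False
    # Protein termini always count as valid boundaries.
    boundaries = {0, len(sequence), *sites}
    if pos not in boundaries or end not in boundaries:
        return False
    if ignore_missed_cleavages:
        return True
    # Cleavage sites strictly inside the peptide are missed cleavages.
    missed = sum(1 for site in sites if pos < site < end)
    return missed <= max_missed


protein = "MKTAYIAKQR"
sites = [2, 8]  # positions after K1 and K7 for a trypsin-like rule

print(is_valid_product(protein, 2, 6, sites))   # TAYIAK: both ends on boundaries
print(is_valid_product(protein, 3, 5, sites))   # AYIAK: start is not a boundary
print(is_valid_product(protein, 0, 8, sites, max_missed=0,
                       ignore_missed_cleavages=False))  # MKTAYIAK: one missed cleavage
```

Setting ignore_missed_cleavages to True in this model accepts any peptide with correct termini, however many sites it spans.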

peptideCount(self, protein: AASequence) -> int

Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings
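How the count grows with the missed cleavage setting can be worked out with a short sketch. It assumes a specific enzyme that splits the protein into a fixed number of fully cleaved fragments, and that a peptide with k missed cleavages is a run of k + 1 adjacent fragments; the function name `peptide_count` is hypothetical and the code is not taken from the library.

```python
def peptide_count(n_fragments, missed_cleavages):
    """Count peptides made of 1..(missed_cleavages + 1) adjacent fragments."""
    total = 0
    for missed in range(missed_cleavages + 1):
        if missed >= n_fragments:
            break  # cannot miss more sites than there are joins between fragments
        # There are (n_fragments - missed) runs of (missed + 1) adjacent fragments.
        total += n_fragments - missed
    return total


for allowed in range(4):
    print(allowed, peptide_count(3, allowed))
# 0 3
# 1 5
# 2 6
# 3 6
```

For three fragments the count saturates at 6 once two missed cleavages are allowed, because the whole protein is then already included.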

setEnzyme()

Overload:

setEnzyme(self, name: bytes | str | String) -> None

Sets the enzyme for the digestion (by name)

Overload:

setEnzyme(self, enzyme: DigestionEnzyme) -> None

Sets the enzyme for the digestion

setMissedCleavages(self, missed_cleavages: int) -> None

Sets the max. number of allowed missed cleavages for the digestion (default is 0). This setting is ignored when the log model is used

setSpecificity(self, spec: int) -> None

Sets the specificity for the digestion (default is SPEC_FULL)