ProteaseDigestion

class pyopenms.ProteaseDigestion

Bases: object

Cython implementation of _ProteaseDigestion

Inherits from EnzymaticDigestion. The original C++ documentation is available in the OpenMS API reference.

Class for the enzymatic digestion of proteins

Digestion can be performed using simple regular expressions, e.g. [KR] | [^P] for trypsin. Missed cleavages can also be modeled, i.e. adjacent peptides that are not cleaved apart because of enzyme malfunction or access restrictions. If n missed cleavages are allowed, all possible resulting peptides (cleaved and uncleaved) with up to n missed cleavages are returned; no random selection of just n specific missed cleavage sites is performed. If specificity is set to semi-specific, digestion also returns semi-specific products, i.e. peptides with only one end at an actual cleavage site.
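The model described above can be sketched in plain Python. This is an illustration only, not the OpenMS implementation: the names `cleavage_positions` and `digest_with_missed_cleavages` are hypothetical, and the trypsin rule is written as the lookaround regex `(?<=[KR])(?!P)`, which is assumed here to be equivalent to the `[KR] | [^P]` notation.

```python
import re

# Assumed trypsin rule: cut after K or R unless the next residue is P.
TRYPSIN = r"(?<=[KR])(?!P)"


def cleavage_positions(sequence, pattern=TRYPSIN):
    """Return internal cut positions (indices between two residues)."""
    return [m.start() for m in re.finditer(pattern, sequence)
            if 0 < m.start() < len(sequence)]


def digest_with_missed_cleavages(sequence, max_missed=0, pattern=TRYPSIN):
    """Return every peptide that spans at most max_missed uncut sites."""
    bounds = [0] + cleavage_positions(sequence, pattern) + [len(sequence)]
    last = len(bounds) - 1
    peptides = []
    for i in range(last):
        # j - i - 1 is the number of cleavage sites left uncut inside the peptide
        for j in range(i + 1, min(i + 1 + max_missed, last) + 1):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


print(digest_with_missed_cleavages("AKGRPCKD"))
# ['AK', 'GRPCK', 'D']
print(digest_with_missed_cleavages("AKGRPCKD", max_missed=1))
# ['AK', 'AKGRPCK', 'GRPCK', 'GRPCKD', 'D']
```

Note that the R in position 4 is not a cut site because it is followed by P, and that allowing one missed cleavage adds the merged neighbours to the fully cleaved peptides rather than replacing them.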

Usage:

from pyopenms import AASequence, EnzymaticDigestion, ProteaseDigestion, String, StringView
from urllib.request import urlretrieve
#
urlretrieve ("http://www.uniprot.org/uniprot/P02769.fasta", "bsa.fasta")
#
dig = ProteaseDigestion()
dig.setEnzyme('Lys-C')
with open("bsa.fasta") as fasta_file:  # skip the FASTA header line and join the sequence lines
    bsa_string = "".join(l.strip() for l in fasta_file.readlines()[1:])
bsa_oms_string = String(bsa_string) # convert python string to OpenMS::String for further processing
#
minlen = 6
maxlen = 30
#
# Using AASequence and digest
result_digest = []
result_digest_min_max = []
bsa_aaseq = AASequence.fromString(bsa_oms_string)
dig.digest(bsa_aaseq, result_digest)
dig.digest(bsa_aaseq, result_digest_min_max, minlen, maxlen)
print(result_digest[4].toString()) # GLVLIAFSQYLQQCPFDEHVK
print(len(result_digest)) # 57 peptides
print(result_digest_min_max[4].toString()) # LVNELTEFAK
print(len(result_digest_min_max)) # 42 peptides
#
# Semi-specific digestion
result_semispecific = []
dig.setSpecificity(EnzymaticDigestion.SPEC_SEMI)
dig.digest(bsa_aaseq, result_semispecific)
#
# Using digestUnmodified (from the EnzymaticDigestion base class), which does not need an AASequence
result_digest_unmodified = []
dig.digestUnmodified(StringView(bsa_oms_string), result_digest_unmodified, minlen, maxlen)
print(result_digest_unmodified[4].getString()) # LVNELTEFAK
print(len(result_digest_unmodified)) # 42 peptides

__init__()

Overload:

__init__(self) → None

Overload:

__init__(self, in_0: ProteaseDigestion) → None

Methods

`__init__`
`countInternalCleavageSites`(self, sequence)	Returns the number of internal cleavage sites for this sequence.
`digest`	Performs the enzymatic digestion of a protein.
`digestUnmodified`(self, sequence, output, ...)	Performs the enzymatic digestion of an unmodified sequence
`getEnzymeName`(self)	Returns the enzyme for the digestion
`getMissedCleavages`(self)	Returns the max. number of allowed missed cleavages for the digestion
`getSpecificity`(self)	Returns the specificity for the digestion
`getSpecificityByName`(self, name)	Returns the specificity by name.
`isValidProduct`	Checks if a peptide is a valid digestion product of the enzyme
`peptideCount`(self, protein)	Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings
`setEnzyme`	Sets the enzyme for the digestion
`setMissedCleavages`(self, missed_cleavages)	Sets the max. number of allowed missed cleavages for the digestion (default is 0)
`setSpecificity`(self, spec)	Sets the specificity for the digestion (default is SPEC_FULL)

countInternalCleavageSites(self, sequence: bytes | str | String) → int

Returns the number of internal cleavage sites for this sequence.

digest()

Overload:

digest(self, protein: AASequence, output: List[AASequence]) → int

Overload:

digest(self, protein: AASequence, output: List[AASequence], min_length: int, max_length: int) → int

Performs the enzymatic digestion of a protein.

Parameters:

protein – Sequence to digest
output – Digestion products (peptides)
min_length – Minimal length of reported products
max_length – Maximal length of reported products (0 = no restriction)

Returns:

Number of discarded digestion products (those that do not match the length restrictions)
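The length parameters and the return value can be illustrated with a small stand-alone sketch. It is not the library code: `filter_by_length` is a hypothetical helper, both bounds are assumed to be inclusive, and it returns the kept products together with the discarded count, whereas `digest` fills `output` in place and returns only the count.

```python
def filter_by_length(peptides, min_length=1, max_length=0):
    """Split peptides into kept products and a count of discarded ones.

    max_length == 0 means "no upper limit", as in the parameter description.
    Both bounds are treated as inclusive (an assumption of this sketch).
    """
    kept = []
    discarded = 0
    for pep in peptides:
        too_short = len(pep) < min_length
        too_long = max_length != 0 and len(pep) > max_length
        if too_short or too_long:
            discarded += 1
        else:
            kept.append(pep)
    return kept, discarded


products = ["AK", "GRPCK", "D", "LVNELTEFAK"]
kept, discarded = filter_by_length(products, min_length=2, max_length=6)
print(kept, discarded)  # ['AK', 'GRPCK'] 2
```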

digestUnmodified(self, sequence: StringView, output: List[StringView], min_length: int, max_length: int) → int

Performs the enzymatic digestion of an unmodified sequence.

This is very fast because only references into the original string are returned.

Parameters:

sequence – Sequence to digest
output – Digestion products
min_length – Minimal length of reported products
max_length – Maximal length of reported products (0 = no restriction)

Returns:

Number of discarded digestion products (those that do not match the length restrictions)
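The idea of returning references instead of copies can be shown by analogy in plain Python. This sketch does not reflect how StringView is implemented; `digest_to_spans` is a hypothetical helper that takes already known cut positions and returns index pairs into the original string, so no substring is created until one is needed.

```python
def digest_to_spans(sequence, cut_positions):
    """Return (start, end) index pairs instead of copied substrings."""
    bounds = [0] + sorted(cut_positions) + [len(sequence)]
    return list(zip(bounds[:-1], bounds[1:]))


protein = "AKGRPCKD"
spans = digest_to_spans(protein, [2, 7])
print(spans)  # [(0, 2), (2, 7), (7, 8)]

# Materialize a substring only when it is actually needed
print([protein[start:end] for start, end in spans])  # ['AK', 'GRPCK', 'D']
```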

getEnzymeName(self) → bytes | str | String

Returns the enzyme for the digestion

getMissedCleavages(self) → int

Returns the max. number of allowed missed cleavages for the digestion

getSpecificity(self) → int

Returns the specificity for the digestion

getSpecificityByName(self, name: bytes | str | String) → int

Returns the specificity by name. Returns SPEC_UNKNOWN if name is not valid

isValidProduct()

Overload:

isValidProduct(self, protein: AASequence, pep_pos: int, pep_length: int, ignore_missed_cleavages: bool, methionine_cleavage: bool) → bool

Variant of EnzymaticDigestion::isValidProduct() with support for n-term protein cleavage and random D|P cleavage

Checks if peptide is a valid digestion product of the enzyme, taking into account specificity and the flags provided here

Parameters:

protein – Protein sequence
pep_pos – Starting index of potential peptide
pep_length – Length of potential peptide
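ignore_missed_cleavages – Do not compare the number of missed cleavages (MCs) of the potential peptide to the maximum allowed number
methionine_cleavage – Regard peptide as N-terminal of protein if it starts only at pos=1 or 2 and protein starts with ‘M’ (this flag appears to correspond to allow_nterm_protein_cleavage in the C++ documentation)
allow_random_asp_pro_cleavage – Allow cleavage at D|P sites to count as N/C-terminal (listed in the C++ documentation; not part of the Python signature shown above)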

Returns:

True if the peptide has correct N- and C-termini (according to enzyme, specificity and above flags)

Overload:

isValidProduct(self, protein: bytes | str | String, pep_pos: int, pep_length: int, ignore_missed_cleavages: bool, methionine_cleavage: bool) → bool

Forwards to isValidProduct using protein.toUnmodifiedString()

Overload:

isValidProduct(self, sequence: bytes | str | String, pos: int, length: int, ignore_missed_cleavages: bool) → bool

Returns true if the peptide fragment starting at position pos with length length within sequence could have been generated by the current enzyme

Checks if peptide is a valid digestion product of the enzyme, taking into account specificity and the MC flag provided here

Parameters:

sequence – Protein sequence
pos – Starting index of potential peptide
length – Length of potential peptide
ignore_missed_cleavages – Do not compare the number of missed cleavages (MCs) of the potential peptide to the maximum allowed number

Returns:

True if peptide has correct n/c terminals (according to enzyme, specificity and missed cleavages)
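The check described for the string overload can be approximated as follows. This is a simplified sketch under stated assumptions, not the OpenMS logic: `is_valid_product` and its keyword arguments `max_missed` and `semi_specific` are hypothetical, the trypsin rule is an assumed lookaround regex, out-of-range fragments are simply reported as invalid, and the methionine and D|P options are not modeled.

```python
import re

TRYPSIN = r"(?<=[KR])(?!P)"  # assumed trypsin rule: after K/R, not before P


def is_valid_product(sequence, pos, length, max_missed=0,
                     ignore_missed_cleavages=True, semi_specific=False,
                     pattern=TRYPSIN):
    """Return True if sequence[pos:pos + length] has acceptable termini."""
    end = pos + length
    if length <= 0 or pos < 0 or end > len(sequence):
        return False
    sites = {m.start() for m in re.finditer(pattern, sequence)}
    # Protein termini always count as valid peptide ends
    sites |= {0, len(sequence)}
    start_ok = pos in sites
    end_ok = end in sites
    # Full specificity needs both ends at a site, semi-specific only one
    ends_ok = (start_ok or end_ok) if semi_specific else (start_ok and end_ok)
    if not ends_ok:
        return False
    if ignore_missed_cleavages:
        return True
    internal = sum(1 for s in sites if pos < s < end)
    return internal <= max_missed


seq = "AKGRPCKD"
print(is_valid_product(seq, 2, 5))                      # True  ('GRPCK')
print(is_valid_product(seq, 3, 4))                      # False ('RPCK')
print(is_valid_product(seq, 3, 4, semi_specific=True))  # True
print(is_valid_product(seq, 0, 7, ignore_missed_cleavages=False))  # False
```

The last call fails because 'AKGRPCK' contains one uncut site while `max_missed` defaults to 0 in this sketch.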

peptideCount(self, protein: AASequence) → int

Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings
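For a fully specific digestion, the expected count follows from the number of internal cleavage sites and the allowed missed cleavages. The sketch below shows that arithmetic only; `count_peptides` is a hypothetical helper, it applies no length filtering, and it is assumed rather than confirmed to match what peptideCount computes.

```python
def count_peptides(num_internal_sites, max_missed):
    """Count products of a fully specific digestion.

    n internal cleavage sites give n + 1 fragments. Allowing k missed
    cleavages adds the (n + 1 - k) runs of k + 1 adjacent fragments.
    """
    fragments = num_internal_sites + 1
    highest_k = min(max_missed, num_internal_sites)
    return sum(fragments - k for k in range(highest_k + 1))


print(count_peptides(2, 0))  # 3
print(count_peptides(2, 1))  # 5
print(count_peptides(2, 5))  # 6
```

With two internal sites there are three fragments; one missed cleavage adds the two merged pairs, and any larger limit can add at most the single uncleaved sequence.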

setEnzyme()

Overload:

setEnzyme(self, name: bytes | str | String) → None

Sets the enzyme for the digestion (by name)

Overload:

setEnzyme(self, enzyme: DigestionEnzyme) → None

Sets the enzyme for the digestion

setMissedCleavages(self, missed_cleavages: int) → None

Sets the max. number of allowed missed cleavages for the digestion (default is 0). This setting is ignored when log model is used

setSpecificity(self, spec: int) → None

Sets the specificity for the digestion (default is SPEC_FULL)