ProteaseDigestion
- class pyopenms.ProteaseDigestion
Bases: object
Cython implementation of _ProteaseDigestion
- Original C++ documentation is available here
– Inherits from [‘EnzymaticDigestion’]
Class for the enzymatic digestion of proteins
Digestion is specified by a simple regular expression, e.g. [KR] | [^P] for trypsin. Missed cleavages can also be modeled, i.e. adjacent peptides stay joined because the enzyme malfunctioned or could not access the site. If n missed cleavages are allowed, all possible resulting peptides (cleaved and uncleaved) with up to n missed cleavages are returned; no random selection of just n specific missed cleavage sites is performed.
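The behaviour described above can be illustrated with a small pure-Python sketch. It is not the OpenMS implementation: the function names `cleavage_sites` and `digest` are hypothetical, and the trypsin rule is written here as the lookaround pattern `(?<=[KR])(?!P)`, which is assumed to express the same "after K or R, not before P" rule as `[KR] | [^P]`.

```python
import re

# Assumed trypsin-like rule: cut after K or R unless the next residue is P.
TRYPSIN_LIKE = r"(?<=[KR])(?!P)"


def cleavage_sites(seq, pattern=TRYPSIN_LIKE):
    """Return the internal cut positions (indices between residues)."""
    return [m.start() for m in re.finditer(pattern, seq)
            if 0 < m.start() < len(seq)]


def digest(seq, missed_cleavages=0, pattern=TRYPSIN_LIKE):
    """Return every peptide spanning at most `missed_cleavages` uncut sites."""
    bounds = [0] + cleavage_sites(seq, pattern) + [len(seq)]
    last = len(bounds) - 1
    peptides = []
    for i in range(last):
        # j - i - 1 is the number of cut sites left uncleaved inside the peptide
        for j in range(i + 1, min(i + 1 + missed_cleavages, last) + 1):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


print(digest("AKRPGKCR"))     # fully cleaved fragments only
print(digest("AKRPGKCR", 1))  # additionally all peptides with one missed cleavage
```

Note that the K|R site before P is not cut, and that allowing one missed cleavage adds peptides to the fully cleaved set rather than replacing it.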
Usage:
    from pyopenms import *
    from urllib.request import urlretrieve

    urlretrieve ("http://www.uniprot.org/uniprot/P02769.fasta", "bsa.fasta")

    dig = ProteaseDigestion()
    dig.setEnzyme('Lys-C')
    bsa_string = "".join([l.strip() for l in open("bsa.fasta").readlines()[1:]])
    bsa_oms_string = String(bsa_string) # convert python string to OpenMS::String for further processing

    minlen = 6
    maxlen = 30

    # Using AASequence and digest
    result_digest = []
    result_digest_min_max = []
    bsa_aaseq = AASequence.fromString(bsa_oms_string)
    dig.digest(bsa_aaseq, result_digest)
    dig.digest(bsa_aaseq, result_digest_min_max, minlen, maxlen)
    print(result_digest[4].toString()) # GLVLIAFSQYLQQCPFDEHVK
    print(len(result_digest)) # 57 peptides
    print(result_digest_min_max[4].toString()) # LVNELTEFAK
    print(len(result_digest_min_max)) # 42 peptides

    # Using digestUnmodified without the need for AASequence from the EnzymaticDigestion base class
    result_digest_unmodified = []
    dig.digestUnmodified(StringView(bsa_oms_string), result_digest_unmodified, minlen, maxlen)
    print(result_digest_unmodified[4].getString()) # LVNELTEFAK
    print(len(result_digest_unmodified)) # 42 peptides
- __init__()
Overload:
- __init__(self) None
Overload:
- __init__(self, in_0: ProteaseDigestion) None
Methods
__init__: Overloaded.
countInternalCleavageSites(self, sequence): Returns the number of internal cleavage sites for this sequence.
digest: Overloaded.
digestUnmodified(self, sequence, output, ...): Performs the enzymatic digestion of an unmodified sequence
getEnzymeName(self): Returns the enzyme for the digestion
getMissedCleavages(self): Returns the max. number of allowed missed cleavages for the digestion
getSpecificity(self): Returns the specificity for the digestion
getSpecificityByName(self, name): Returns the specificity by name.
isValidProduct: Overloaded.
peptideCount(self, protein): Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings
setEnzyme: Overloaded.
setMissedCleavages(self, missed_cleavages): Sets the max. number of allowed missed cleavages for the digestion (default is 0).
setSpecificity(self, spec): Sets the specificity for the digestion (default is SPEC_FULL)
- countInternalCleavageSites(self, sequence: bytes | str | String) int
Returns the number of internal cleavage sites for this sequence.
- digest()
Overload:
- digest(self, protein: AASequence, output: List[AASequence]) int
Overload:
- digest(self, protein: AASequence, output: List[AASequence], min_length: int, max_length: int) int
Performs the enzymatic digestion of a protein.
- Parameters:
protein – Sequence to digest
output – Digestion products (peptides)
min_length – Minimal length of reported products
max_length – Maximal length of reported products (0 = no restriction)
- Returns:
Number of discarded digestion products (those that do not match the length restrictions)
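The length restriction and the return value can be pictured with the following sketch. It is a plain-Python illustration, not the pyopenms call: `filter_by_length` is a hypothetical helper, and it returns the kept peptides together with the discard count, whereas `digest` fills the `output` list and returns only the count.

```python
def filter_by_length(peptides, min_length=1, max_length=0):
    """Split peptides into kept ones and a count of discarded ones.

    max_length == 0 is treated as "no upper limit", mirroring the
    parameter description above.
    """
    kept = []
    discarded = 0
    for pep in peptides:
        too_short = len(pep) < min_length
        too_long = max_length != 0 and len(pep) > max_length
        if too_short or too_long:
            discarded += 1
        else:
            kept.append(pep)
    return kept, discarded


products = ["AK", "RPGK", "CR", "AKRPGK"]
print(filter_by_length(products, min_length=3))                # no upper limit
print(filter_by_length(products, min_length=3, max_length=4))  # 3 to 4 residues
```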
- digestUnmodified(self, sequence: StringView, output: List[StringView], min_length: int, max_length: int) int
Performs the enzymatic digestion of an unmodified sequence.
Because only references into the original string are returned, this is very fast.
- Parameters:
sequence – Sequence to digest
output – Digestion products
min_length – Minimal length of reported products
max_length – Maximal length of reported products (0 = no restriction)
- Returns:
Number of discarded digestion products (those that do not match the length restrictions)
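The idea of returning references instead of copies can be sketched in plain Python by reporting each product as a (start, end) index pair into the original string. This is only an analogy for what `StringView` provides; `digest_spans` is a hypothetical helper and the trypsin-like pattern is an assumption.

```python
import re


def digest_spans(seq, pattern=r"(?<=[KR])(?!P)"):
    """Return fully cleaved products as (start, end) spans into `seq`."""
    cuts = [m.start() for m in re.finditer(pattern, seq)
            if 0 < m.start() < len(seq)]
    bounds = [0] + cuts + [len(seq)]
    # Pair each boundary with the next one; no substring is copied here.
    return list(zip(bounds[:-1], bounds[1:]))


protein = "AKRPGKCR"
spans = digest_spans(protein)
print(spans)
# A substring is materialised only when it is actually needed:
print([protein[start:end] for start, end in spans])
```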
- getMissedCleavages(self) int
Returns the max. number of allowed missed cleavages for the digestion
- getSpecificity(self) int
Returns the specificity for the digestion
- getSpecificityByName(self, name: bytes | str | String) int
Returns the specificity by name. Returns SPEC_UNKNOWN if name is not valid
- isValidProduct()
Overload:
- isValidProduct(self, protein: AASequence, pep_pos: int, pep_length: int, ignore_missed_cleavages: bool, methionine_cleavage: bool) bool
Variant of EnzymaticDigestion::isValidProduct() with support for n-term protein cleavage and random D|P cleavage
Checks if peptide is a valid digestion product of the enzyme, taking into account specificity and the flags provided here
- Parameters:
protein – Protein sequence
pep_pos – Starting index of potential peptide
pep_length – Length of potential peptide
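ignore_missed_cleavages – Do not compare the missed cleavages of the potential peptide to the maximum allowed missed cleavages
methionine_cleavage – Regard the peptide as N-terminal of the protein if it starts at pos=1 or 2 and the protein starts with ‘M’ (corresponds positionally to allow_nterm_protein_cleavage in the C++ signature)
allow_random_asp_pro_cleavage – Allow cleavage at D|P sites to count as N/C-terminal (C++ parameter; not part of the Python signature shown above)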
- Returns:
True if peptide has correct n/c terminals (according to enzyme, specificity and above flags)
Overload:
- isValidProduct(self, protein: bytes | str | String, pep_pos: int, pep_length: int, ignore_missed_cleavages: bool, methionine_cleavage: bool) bool
Forwards to isValidProduct using protein.toUnmodifiedString()
Overload:
- isValidProduct(self, sequence: bytes | str | String, pos: int, length: int, ignore_missed_cleavages: bool) bool
Returns true if the peptide fragment starting at position pos with length length within sequence can be generated by the current enzyme
Checks if the peptide is a valid digestion product of the enzyme, taking into account specificity and the missed-cleavage flag provided here
- Parameters:
sequence – Protein sequence
pos – Starting index of potential peptide
length – Length of potential peptide
ignore_missed_cleavages – Do not compare the missed cleavages of the potential peptide to the maximum allowed missed cleavages
- Returns:
True if peptide has correct n/c terminals (according to enzyme, specificity and missed cleavages)
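A rough sketch of the check for the fully specific case: both ends of the candidate peptide must coincide with a cleavage site or a protein terminus, and, unless missed cleavages are ignored, the number of uncut sites inside the peptide must not exceed the allowed maximum. The helper `is_valid_product`, its `max_missed` argument, and the trypsin-like pattern are assumptions for illustration; semi-specific digestion, N-terminal methionine handling, and D|P cleavage are not modelled.

```python
import re


def is_valid_product(seq, pos, length, max_missed=0,
                     ignore_missed_cleavages=True,
                     pattern=r"(?<=[KR])(?!P)"):
    """Check whether seq[pos:pos+length] has enzyme-consistent termini."""
    end = pos + length
    if pos < 0 or length <= 0 or end > len(seq):
        return False
    sites = {m.start() for m in re.finditer(pattern, seq)
             if 0 < m.start() < len(seq)}
    boundaries = sites | {0, len(seq)}
    # Full specificity: both termini must be cut sites or protein ends.
    if pos not in boundaries or end not in boundaries:
        return False
    if ignore_missed_cleavages:
        return True
    # Count cut sites strictly inside the peptide (the missed cleavages).
    missed = sum(1 for s in sites if pos < s < end)
    return missed <= max_missed


protein = "AKRPGKCR"
print(is_valid_product(protein, 2, 4))   # "RPGK"
print(is_valid_product(protein, 1, 3))   # "KRP" starts inside a fragment
print(is_valid_product(protein, 0, 6, ignore_missed_cleavages=False))
```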
- peptideCount(self, protein: AASequence) int
Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings
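The expected count can be reasoned about with simple arithmetic. Assuming s internal cleavage sites (hence s + 1 fully cleaved fragments) and that every run of adjacent fragments with up to m missed cleavages is reported, there are s + 1 - k peptides with exactly k missed cleavages. The sketch below sums these; `peptide_count` is a hypothetical helper that ignores length restrictions and any special handling of unspecific cleavage.

```python
def peptide_count(n_sites, missed_cleavages):
    """Number of peptides for `n_sites` cut sites and a missed-cleavage limit."""
    fragments = n_sites + 1
    # A peptide cannot contain more missed cleavages than there are sites.
    max_k = min(missed_cleavages, n_sites)
    return sum(fragments - k for k in range(max_k + 1))


for m in range(4):
    print(m, peptide_count(2, m))
```

With two cleavage sites, the count grows from 3 (no missed cleavages) to 5 and then 6, after which raising the limit has no further effect.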
- setEnzyme()
Overload:
- setEnzyme(self, name: bytes | str | String) None
Sets the enzyme for the digestion (by name)
Overload:
- setEnzyme(self, enzyme: DigestionEnzyme) None
Sets the enzyme for the digestion
- setMissedCleavages(self, missed_cleavages: int) None
Sets the max. number of allowed missed cleavages for the digestion (default is 0). This setting is ignored when the log model is used.
- setSpecificity(self, spec: int) None
Sets the specificity for the digestion (default is SPEC_FULL)