IDFilter

class pyopenms.IDFilter

Bases: object

Cython implementation of _IDFilter

Original C++ documentation is available here

Collection of functions for filtering peptide and protein identifications

This class provides functions for filtering collections of peptide or protein identifications according to various criteria. It also contains helper functions and classes (functors that implement predicates) that are used in this context.

The filter functions modify their inputs, rather than creating filtered copies.

Most filters work on the hit level, i.e. they remove peptide or protein hits from peptide or protein identifications (IDs). A few filters work on the ID level instead, i.e. they remove peptide or protein IDs from vectors thereof. Independent of this, the inputs for all filter functions are vectors of IDs, because the data most often comes in this form. This design also allows many helper objects to be set up only once per vector, rather than once per ID.

The filter functions for vectors of peptide/protein IDs do not include clean-up steps (e.g. removal of IDs without hits, reassignment of hit ranks, …). They only carry out their specific filtering operations. This is so filters can be chained without having to repeat clean-up operations. The group of clean-up functions provides helpers that are useful to ensure data integrity after filters have been applied, but it is up to the individual developer to use them when necessary.
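A minimal pure-Python sketch of this design follows. `Hit`, `PepID` and the four functions are hypothetical stand-ins, not pyopenms classes or the OpenMS implementation. Two hit-level filters are chained, each modifying the list in place without any clean-up, and the clean-up steps (analogous to `removeEmptyIdentifications` and `updateHitRanks`) run once at the end. Higher scores are assumed to be better, and tied scores are assumed to share a rank.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hit:
    # Hypothetical stand-in for a peptide hit (not pyopenms.PeptideHit)
    sequence: str
    score: float
    charge: int
    rank: int = 0


@dataclass
class PepID:
    # Hypothetical stand-in for a peptide identification; higher score = better
    hits: List[Hit] = field(default_factory=list)


def filter_by_score(ids: List[PepID], threshold: float) -> None:
    # Hit-level filter: modifies each ID in place, performs no clean-up
    for pid in ids:
        pid.hits[:] = [h for h in pid.hits if h.score >= threshold]


def filter_by_charge(ids: List[PepID], min_z: int, max_z: int) -> None:
    # Another hit-level filter, also without clean-up
    for pid in ids:
        pid.hits[:] = [h for h in pid.hits if min_z <= h.charge <= max_z]


def remove_empty(ids: List[PepID]) -> None:
    # Clean-up: drop IDs left without hits (modifies the list in place)
    ids[:] = [pid for pid in ids if pid.hits]


def update_ranks(ids: List[PepID]) -> None:
    # Clean-up: sort hits best-first; equal scores share a rank
    for pid in ids:
        pid.hits.sort(key=lambda h: h.score, reverse=True)
        rank, last = 0, None
        for h in pid.hits:
            if h.score != last:
                rank, last = rank + 1, h.score
            h.rank = rank


ids = [
    PepID([Hit("PEPTIDE", 30.0, 2), Hit("PEPTIDR", 12.0, 3)]),
    PepID([Hit("SAMPLER", 8.0, 2)]),
]
filter_by_score(ids, 10.0)   # filters are chained ...
filter_by_charge(ids, 2, 2)
remove_empty(ids)            # ... and cleaned up only once at the end
update_ranks(ids)
print([[(h.sequence, h.rank) for h in pid.hits] for pid in ids])  # [[('PEPTIDE', 1)]]
```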

The filter functions for MS/MS experiments do include clean-up steps, because they filter peptide and protein IDs in conjunction and potential contradictions between the two must be eliminated.

__init__()

Overload:

__init__(self) None

Overload:

__init__(self, in_0: IDFilter) None

Methods

__init__

Overload:

countHits

Returns the total number of hits in a vector of peptide or protein identifications

extractPeptideSequences(self, peptides, ...)

Extracts all unique peptide sequences from a list of peptide IDs

filterHitsByRank

Filters peptide or protein identifications according to the ranking of the hits

filterHitsByScore

Filters peptide or protein identifications according to the score of the hits

filterPeptidesByCharge(self, peptides, ...)

Filters peptide identifications according to charge state

filterPeptidesByLength(self, peptides, ...)

Filters peptide identifications according to peptide sequence length

filterPeptidesByMZ(self, peptides, min_mz, ...)

Filters peptide identifications by precursor m/z, keeping only IDs in the given range

filterPeptidesByMZError(self, peptides, ...)

Filters peptide identifications according to mass deviation

filterPeptidesByRT(self, peptides, min_rt, ...)

Filters peptide identifications by precursor RT, keeping only IDs in the given range

filterPeptidesByRTPredictPValue(self, ...)

Filters peptide identifications according to p-values from RTPredict

getBestHit

Finds the best-scoring hit in a vector of peptide or protein identifications

keepBestPeptideHits(self, peptides, strict)

Filters peptide identifications keeping only the single best-scoring hit per ID

keepBestPerPeptide(self, peptides, ...)

Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence

keepBestPerPeptidePerRun(self, prot_ids, ...)

Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence on a per run basis

keepHitsMatchingProteins

Filters peptide or protein identifications according to the given proteins (positive)

keepNBestHits

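Filters peptide or protein identifications according to the score of the hits, keeping the N best hits per ID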

keepNBestSpectra(self, peptides, n)

Filters identifications, keeping only the N best PeptideIdentification objects (an ID is better than another if its best PeptideHit scores better)

keepPeptidesWithMatchingModifications(self, ...)

Keeps only peptide hits that have at least one of the given modifications

keepPeptidesWithMatchingSequences(self, ...)

Removes all peptide hits with a sequence that does not match one in 'good_peptides'

keepUniquePeptidesPerProtein(self, peptides)

Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer)

removeDecoyHits

Removes hits annotated as decoys from peptide or protein identifications

removeDuplicatePeptideHits(self, peptides)

Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID)

removeEmptyIdentifications

Removes peptide or protein identifications that have no hits in them

removeHitsMatchingProteins

Filters peptide or protein identifications according to the given proteins (negative)

removePeptidesWithMatchingModifications(...)

Removes all peptide hits that have at least one of the given modifications

removePeptidesWithMatchingSequences(self, ...)

Removes all peptide hits with a sequence that matches one in 'bad_peptides'

removeUnreferencedProteins(self, proteins, ...)

Removes protein hits from 'proteins' that are not referenced by a peptide in 'peptides'

updateHitRanks

Updates the hit ranks on all peptide or protein IDs

updateProteinGroups(self, groups, hits)

Updates protein groups after protein hits were filtered

updateProteinReferences(self, peptides, ...)

Removes references to missing proteins.

DigestionFilter

alias of __DigestionFilter

countHits()

Overload:

countHits(self, identifications: List[PeptideIdentification]) int

Returns the total number of peptide hits in a vector of peptide identifications

Overload:

countHits(self, identifications: List[ProteinIdentification]) int

Returns the total number of protein hits in a vector of protein identifications

extractPeptideSequences(self, peptides: List[PeptideIdentification], sequences: Set[bytes], ignore_mods: bool) None

Extracts all unique peptide sequences from a list of peptide IDs

Parameters:
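  • peptides – Input peptide identifications

  • sequences – Output set that receives the unique sequences

  • ignore_mods – Whether to ignore modifications, i.e. extract unmodified sequences (defaults to false in the C++ API)

Returns:

None; the unique sequences are collected in ‘sequences’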
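The effect of ‘ignore_mods’ can be sketched in plain Python as follows. IDs are modelled as hypothetical lists of sequence strings, and modifications are assumed to appear as parenthesised tags in OpenMS-style notation (e.g. `PEPM(Oxidation)TIDE`); the real method works on `AASequence` objects rather than on strings, so this is only an illustration of the idea.

```python
import re
from typing import Iterable, List, Set

# Assumption: modifications are written as "(...)" tags, e.g. "PEPM(Oxidation)TIDE"
_MOD = re.compile(r"\([^()]*\)")


def strip_mods(seq: str) -> str:
    # Remove "(...)" modification tags and terminal-modification dots
    return _MOD.sub("", seq).replace(".", "")


def extract_sequences(ids: Iterable[List[str]], sequences: Set[str], ignore_mods: bool) -> None:
    # Each ID is a hypothetical list of hit sequences; unique sequences are
    # written into the caller's set, mirroring an output parameter
    for hits in ids:
        for seq in hits:
            sequences.add(strip_mods(seq) if ignore_mods else seq)


ids = [["PEPM(Oxidation)TIDE", "PEPMTIDE"], ["PEPMTIDE", "SAMPLER"]]
with_mods: Set[str] = set()
without_mods: Set[str] = set()
extract_sequences(ids, with_mods, ignore_mods=False)
extract_sequences(ids, without_mods, ignore_mods=True)
print(sorted(with_mods))     # ['PEPM(Oxidation)TIDE', 'PEPMTIDE', 'SAMPLER']
print(sorted(without_mods))  # ['PEPMTIDE', 'SAMPLER']
```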

filterHitsByRank()

Overload:

filterHitsByRank(self, ids: List[PeptideIdentification], min_rank: int, max_rank: int) None

Filters peptide or protein identifications according to the ranking of the hits

The hits between ‘min_rank’ and ‘max_rank’ (both inclusive) in each ID are kept. Counting starts at 1, i.e. the best (highest- or lowest-scoring, depending on the score orientation) hit has rank 1. The ranks are (re-)computed before filtering. ‘max_rank’ is ignored if it is smaller than ‘min_rank’.

Note: There may be several hits with the same rank in a peptide or protein ID (if the scores are the same). This method is useful if a range of higher hits is needed for decoy fairness analysis.

Overload:

filterHitsByRank(self, ids: List[ProteinIdentification], min_rank: int, max_rank: int) None

Filters peptide or protein identifications according to the ranking of the hits

The hits between ‘min_rank’ and ‘max_rank’ (both inclusive) in each ID are kept. Counting starts at 1, i.e. the best (highest- or lowest-scoring, depending on the score orientation) hit has rank 1. The ranks are (re-)computed before filtering. ‘max_rank’ is ignored if it is smaller than ‘min_rank’.

Note: There may be several hits with the same rank in a peptide or protein ID (if the scores are the same). This method is useful if a range of higher hits is needed for decoy fairness analysis.
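A plain-Python sketch of the rank filter, using a hypothetical `Hit` class with higher scores assumed to be better. It assumes dense ranking, i.e. hits with equal scores share a rank and the next distinct score receives the next integer; this tie-handling rule is an assumption, not taken from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    # Hypothetical hit model; higher score is assumed to be better
    sequence: str
    score: float
    rank: int = 0


def assign_ranks(hits: List[Hit]) -> None:
    # Assumption: dense ranking (ties share a rank, no gaps afterwards)
    hits.sort(key=lambda h: h.score, reverse=True)
    rank, last = 0, None
    for h in hits:
        if h.score != last:
            rank, last = rank + 1, h.score
        h.rank = rank


def filter_hits_by_rank(ids: List[List[Hit]], min_rank: int, max_rank: int) -> None:
    for hits in ids:
        assign_ranks(hits)  # ranks are (re-)computed before filtering
        if max_rank < min_rank:
            # max_rank is ignored if it is smaller than min_rank
            hits[:] = [h for h in hits if h.rank >= min_rank]
        else:
            hits[:] = [h for h in hits if min_rank <= h.rank <= max_rank]


ids = [[Hit("A", 10.0), Hit("B", 25.0), Hit("C", 25.0), Hit("D", 5.0)]]
filter_hits_by_rank(ids, 2, 3)
print([(h.sequence, h.rank) for h in ids[0]])  # [('A', 2), ('D', 3)]
```

B and C tie for rank 1, so the range 2 to 3 keeps the next two distinct scores.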

filterHitsByScore()

Overload:

filterHitsByScore(self, ids: List[PeptideIdentification], threshold_score: float) None

Filters peptide or protein identifications according to the score of the hits. The score orientation (higher score better or not) must be set correctly in each PeptideIdentification, because it determines which scores count as better. Only peptide/protein hits with a score at least as good as ‘threshold_score’ are kept

Overload:

filterHitsByScore(self, ids: List[ProteinIdentification], threshold_score: float) None

Filters peptide or protein identifications according to the score of the hits. The score orientation (higher score better or not) must be set correctly in each ProteinIdentification, because it determines which scores count as better. Only peptide/protein hits with a score at least as good as ‘threshold_score’ are kept

Overload:

filterHitsByScore(self, experiment: MSExperiment, peptide_threshold_score: float, protein_threshold_score: float) None

Filters an MS/MS experiment according to score thresholds
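The following sketch shows how the score orientation decides what "at least as good" means. `PepID` is a hypothetical model that stores only the hit scores and an orientation flag; it is not the pyopenms `PeptideIdentification` class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PepID:
    # Hypothetical model: hit scores plus the ID's score orientation
    scores: List[float] = field(default_factory=list)
    higher_score_better: bool = True


def filter_hits_by_score(ids: List[PepID], threshold_score: float) -> None:
    # Keep hits whose score is at least as good as the threshold;
    # the comparison direction follows each ID's orientation flag
    for pid in ids:
        if pid.higher_score_better:
            pid.scores[:] = [s for s in pid.scores if s >= threshold_score]
        else:
            pid.scores[:] = [s for s in pid.scores if s <= threshold_score]


qvalue_ids = [PepID([0.001, 0.02, 0.3], higher_score_better=False)]
filter_hits_by_score(qvalue_ids, 0.05)
print(qvalue_ids[0].scores)  # [0.001, 0.02]

score_ids = [PepID([3.1, 2.0, 1.2], higher_score_better=True)]
filter_hits_by_score(score_ids, 2.0)
print(score_ids[0].scores)  # [3.1, 2.0]
```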

filterPeptidesByCharge(self, peptides: List[PeptideIdentification], min_charge: int, max_charge: int) None

Filters peptide identifications according to charge state

filterPeptidesByLength(self, peptides: List[PeptideIdentification], min_length: int, max_length: int) None

Filters peptide identifications according to peptide sequence length

filterPeptidesByMZ(self, peptides: List[PeptideIdentification], min_mz: int, max_mz: int) None

Filters peptide identifications by precursor m/z, keeping only IDs in the given range

filterPeptidesByMZError(self, peptides: List[PeptideIdentification], mass_error: float, unit_ppm: bool) None

Filters peptide identifications according to mass deviation
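A sketch of a mass-deviation check, assuming the conventional relative-error definition, (observed - theoretical) / theoretical × 10^6 ppm, and theoretical m/z values that are already known. How OpenMS derives the theoretical m/z from a hit's sequence and charge is not described here, and for brevity the hypothetical helpers return a filtered copy instead of working in place.

```python
from typing import List, Tuple


def mz_error(observed_mz: float, theoretical_mz: float, unit_ppm: bool) -> float:
    # Signed deviation: in ppm if unit_ppm, otherwise in m/z units
    diff = observed_mz - theoretical_mz
    return diff / theoretical_mz * 1e6 if unit_ppm else diff


def keep_within_error(pairs: List[Tuple[float, float]], mass_error: float,
                      unit_ppm: bool) -> List[Tuple[float, float]]:
    # pairs: hypothetical (observed precursor m/z, theoretical m/z of a hit) tuples;
    # keep those whose absolute deviation does not exceed mass_error
    return [p for p in pairs if abs(mz_error(p[0], p[1], unit_ppm)) <= mass_error]


pairs = [(500.2510, 500.2500), (500.2600, 500.2500)]  # roughly 2 ppm and 20 ppm
print(len(keep_within_error(pairs, 5.0, unit_ppm=True)))    # 1
print(len(keep_within_error(pairs, 0.02, unit_ppm=False)))  # 2
```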

filterPeptidesByRT(self, peptides: List[PeptideIdentification], min_rt: int, max_rt: int) None

Filters peptide identifications by precursor RT, keeping only IDs in the given range
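The descriptions above suggest that the charge and length filters act on individual hits, whereas the m/z and RT filters remove whole IDs ("keeping only IDs in the given range"). The sketch below illustrates that difference with a hypothetical `PepID` model and assumes inclusive bounds. The hit-level filter can leave empty IDs behind, which the clean-up functions then have to remove.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PepID:
    # Hypothetical model: precursor RT (seconds) and the hit sequences
    rt: float
    sequences: List[str] = field(default_factory=list)


def filter_by_length(ids: List[PepID], min_length: int, max_length: int) -> None:
    # Hit-level: removes hits, never whole IDs (empty IDs may remain)
    for pid in ids:
        pid.sequences[:] = [s for s in pid.sequences if min_length <= len(s) <= max_length]


def filter_by_rt(ids: List[PepID], min_rt: float, max_rt: float) -> None:
    # ID-level: removes whole IDs whose precursor RT lies outside the range
    ids[:] = [pid for pid in ids if min_rt <= pid.rt <= max_rt]


ids = [PepID(120.0, ["PEPTIDE", "PEPTIDEKR"]), PepID(900.0, ["SAMPLER"]), PepID(300.0, ["AK"])]
filter_by_length(ids, 6, 8)
filter_by_rt(ids, 100.0, 600.0)
print([(pid.rt, pid.sequences) for pid in ids])  # [(120.0, ['PEPTIDE']), (300.0, [])]
```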

filterPeptidesByRTPredictPValue(self, peptides: List[PeptideIdentification], metavalue_key: bytes | str | String, threshold: float) None

Filters peptide identifications according to p-values from RTPredict

Filters the peptide hits by the probability (p-value) of a correct peptide identification having a deviation between observed and predicted RT equal to or greater than allowed

Parameters:
  • peptides – Input/output

  • metavalue_key – Name of the meta value that holds the p-value: “predicted_RT_p_value” or “predicted_RT_p_value_first_dim”

  • threshold – P-value threshold

getBestHit()

Overload:

getBestHit(self, identifications: List[PeptideIdentification], assume_sorted: bool, best_hit: PeptideHit) bool

Finds the best-scoring hit in a vector of peptide or protein identifications

If there are several hits with the best score, the first one is taken

Parameters:
  • identifications – Vector of peptide or protein IDs, each containing one or more (peptide/protein) hits

  • assume_sorted – Whether the hits are already sorted by score (best score first). This allows a faster query, since only the first hit of each ID needs to be examined

  • best_hit – Receives the best hit, if one was found

Returns:

true if a hit was present, false otherwise

Overload:

getBestHit(self, identifications: List[ProteinIdentification], assume_sorted: bool, best_hit: ProteinHit) bool

Finds the best-scoring hit in a vector of peptide or protein identifications

If there are several hits with the best score, the first one is taken

Parameters:
  • identifications – Vector of peptide or protein IDs, each containing one or more (peptide/protein) hits

  • assume_sorted – Whether the hits are already sorted by score (best score first). This allows a faster query, since only the first hit of each ID needs to be examined

  • best_hit – Receives the best hit, if one was found

Returns:

true if a hit was present, false otherwise
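A plain-Python sketch of the lookup, with hypothetical `Hit` and `PepID` models. Instead of filling a `best_hit` argument and returning a bool, it returns a `(found, best_hit)` tuple, the more idiomatic Python form. Strict comparisons make the first of several equally scored hits win, and all IDs are assumed to share one score orientation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Hit:
    sequence: str
    score: float


@dataclass
class PepID:
    # Hypothetical model of an ID with its own score orientation
    hits: List[Hit] = field(default_factory=list)
    higher_score_better: bool = True


def get_best_hit(ids: List[PepID], assume_sorted: bool) -> Tuple[bool, Optional[Hit]]:
    # Returns (found, best_hit); on ties the first hit encountered wins
    best: Optional[Hit] = None
    for pid in ids:
        # With assume_sorted, only the first hit of each ID is examined
        candidates = pid.hits[:1] if assume_sorted else pid.hits
        for h in candidates:
            if best is None:
                best = h
            elif pid.higher_score_better and h.score > best.score:
                best = h
            elif not pid.higher_score_better and h.score < best.score:
                best = h
    return best is not None, best


ids = [PepID([Hit("A", 12.0), Hit("B", 30.0)]), PepID([Hit("C", 30.0)]), PepID([])]
found, best = get_best_hit(ids, assume_sorted=False)
print(found, best.sequence)                               # True B
print(get_best_hit(ids, assume_sorted=True)[1].sequence)  # C (input was not actually sorted)
print(get_best_hit([PepID()], assume_sorted=False))       # (False, None)
```

The second call shows why ‘assume_sorted’ must only be set when the hits really are sorted: the faster query silently returns a different hit otherwise.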

keepBestPeptideHits(self, peptides: List[PeptideIdentification], strict: bool) None

Filters peptide identifications keeping only the single best-scoring hit per ID

Parameters:
  • peptides – Input/output

  • strict – If set, keep the best hit only if its score is unique, i.e. ties are not allowed (otherwise, all hits with the best score are kept)
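A sketch of the ‘strict’ semantics, modelling each ID as a hypothetical list of hit scores where higher is better:

```python
from typing import List


def keep_best_peptide_hits(ids: List[List[float]], strict: bool) -> None:
    # Each ID is a hypothetical list of hit scores (higher is better)
    for scores in ids:
        if not scores:
            continue
        best = max(scores)
        tied = [s for s in scores if s == best]
        if strict and len(tied) > 1:
            scores.clear()  # tie for the best score: nothing is kept
        else:
            scores[:] = tied  # keep every hit with the best score


ids = [[10.0, 25.0, 3.0], [25.0, 25.0, 1.0]]
lenient = [list(s) for s in ids]
keep_best_peptide_hits(lenient, strict=False)
keep_best_peptide_hits(ids, strict=True)
print(lenient)  # [[25.0], [25.0, 25.0]]
print(ids)      # [[25.0], []]
```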

keepBestPerPeptide(self, peptides: List[PeptideIdentification], ignore_mods: bool, ignore_charges: bool, nr_best_spectrum: int) None

Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence

keepBestPerPeptidePerRun(self, prot_ids: List[ProteinIdentification], peptides: List[PeptideIdentification], ignore_mods: bool, ignore_charges: bool, nr_best_spectrum: int) None

Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence on a per run basis

keepHitsMatchingProteins()

Overload:

keepHitsMatchingProteins(self, ids: List[PeptideIdentification], accessions: Set[bytes]) None

Filters peptide or protein identifications according to the given proteins (positive)

Overload:

keepHitsMatchingProteins(self, ids: List[ProteinIdentification], accessions: Set[bytes]) None

Filters peptide or protein identifications according to the given proteins (positive)

Overload:

keepHitsMatchingProteins(self, experiment: MSExperiment, proteins: List[FASTAEntry]) None

Filters an MS/MS experiment according to the given proteins

keepNBestHits()

Overload:

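keepNBestHits(self, ids: List[PeptideIdentification], n: int) None

Filters peptide or protein identifications according to the score of the hits, keeping the N best hits per ID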

Overload:

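keepNBestHits(self, ids: List[ProteinIdentification], n: int) None

Filters peptide or protein identifications according to the score of the hits, keeping the N best hits per ID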

Overload:

keepNBestHits(self, experiment: MSExperiment, n: int) None

Filters an MS/MS experiment by keeping the N best peptide hits for every spectrum

keepNBestSpectra(self, peptides: List[PeptideIdentification], n: int) None

Filters identifications, keeping only the N best PeptideIdentification objects (an ID is better than another if its best PeptideHit scores better)
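A sketch of this ID-level filter, with each ID modelled as a hypothetical list of hit scores (higher is better). Unlike `keepNBestHits`, it removes whole IDs and leaves the hits inside the surviving IDs untouched. Dropping IDs without hits and preserving the original order of the survivors are assumptions of this sketch.

```python
from typing import List


def keep_n_best_spectra(ids: List[List[float]], n: int) -> None:
    # An ID is "better" than another if its best hit scores higher.
    # Assumption: IDs without hits are dropped; survivors keep their order.
    scored = [(max(scores), i) for i, scores in enumerate(ids) if scores]
    scored.sort(key=lambda t: t[0], reverse=True)
    keep = {i for _, i in scored[:n]}
    ids[:] = [scores for i, scores in enumerate(ids) if i in keep]


ids = [[5.0, 1.0], [40.0], [], [12.0, 30.0]]
keep_n_best_spectra(ids, 2)
print(ids)  # [[40.0], [12.0, 30.0]]
```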

keepPeptidesWithMatchingModifications(self, peptides: List[PeptideIdentification], modifications: Set[bytes]) None

Keeps only peptide hits that have at least one of the given modifications

keepPeptidesWithMatchingSequences(self, peptides: List[PeptideIdentification], bad_peptides: List[PeptideIdentification], ignore_mods: bool) None

Removes all peptide hits with a sequence that does not match one in ‘good_peptides’ (in this signature, the list of sequences to keep is passed as ‘bad_peptides’)

keepUniquePeptidesPerProtein(self, peptides: List[PeptideIdentification]) None

Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer)

removeDecoyHits()

Overload:

removeDecoyHits(self, ids: List[PeptideIdentification]) None

Removes hits annotated as decoys from peptide or protein identifications. Checks for meta values named “target_decoy” and “isDecoy”, and removes protein/peptide hits if the values are “decoy” and “true”, respectively

Overload:

removeDecoyHits(self, ids: List[ProteinIdentification]) None

Removes hits annotated as decoys from peptide or protein identifications. Checks for meta values named “target_decoy” and “isDecoy”, and removes protein/peptide hits if the values are “decoy” and “true”, respectively
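A sketch of the documented decoy check, with hits modelled as hypothetical dictionaries of meta values. Only the two documented key/value pairs are tested; whether OpenMS also recognises other annotations is not covered here.

```python
from typing import Dict, List


def is_decoy(meta: Dict[str, str]) -> bool:
    # Documented checks: "target_decoy" == "decoy" or "isDecoy" == "true"
    return meta.get("target_decoy") == "decoy" or meta.get("isDecoy") == "true"


def remove_decoy_hits(ids: List[List[Dict[str, str]]]) -> None:
    # Each hit is a hypothetical dict of meta values; IDs are modified in place
    for hits in ids:
        hits[:] = [h for h in hits if not is_decoy(h)]


ids = [
    [{"target_decoy": "target"}, {"target_decoy": "decoy"}],
    [{"isDecoy": "true"}, {"isDecoy": "false"}, {}],
]
remove_decoy_hits(ids)
print(ids)  # [[{'target_decoy': 'target'}], [{'isDecoy': 'false'}, {}]]
```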

removeDuplicatePeptideHits(self, peptides: List[PeptideIdentification]) None

Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID)

removeEmptyIdentifications()

Overload:

removeEmptyIdentifications(self, ids: List[PeptideIdentification]) None

Removes peptide or protein identifications that have no hits in them

Overload:

removeEmptyIdentifications(self, ids: List[ProteinIdentification]) None

Removes peptide or protein identifications that have no hits in them

removeHitsMatchingProteins()

Overload:

removeHitsMatchingProteins(self, ids: List[PeptideIdentification], accessions: Set[bytes]) None

Filters peptide or protein identifications according to the given proteins (negative)

Overload:

removeHitsMatchingProteins(self, ids: List[ProteinIdentification], accessions: Set[bytes]) None

Filters peptide or protein identifications according to the given proteins (negative)
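The positive and negative protein filters can be sketched as mirror images of each other. Each peptide hit is modelled as the hypothetical set of accessions it maps to, and a hit is assumed to match if it references at least one of the given accessions. pyopenms passes accessions as `bytes`; the sketch uses `str`.

```python
from typing import List, Set


def keep_hits_matching(ids: List[List[Set[str]]], accessions: Set[str]) -> None:
    # Positive filter: keep hits that reference at least one given accession
    for hits in ids:
        hits[:] = [h for h in hits if h & accessions]


def remove_hits_matching(ids: List[List[Set[str]]], accessions: Set[str]) -> None:
    # Negative filter: drop hits that reference any of the given accessions
    for hits in ids:
        hits[:] = [h for h in hits if not (h & accessions)]


ids = [[{"P1"}, {"P2", "P3"}], [{"P4"}]]
kept = [[set(h) for h in hits] for hits in ids]  # independent copy
keep_hits_matching(kept, {"P3", "P4"})
remove_hits_matching(ids, {"P3", "P4"})
print([[sorted(h) for h in hits] for hits in kept])  # [[['P2', 'P3']], [['P4']]]
print([[sorted(h) for h in hits] for hits in ids])   # [[['P1']], []]
```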

removePeptidesWithMatchingModifications(self, peptides: List[PeptideIdentification], modifications: Set[bytes]) None

Removes all peptide hits that have at least one of the given modifications

removePeptidesWithMatchingSequences(self, peptides: List[PeptideIdentification], bad_peptides: List[PeptideIdentification], ignore_mods: bool) None

Removes all peptide hits with a sequence that matches one in ‘bad_peptides’

removeUnreferencedProteins(self, proteins: List[ProteinIdentification], peptides: List[PeptideIdentification]) None

Removes protein hits from the protein IDs in ‘proteins’ that are not referenced by a peptide in ‘peptides’

updateHitRanks()

Overload:

updateHitRanks(self, identifications: List[PeptideIdentification]) None

Updates the hit ranks on all peptide or protein IDs

Overload:

updateHitRanks(self, identifications: List[ProteinIdentification]) None

Updates the hit ranks on all peptide or protein IDs

updateProteinGroups(self, groups: List[ProteinGroup], hits: List[ProteinHit]) bool

Updates protein groups after protein hits were filtered

Parameters:
  • groups – Input/output protein groups

  • hits – Available protein hits (all others are removed from the groups)

Returns:

Whether the groups are still valid (which is the case if only whole groups, if any, were removed)
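A sketch of the group update and the validity rule, with groups modelled as hypothetical sets of accessions. Dropping groups that lose all of their members is an assumption consistent with the validity rule above, not a documented detail.

```python
from typing import List, Set


def update_protein_groups(groups: List[Set[str]], hits: Set[str]) -> bool:
    # groups: hypothetical accession sets; hits: accessions still available.
    # Unavailable accessions are removed and emptied groups are dropped.
    # Returns False if any group lost only part of its members.
    valid = True
    updated: List[Set[str]] = []
    for group in groups:
        remaining = group & hits
        if remaining and remaining != group:
            valid = False  # group was only partially removed
        if remaining:
            updated.append(remaining)
    groups[:] = updated
    return valid


groups = [{"P1", "P2"}, {"P3"}, {"P4", "P5"}]
valid = update_protein_groups(groups, {"P1", "P2", "P4"})
print(valid)        # False ({"P4", "P5"} lost only P5)
print(len(groups))  # 2 ({"P3"} was removed as a whole)
```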

updateProteinReferences(self, peptides: List[PeptideIdentification], proteins: List[ProteinIdentification], remove_peptides_without_reference: bool) None

Removes references to missing proteins. Only PeptideEvidence entries that reference protein hits in ‘proteins’ are kept in the peptide hits
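A sketch of the reference update, using a hypothetical `Hit` model that stores only the accessions of its peptide evidences. The behaviour of ‘remove_peptides_without_reference’ is inferred from its name: the sketch assumes it drops hits that are left without any protein reference.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Hit:
    # Hypothetical model: sequence plus the accessions of its peptide evidences
    sequence: str
    evidence_accessions: List[str] = field(default_factory=list)


def update_protein_references(ids: List[List[Hit]], proteins: Set[str],
                              remove_peptides_without_reference: bool) -> None:
    for hits in ids:
        for hit in hits:
            # Keep only evidences that point to protein hits still present
            hit.evidence_accessions = [a for a in hit.evidence_accessions if a in proteins]
        if remove_peptides_without_reference:
            # Assumption: the flag drops hits left without any evidence
            hits[:] = [h for h in hits if h.evidence_accessions]


ids = [[Hit("PEPTIDE", ["P1", "P2"]), Hit("SAMPLER", ["P9"])]]
update_protein_references(ids, {"P1"}, remove_peptides_without_reference=True)
print([(h.sequence, h.evidence_accessions) for h in ids[0]])  # [('PEPTIDE', ['P1'])]
```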