PeptideProteinResolution#

class pyopenms.PeptideProteinResolution#

Bases: object

Cython implementation of _PeptideProteinResolution

Original C++ documentation is available here

Resolves shared peptides based on protein scores

Resolves connected components of the bipartite protein-peptide graph based on protein probabilities/scores and adds them as additional protein_groups to the protein identification run being processed. In doing so, it greedily assigns the shared peptides in each component uniquely to the proteins of the current best indistinguishable protein group, until every peptide is uniquely assigned. This effectively allows more peptides to be used in ProteinQuantifier, at the cost of potentially additional noise in the peptide quantities. In accordance with most state-of-the-art protein inference tools, only the best hit (PSM) for a peptide ID is considered. Probability ties are currently resolved by taking the protein with the larger number of peptides.

Possible future extensions: the class could provide an iterator over ConnectedComponents. The graph could be extended to include all PeptideHits (not only the best), which would make it a tripartite graph with larger connected components. It could also be extended to work with MS1 features, and resolution could be separated from adding groups to the output.

__init__()#

Overload:

__init__(self, statistics: bool) None

Overload:

__init__(self, in_0: PeptideProteinResolution) None

Methods

__init__

Overload:

buildGraph(self, protein, peptides)

Initialize and store the graph (= maps), needs sorted groups for correct functionality.

findConnectedComponent(self, root_prot_grp)

Does a BFS on the two maps (= the two parts of the graph: indistinguishable protein groups and peptides), switching from one to the other in each step.

resolveConnectedComponent(self, conn_comp, ...)

Resolves connected components based on posterior probabilities and adds them as additional protein_groups to the output idXML.

resolveGraph(self, protein, peptides)

Applies resolveConnectedComponent to every component of the graph and is able to write statistics when specified.

run

__static_PeptideProteinResolution_run(proteins: List[ProteinIdentification], peptides: List[PeptideIdentification]) -> None

buildGraph(self, protein: ProteinIdentification, peptides: List[PeptideIdentification]) None#

Initializes and stores the graph (= maps). Correct functionality requires sorted groups, so the indistinguishable protein groups are sorted unless sorting is skipped.

Parameters:
  • protein – ProteinIdentification object storing IDs and groups

  • peptides – Vector of PeptideIdentifications with links to the proteins

  • skip_sort – Skips sorting of the groups, in which case nothing is modified (this parameter does not appear in the Python signature shown above)
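The two maps mentioned above can be pictured with a small pure-Python sketch. It does not use pyOpenMS and is not the library's implementation: the function name `build_maps`, the tuple-based input format, and the descending-probability sort order are all assumptions made for illustration.

```python
from collections import defaultdict


def build_maps(groups, peptide_accessions):
    """Build group -> peptides and peptide -> groups index maps.

    groups: list of (probability, [protein accessions]) tuples (hypothetical format).
    peptide_accessions: one set of protein accessions per peptide (best hit only).
    """
    # Assumption: groups are sorted by descending probability, so index 0 is the best group.
    groups = sorted(groups, key=lambda g: g[0], reverse=True)
    acc_to_group = {acc: gi for gi, (_, accs) in enumerate(groups) for acc in accs}

    group_to_peps = defaultdict(set)
    pep_to_groups = defaultdict(set)
    for pi, accs in enumerate(peptide_accessions):
        for acc in accs:
            gi = acc_to_group.get(acc)
            if gi is not None:  # ignore accessions that belong to no group
                group_to_peps[gi].add(pi)
                pep_to_groups[pi].add(gi)
    return groups, dict(group_to_peps), dict(pep_to_groups)


# Hypothetical toy input: three groups, three peptides, peptide 1 is shared.
toy_groups = [(0.6, ["P2"]), (0.9, ["P1", "P1b"]), (0.3, ["P3"])]
toy_peptides = [{"P1", "P1b"}, {"P1", "P2"}, {"P3"}]
sorted_groups, group_to_peps, pep_to_groups = build_maps(toy_groups, toy_peptides)
```

Because the group indices stored in both maps refer to positions in the sorted list, re-sorting the groups afterwards would invalidate the maps, which is one plausible reading of why sorted groups are required.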

findConnectedComponent(self, root_prot_grp: int) PeptideProteinResolution_ConnectedComponent#

Does a BFS on the two maps (= the two parts of the graph: indistinguishable protein groups and peptides), switching from one to the other in each step.

Parameters:

root_prot_grp – Starts the BFS at this protein group index

Returns:

A connected component as a set of group indices and peptide indices
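The alternating BFS described above can be sketched in plain Python as follows. This is an illustrative sketch under assumptions, not the pyOpenMS code: `find_connected_component` here is a standalone function, and the two dictionaries are hypothetical stand-ins for the internal maps.

```python
from collections import deque


def find_connected_component(root_group, group_to_peps, pep_to_groups):
    """Return (group indices, peptide indices) reachable from root_group."""
    groups, peps = {root_group}, set()
    queue = deque([("group", root_group)])
    while queue:
        kind, idx = queue.popleft()
        if kind == "group":
            # Step from a protein group to its not-yet-visited peptides.
            for p in group_to_peps.get(idx, ()):
                if p not in peps:
                    peps.add(p)
                    queue.append(("pep", p))
        else:
            # Step from a peptide back to its not-yet-visited protein groups.
            for g in pep_to_groups.get(idx, ()):
                if g not in groups:
                    groups.add(g)
                    queue.append(("group", g))
    return groups, peps


# Hypothetical maps: groups 0 and 1 share peptide 1, group 2 is isolated.
example_group_to_peps = {0: {0, 1}, 1: {1}, 2: {2}}
example_pep_to_groups = {0: {0}, 1: {0, 1}, 2: {2}}
component = find_connected_component(0, example_group_to_peps, example_pep_to_groups)
```

Starting from group 0, the search reaches group 1 through the shared peptide 1 and never touches group 2, so each component can be resolved independently of the others.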

resolveConnectedComponent(self, conn_comp: PeptideProteinResolution_ConnectedComponent, protein: ProteinIdentification, peptides: List[PeptideIdentification]) None#

Resolves connected components based on posterior probabilities and adds them as additional protein_groups to the output idXML. In doing so, it greedily assigns the shared peptides in this component uniquely to the proteins of the current best indistinguishable protein group, so that they are ready to be used in ProteinQuantifier. This is achieved by removing all other evidence from the input PeptideIDs and iterating until each peptide is uniquely assigned. In accordance with Fido, only the best hit (PSM) for an ID is considered. Probability ties are resolved by taking the protein with the largest number of peptides.

Parameters:
  • conn_comp – The component to be resolved

  • protein – ProteinIdentification object storing IDs and groups

  • peptides – Vector of PeptideIdentifications with links to the proteins
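The greedy assignment described above can be approximated by the following pure-Python sketch. It is a simplification under stated assumptions, not the library's algorithm: the function name `resolve_component` and its inputs are hypothetical, ties are broken by the original peptide count of each group and then by the lower group index, and the sketch returns an assignment instead of mutating identification objects.

```python
def resolve_component(groups, peps, probs, group_to_peps):
    """Assign every peptide of a component to exactly one protein group.

    groups, peps: index sets of one connected component.
    probs: group index -> probability (hypothetical input format).
    group_to_peps: group index -> set of peptide indices.
    """
    # Best group first: highest probability, then most peptides, then lowest index.
    order = sorted(groups, key=lambda g: (-probs[g], -len(group_to_peps[g]), g))
    assignment = {}
    for g in order:
        for p in group_to_peps[g]:
            # A peptide already claimed by a better group stays with that group.
            if p in peps and p not in assignment:
                assignment[p] = g
    return assignment


# Peptide 1 is shared; group 0 has the higher probability and keeps it.
resolved = resolve_component({0, 1}, {0, 1, 2}, {0: 0.9, 1: 0.6}, {0: {0, 1}, 1: {1, 2}})

# Equal probabilities: the group with more peptides wins the shared peptide 0.
tie_resolved = resolve_component({0, 1}, {0, 1}, {0: 0.8, 1: 0.8}, {0: {0}, 1: {0, 1}})
```

In the first example group 1 keeps only its unique peptide 2, which illustrates how a lower-scoring group loses shared evidence but remains in the result.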

resolveGraph(self, protein: ProteinIdentification, peptides: List[PeptideIdentification]) None#

Applies resolveConnectedComponent to every component of the graph and can write statistics when specified. Both parameters are mutated by this method.

Parameters:
  • protein – ProteinIdentification object storing IDs and groups

  • peptides – Vector of PeptideIdentifications with links to the proteins
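A minimal usage sketch that combines the constructor, buildGraph, and resolveGraph signatures documented on this page might look as follows. The helper name `resolve_shared_peptides` is hypothetical, and the assumption that buildGraph must be called before resolveGraph is inferred from the method descriptions, not stated explicitly here. Loading the identifications is left to the caller.

```python
def resolve_shared_peptides(protein, peptides, statistics=False):
    """Resolve shared peptides in place for one identification run.

    protein: a ProteinIdentification with indistinguishable groups and scores.
    peptides: the PeptideIdentifications that link to those proteins.
    """
    # Imported lazily so this helper can be defined without pyOpenMS installed.
    import pyopenms as oms

    resolver = oms.PeptideProteinResolution(statistics)
    # Assumption: the graph has to exist before it can be resolved.
    resolver.buildGraph(protein, peptides)
    # Both arguments are mutated: groups are added, peptide evidence is pruned.
    resolver.resolveGraph(protein, peptides)
    return protein, peptides
```

Since both inputs are modified in place, callers who still need the unresolved identifications should keep a copy before invoking the helper.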

run()#

__static_PeptideProteinResolution_run(proteins: List[ProteinIdentification], peptides: List[PeptideIdentification]) -> None