PeptideProteinResolution#
- class pyopenms.PeptideProteinResolution#
Bases:
object
Cython implementation of _PeptideProteinResolution
Original C++ documentation is available here
Resolves shared peptides based on protein scores
Resolves connected components of the bipartite protein-peptide graph based on protein probabilities/scores and adds them as additional protein_groups to the protein identification run being processed. In doing so, it greedily assigns the shared peptides in each component uniquely to the proteins of the current best indistinguishable protein group, until every peptide is uniquely assigned. This effectively allows more peptides to be used in ProteinQuantifier, at the cost of potentially additional noise in the peptide quantities. In accordance with most state-of-the-art protein inference tools, only the best hit (PSM) for a peptide ID is considered. Probability ties are currently resolved by taking the protein with the larger number of peptides.
Possible future extensions: the class could provide an iterator over ConnectedComponents. The graph could be extended to include all PeptideHits (not only the best), which would turn it into a tripartite graph with larger connected components. It could also be extended to work with MS1 features, and resolution could be separated from adding groups to the output.
- __init__()#
Overload:
- __init__(self, statistics: bool) None
Overload:
- __init__(self, in_0: PeptideProteinResolution) None
Methods
Overload:
buildGraph
(self, protein, peptides) Initialize and store the graph (= maps), needs sorted groups for correct functionality.
findConnectedComponent
(self, root_prot_grp) Does a BFS on the two maps (= the two parts of the graph: indistinguishable protein groups and peptides), switching from one to the other in each step.
resolveConnectedComponent
(self, conn_comp, ...) Resolves connected components based on posterior probabilities and adds them as additional protein_groups to the output idXML.
resolveGraph
(self, protein, peptides) Applies resolveConnectedComponent to every component of the graph and is able to write statistics when specified.
__static_PeptideProteinResolution_run(proteins: List[ProteinIdentification] , peptides: List[PeptideIdentification] ) -> None
- buildGraph(self, protein: ProteinIdentification, peptides: List[PeptideIdentification]) None #
Initialize and store the graph (= maps). Needs sorted groups for correct functionality, and therefore sorts the indistinguishable protein groups unless sorting is skipped.
- Parameters:
protein – ProteinIdentification object storing IDs and groups
peptides – List of PeptideIdentification objects with links to the proteins
skip_sort – Skips sorting of groups, nothing is modified then
- findConnectedComponent(self, root_prot_grp: int) PeptideProteinResolution_ConnectedComponent #
Does a BFS on the two maps (= the two parts of the graph: indistinguishable protein groups and peptides), switching from one to the other in each step.
- Parameters:
root_prot_grp – Starts the BFS at this protein group index
- Returns:
A connected component as a set of group indices and peptide indices
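The alternating BFS described above can be illustrated in plain Python. This is a minimal sketch of the idea, not the library's implementation: the function name `find_connected_component` and the two dictionaries `grp_to_pep` and `pep_to_grp` are hypothetical stand-ins for the two internal maps.

```python
from collections import deque


def find_connected_component(root_grp, grp_to_pep, pep_to_grp):
    """BFS over a bipartite graph stored as two dicts of index sets.

    Hypothetical helper for illustration; not part of pyopenms.
    """
    groups, peptides = set(), set()
    # Each queue entry is (is_group, index) so we know which map to consult.
    queue = deque([(True, root_grp)])
    while queue:
        is_group, idx = queue.popleft()
        if is_group:
            if idx in groups:
                continue
            groups.add(idx)
            # Switch sides: from a protein group to its peptides.
            queue.extend((False, p) for p in grp_to_pep.get(idx, ()))
        else:
            if idx in peptides:
                continue
            peptides.add(idx)
            # Switch sides: from a peptide back to its protein groups.
            queue.extend((True, g) for g in pep_to_grp.get(idx, ()))
    return groups, peptides


# Toy graph: groups 0 and 1 share peptide 1; group 2 is isolated.
grp_to_pep = {0: {0, 1}, 1: {1, 2}, 2: {3}}
pep_to_grp = {0: {0}, 1: {0, 1}, 2: {1}, 3: {2}}
component = find_connected_component(0, grp_to_pep, pep_to_grp)
```

Starting at group 0, the search reaches group 1 through the shared peptide 1, while group 2 and peptide 3 stay in their own component.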
- resolveConnectedComponent(self, conn_comp: PeptideProteinResolution_ConnectedComponent, protein: ProteinIdentification, peptides: List[PeptideIdentification]) None #
Resolves connected components based on posterior probabilities and adds them as additional protein_groups to the output idXML. In doing so, it greedily assigns the shared peptides in this component uniquely to the proteins of the current best indistinguishable protein group, ready to be used in ProteinQuantifier. This is achieved by removing all other evidence from the input PeptideIDs and iterating until each peptide is uniquely assigned. In accordance with Fido, only the best hit (PSM) for an ID is considered. Probability ties are resolved by taking the protein with the largest number of peptides.
- Parameters:
conn_comp – The component to be resolved
protein – ProteinIdentification object storing IDs and groups
peptides – List of PeptideIdentification objects with links to the proteins
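The greedy assignment can be sketched in plain Python as follows. This illustrates the strategy described above under stated assumptions and is not the library's code: `resolve_component`, `group_scores`, and `grp_to_pep` are hypothetical names, and breaking ties on the number of peptides still unassigned (rather than the original count) is an assumption.

```python
def resolve_component(group_scores, grp_to_pep):
    """Greedily assign each peptide to exactly one protein group.

    Hypothetical helper for illustration; not part of pyopenms.
    """
    # Work on copies so the caller's sets are left untouched.
    remaining = {g: set(peps) for g, peps in grp_to_pep.items()}
    assignment = {}
    while any(remaining.values()):
        # Current best group: highest score, ties broken by peptide count
        # (assumption: counted over peptides not yet assigned).
        best = max(
            (g for g in remaining if remaining[g]),
            key=lambda g: (group_scores[g], len(remaining[g])),
        )
        claimed = remaining.pop(best)
        for pep in claimed:
            assignment[pep] = best
        # Remove the claimed peptides as evidence for all other groups.
        for peps in remaining.values():
            peps -= claimed
    return assignment


# Groups 0 and 1 tie on score; group 1 has more peptides and wins the tie.
scores = {0: 0.9, 1: 0.9, 2: 0.5}
edges = {0: {0, 1}, 1: {1, 2, 3}, 2: {3, 4}}
result = resolve_component(scores, edges)
```

In this toy component, group 1 claims peptides 1, 2, and 3 first, which leaves only peptide 0 for group 0 and peptide 4 for group 2.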
- resolveGraph(self, protein: ProteinIdentification, peptides: List[PeptideIdentification]) None #
Applies resolveConnectedComponent to every component of the graph and is able to write statistics when specified. Both parameters are mutated by this method.
- Parameters:
protein – ProteinIdentification object storing IDs and groups
peptides – List of PeptideIdentification objects with links to the proteins
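A typical call sequence might look like the sketch below, which builds the graph for one identification run and then resolves it. Several things here are assumptions rather than documented behavior: the wrapper name `resolve_idxml` and its path arguments are hypothetical, the input is assumed to be an idXML file that already contains indistinguishable protein groups from a prior inference step, and only the first identification run is processed. Newer pyopenms releases may expect a different container type for the peptide identifications than a plain list.

```python
def resolve_idxml(in_path, out_path, statistics=False):
    """Resolve shared peptides in the first run of an idXML file.

    Hypothetical convenience wrapper; not part of pyopenms.
    """
    # Imported lazily so this sketch can be loaded without pyopenms installed.
    import pyopenms as oms

    prot_ids, pep_ids = [], []
    oms.IdXMLFile().load(in_path, prot_ids, pep_ids)

    # Assumption: a single identification run with groups already annotated.
    protein = prot_ids[0]

    resolver = oms.PeptideProteinResolution(statistics)
    resolver.buildGraph(protein, pep_ids)
    # Mutates both arguments: groups are added, peptide evidence is pruned.
    resolver.resolveGraph(protein, pep_ids)

    oms.IdXMLFile().store(out_path, prot_ids, pep_ids)
```

Calling `resolve_idxml("inferred.idXML", "resolved.idXML")` with real file paths would write the resolved identifications to the second file.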
- run()#
__static_PeptideProteinResolution_run(proteins: List[ProteinIdentification] , peptides: List[PeptideIdentification] ) -> None