2. Table of contents
• Databases
– Data collection
– PPI databases
– Issues
– Utility of bioinformatics
– Standards
• PSI
– PSI-MI format
• PSI-MITAB
• PSI-MI XML
• Tools
– PSI-MI ontology
– MIMIx
– Data submission tools
3. Data collection (23.08.2018)
[Figure: data collection, with panels "Ideally" and "Reality". Elements shown: annotators (A), databases (DB), each with GUI, API and WS access layers, and the user.]
Legend: A = Annotator; DB = Database; GUI = Graphical User Interface; API = Application Programming Interface; WS = Web Services
5. Issues
Many data sources
• Maintain and update
• New appearing
• Many vanishing*
Different query interfaces
data integration?
Variable results
• Syntax
• Semantics
• Minimum information
* Merali Z. et al. Databases in peril. Nature 2005.
Where to find them?
Redundant data?
6. Utility of bioinformatics
[Figure: "Scientific impact" diagram with the labels "Too little bioinformatics", "Too many databases" and "Too diverse interfaces". Source: Tim Hubbard]
8. Standards
• Community-agreed specifications for how data types should be represented and described.
• Standards facilitate:
– Portability
– Sharing
– Integration
– Interoperability
– Reusability
9. Standards
• Standards to consider in bioinformatics:
• Formats
• Schemas
• Minimum information guidelines
• Controlled vocabularies
• Identifiers
• Query interfaces
11. PSI-MI
• Data format
– Standard format: PSI-MI XML, PSI-MITAB
– Tools: XML Java API, MITAB Java API, XMLMakerFlattener, Semantic Validator, RPsiXML (Bioconductor)
• Data distribution
– PSICQUIC: Servers, Registry, Clients
– PSISCORE: Servers, Registry, Clients
• Controlled vocabulary
– PSI-MI CV
• Data submission
– Reporting guideline: MIMIx
– Tools: PSI-MI XML files, PSI Excel Sheet, PSI Web Form
12. PSI
• Proteomics Standards Initiative
• Working group of the Human Proteome Organization
• Defines community standards for data in proteomics
– … facilitating data comparison, exchange and verification
http://www.psidev.info/
13. PSI: MIAPE
• MIAPE: The Minimum Information About a Proteomics Experiment
• Data and metadata from proteomics experiments
– Data: results
– Metadata: data about the data
– Where the samples came from
– How the analyses were performed
http://www.psidev.info/
14. PSI-MI (Molecular Interactions)
• Working group of the Proteomics Standards Initiative
• Community coordination to ensure deposition of data in public repositories
• Concentrating on …
– Annotation and representation of published MI data
– Accessibility of MI data to the user community
• Components:
– Data format: PSI-MI XML, PSI-MITAB
– Data distribution: PSICQUIC
– Controlled vocabulary: PSI-MI CV
– Reporting guideline (MIAPE): MIMIx
– Scoring: PSISCORE
http://www.psidev.info/MI
15. PSI-MI format
• Community standard for Molecular Interactions
• Jointly developed by major data providers: BIND, Cellzome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• Collecting and combining data from different sources
has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained,
e.g. IntAct data in Cytoscape.
Formats: psi-mi/xml25, psi-mi/tab25
16. PSI-MITAB
• Aimed at users who are more comfortable with Excel
• Only provides binary interactions
psi-mi/tab25
17. PSI-MITAB columns
Standard columns in v2.5 (15):
• ID(s) interactor A & B
• Alt. ID(s) interactor A & B
• Alias(es) interactor A & B
• Interaction detection method(s)
• Publication 1st author(s)
• Publication Identifier(s)
• Taxid interactor A & B
• Interaction type(s)
• Source database(s)
• Interaction identifier(s)
• Confidence value(s)
Columns added in v2.6 (21):
• Complex expansion
• Biological role A & B
• Experimental role A & B
• Interactor type A & B
• Xrefs A, B & Int.
• Annotations A, B & Int.
• Host organism
• Parameters Int.
• Created
• Updated
• CheckSum A, B & Int.
• Negative
Columns added in v2.7 (4):
• Binding feature A & B
• Stoichiometry A & B
Total columns per version:
Version   v2.5   v2.6   v2.7
Columns   15     36     40
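The 15-column layout can be illustrated with a minimal Python sketch that splits one PSI-MITAB 2.5 line into named fields. This is only an illustration, not one of the PSI tools: the column keys (`id_a`, `alias_a`, ...) are names invented here, the identifiers in `SAMPLE_LINE` are made-up placeholders, and the sketch assumes tab-separated columns, "|" between multiple values and "-" for an empty column.

```python
from typing import Dict, List

# Hypothetical short names for the 15 standard columns, in file order.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]

def parse_mitab25_line(line: str) -> Dict[str, List[str]]:
    """Split one MITAB line into a dict of column name -> list of values."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    record = {}
    # zip() stops after 15 names, so extra v2.6/v2.7 columns are ignored.
    for name, raw in zip(MITAB25_COLUMNS, fields):
        # "-" marks an empty column; "|" separates multiple values.
        # Naive split: a "|" inside a quoted value would be split too.
        record[name] = [] if raw == "-" else raw.split("|")
    return record

# Placeholder identifiers, not real database entries.
SAMPLE_LINE = "\t".join([
    "exampledb:A1", "exampledb:B2", "-", "-",
    "exampledb:geneA|exampledb:geneA-alias", "-",
    'psi-mi:"MI:0018"(two hybrid)',
    "-", "-", "-", "-", "-", "-", "-", "-",
])

record = parse_mitab25_line(SAMPLE_LINE)
print(record["id_a"], record["detection_method"])
```

Because the later versions only append columns, the same function reads the first 15 fields of a v2.6 or v2.7 line unchanged.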
18. PSI-MI format: Tools
• XML Java API (PSI-MI XML 2.5 Java Parser)
– Parse “PSI-MI XML”
– Create “PSI-MI XML”
• MITAB Java API (PSI-MITAB 2.5 Java Parser)
– Parse “PSI-MITAB”
– Create “PSI-MITAB”
• XMLMakerFlattener
– “PSI-MI XML” to “Tab-delimited format”
– “Tab-delimited format” to “PSI-MI XML”
• XML Validator
– Semantic and syntactic consistency
• XML transformation:
– MIF25_view.xsl “XML” to “HTML”
– MIF25_compact.xsl PSI-MI XML “expanded” to “compact”
– MIF25_expand.xsl PSI-MI XML “compact” to “expanded”
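The tools listed above are Java libraries and XSLT stylesheets; as a language-neutral illustration of what parsing the "compact" form involves, here is a minimal Python sketch using only the standard library. It is not the PSI-MI Java API. The namespace `net:sf:psidev:mi` and the element names are assumed to match PSI-MI XML 2.5, and `SAMPLE` is a heavily simplified, non-validating fragment with placeholder labels.

```python
import xml.etree.ElementTree as ET

# Assumed PSI-MI XML 2.5 namespace; check the root element of a real file.
NS = {"mi": "net:sf:psidev:mi"}

# Simplified fragment in "compact" form: interactors are declared once
# and participants point to them through interactorRef.
SAMPLE = """<entrySet xmlns="net:sf:psidev:mi" level="2" version="5">
  <entry>
    <interactorList>
      <interactor id="1"><names><shortLabel>protein_a</shortLabel></names></interactor>
      <interactor id="2"><names><shortLabel>protein_b</shortLabel></names></interactor>
    </interactorList>
    <interactionList>
      <interaction id="10">
        <participantList>
          <participant id="100"><interactorRef>1</interactorRef></participant>
          <participant id="101"><interactorRef>2</interactorRef></participant>
        </participantList>
      </interaction>
    </interactionList>
  </entry>
</entrySet>"""

def interactions(xml_text):
    """Return one list of interactor short labels per interaction."""
    root = ET.fromstring(xml_text)
    # Map interactor id -> short label.
    labels = {
        el.get("id"): el.findtext("mi:names/mi:shortLabel", namespaces=NS)
        for el in root.iterfind(".//mi:interactor", NS)
    }
    result = []
    for interaction in root.iterfind(".//mi:interaction", NS):
        refs = [r.text for r in interaction.iterfind(".//mi:interactorRef", NS)]
        result.append([labels[ref] for ref in refs])
    return result

print(interactions(SAMPLE))
```

In the "expanded" form each participant would carry its own interactor element instead of an interactorRef, so the lookup table would not be needed.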
19. Controlled vocabulary: PSI-MI ontology
• Why do we use them?
e.g. more than 20 ways to write:
yeast two hybrid, Y2H, 2H, two-hybrid, …
• IntAct uses the PSI-MI ontology
• Over 1,500 terms, fully defined and cross-referenced
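The synonym problem can be sketched in a few lines of Python: free-text method names are mapped onto a single controlled-vocabulary identifier. The synonym table below is a hypothetical, hand-written example covering only the spellings listed on the slide; it maps them to the PSI-MI term MI:0018 ("two hybrid"). A real pipeline would load names and synonyms from the ontology file rather than hard-code them.

```python
# Hypothetical synonym table; keys are lower-cased, whitespace-normalised.
SYNONYMS = {
    "yeast two hybrid": "MI:0018",
    "y2h": "MI:0018",
    "2h": "MI:0018",
    "two-hybrid": "MI:0018",
    "two hybrid": "MI:0018",
}

# Preferred name for each identifier used above.
TERM_NAMES = {"MI:0018": "two hybrid"}

def normalise_method(free_text):
    """Return the controlled-vocabulary ID for a free-text method name,
    or None when the spelling is not in the table."""
    key = " ".join(free_text.lower().split())
    return SYNONYMS.get(key)

for spelling in ["Y2H", "  Yeast  Two Hybrid ", "two-hybrid", "pull down"]:
    term_id = normalise_method(spelling)
    print(repr(spelling), "->", term_id, TERM_NAMES.get(term_id))
```

Once every record carries the same identifier, data from different databases can be filtered or merged by method without guessing at spellings.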