6. Data integration problems
Many data resources
• Many to maintain
• New ones appearing
• Only 20% have a sustained future*
• Not easy to find them
Different query interfaces
Data integration?
Variable results
• Formats
• Schemas
• Controlled vocabularies
• Minimum information guidelines
• Redundancy
• Inconsistency
* Merali Z. et al. Databases in peril. Nature 2005.
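To make the redundancy and inconsistency problems concrete, here is a minimal Python sketch of what happens when two resources are combined. It assumes, purely for illustration, that each resource is a dict mapping a shared record identifier to one annotation value; the function name and the resource contents are made up.

```python
def compare_resources(a: dict[str, str], b: dict[str, str]) -> dict[str, list[str]]:
    """Classify identifiers found in two resources (hypothetical data model)."""
    shared = a.keys() & b.keys()
    return {
        # Same identifier, same value: the record is stored twice (redundancy)
        "redundant": sorted(k for k in shared if a[k] == b[k]),
        # Same identifier, different value: the resources disagree (inconsistency)
        "inconsistent": sorted(k for k in shared if a[k] != b[k]),
        # Present in only one resource: extra coverage from combining them
        "unique": sorted(a.keys() ^ b.keys()),
    }

# Invented example resources
resource_1 = {"rec1": "kinase", "rec2": "receptor", "rec3": "transporter"}
resource_2 = {"rec1": "kinase", "rec2": "ligand", "rec4": "channel"}
print(compare_resources(resource_1, resource_2))
# {'redundant': ['rec1'], 'inconsistent': ['rec2'], 'unique': ['rec3', 'rec4']}
```

The comparison is trivial only because both resources already share identifiers; when formats, schemas and vocabularies differ, matching records is itself the hard part.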
7. Access, exchange, sharing, portability, interoperability, annotation, comparison, verification, representation, integration, reusability.
Nucleotide sequences: INSDC (EMBL, DDBJ, NCBI)
Molecular interactions: IMEx (IntAct, InnateDB, DIP, MINT, …)
Collaboration among data providers
• More data coverage
• Less redundancy
• Less inconsistency
• Better data management
Protein identifications: ProteomeXchange (PRIDE, PeptideAtlas, GPMDB, Tranche, …)
8. Standards
• Common identifiers
• Controlled vocabularies
• Common formats
• Common schemas
• Minimum information guidelines
• Common query interfaces
Standard types: Schema, Data distribution, Reporting guideline, Controlled vocabulary, Format, Identifiers
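A small Python sketch of what common identifiers and a controlled vocabulary buy: records that describe the same interaction with different local labels become directly comparable. MI:0018 (two hybrid) and MI:0019 (coimmunoprecipitation) are PSI-MI CV terms, but the synonym table is a hypothetical mapping written for this example (a real pipeline would load the full ontology), and the accessions are placeholders.

```python
# Hypothetical mapping from free-text method labels to PSI-MI CV term IDs
METHOD_TO_CV = {
    "two hybrid": "MI:0018",
    "y2h": "MI:0018",
    "yeast two-hybrid": "MI:0018",
    "coimmunoprecipitation": "MI:0019",
    "co-ip": "MI:0019",
}

def normalize(record: dict) -> tuple:
    """Return a comparable key: unordered interactor pair plus CV method ID."""
    # Common identifiers: same case, and the order of A and B does not matter
    pair = tuple(sorted(i.strip().upper() for i in (record["a"], record["b"])))
    # Controlled vocabulary: map the local label to a shared term ID
    method = METHOD_TO_CV.get(record["method"].strip().lower())
    if method is None:
        raise ValueError(f"no controlled-vocabulary term for {record['method']!r}")
    return pair + (method,)

# Placeholder records from different sources
records = [
    {"a": "p12345", "b": "Q67890", "method": "Y2H"},
    {"a": "Q67890", "b": "P12345", "method": "yeast two-hybrid"},
    {"a": "P12345", "b": "Q67890", "method": "Co-IP"},
]
unique = {normalize(r) for r in records}
print(len(unique))  # 2: the first two records collapse into one
```

Failing loudly on an unmapped label is a deliberate choice here: silently passing free text through would reintroduce the inconsistency the vocabulary is meant to remove.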
9. PSI-MI
• Working group of the Proteomics Standards Initiative
• Community coordination to ensure deposition of molecular interaction (MI) data in public repositories
• Concentrating on …
• Annotation and representation of published MI data
• Accessibility of MI data to the user community
PSI-MI standards (http://www.psidev.info/MI):
• Data format/schema: PSI-MI XML, PSI-MITAB
• Data distribution: PSICQUIC, IMEx
• Controlled vocabulary: PSI-MI CV
• Reporting guideline: MIAPE, MIMIx
• Scoring: PSISCORE
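PSI-MITAB is the tab-delimited member of the PSI-MI format family: one interaction per line. The Python sketch below splits a line into named fields, assuming the 15-column MITAB 2.5 layout (later versions append further columns, which this sketch ignores), "-" for an empty field and "|" between multiple values. The column names are illustrative labels rather than official identifiers, and the sample line is invented.

```python
# Illustrative names for the 15 MITAB 2.5 columns, in order
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_author", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_dbs",
    "interaction_ids", "confidence",
]

def parse_mitab25_line(line: str) -> dict[str, list[str]]:
    """Map one MITAB 2.5 line to {column: [values]}; '-' means empty."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    parsed = {}
    # zip stops after 15 columns, so extra MITAB 2.6/2.7 columns are dropped
    for name, field in zip(MITAB25_COLUMNS, fields):
        # Simplification: a '|' inside quoted text would be split too
        parsed[name] = [] if field == "-" else field.split("|")
    return parsed

# Invented sample line with placeholder identifiers
sample = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-",
    "gene:GENEA|gene:GA1", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe J (2010)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', "-", "-", "-",
])
row = parse_mitab25_line(sample)
print(row["aliases_a"], row["detection_methods"])
```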
10. PSICQUIC: Proteomics Standard Initiative Common QUery InterfaCe
13/12/2018
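PSICQUIC architecture (diagram):
• User view: an application queries interactions through PSICQUIC
• The PSICQUIC Registry lists the available PSICQUIC services
• Input: MIQL query; output: PSI-MI data
• Web service system: PSICQUIC Service A, PSICQUIC Service B, PSICQUIC Service C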
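The PSICQUIC architecture amounts to a fan-out: the client takes one MIQL query, sends it to every service listed in the registry and gathers the PSI-MI results. The Python sketch below builds only the request URLs and makes no network call. It assumes each registry entry provides a REST base URL ending in `.../search/`, to which `query/<MIQL>` and a `format` parameter such as `tab25` are appended; check the PSICQUIC documentation for the exact paths and formats a given service supports. The registry contents are hypothetical stand-ins for Services A, B and C.

```python
from urllib.parse import quote, urlencode

# Hypothetical registry entries: service name -> REST base URL
REGISTRY = {
    "ServiceA": "https://a.example.org/psicquic/webservices/current/search/",
    "ServiceB": "https://b.example.org/psicquic/webservices/current/search/",
    "ServiceC": "https://c.example.org/psicquic/webservices/current/search/",
}

def query_urls(miql: str, registry: dict[str, str], fmt: str = "tab25") -> dict[str, str]:
    """Build one query URL per registered service for the same MIQL query."""
    # MIQL can contain spaces, quotes and colons: encode it as one path segment
    path = "query/" + quote(miql, safe="")
    params = urlencode({"format": fmt})
    # Tolerate base URLs with or without a trailing slash
    return {
        name: f"{base.rstrip('/')}/{path}?{params}"
        for name, base in registry.items()
    }

for name, url in query_urls("brca2", REGISTRY).items():
    print(name, url)
```

Keeping URL construction separate from fetching makes the fan-out easy to test; merging the per-service responses (for example, MITAB lines) and removing duplicate interactions is left to the client.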