PSIMEx Workshop
Interactions and Pathways
6-7 October 2011
Rafael Jimenez
rafael@ebi.ac.uk
Data formats and ontologies
Table of contents
• Databases
– Data collection
– PPI databases
– Issues
– Utility of bioinformatics
– Standards
• PSI
– PSI-MI format: PSI-MITAB, PSI-MI XML, Tools
– PSI-MI ontology
– MIMIx
– Data submission tools
23.08.2018 3
Data collection
[Figure: two panels, "Ideally" and "Reality". Legend: A = Annotator, DB = Database, GUI = Graphical User Interface, API = Application Programming Interface, WS = Web Services. Elements shown: annotators, databases with their GUI/API/WS access layers, and the user.]
Protein-protein interaction databases
http://www.pathguide.org
132 databases!
Issues
Many data sources
• Maintain and update
• New appearing
• Many vanishing*
Different query interfaces
data integration?
Variable results
• Syntax
• Semantics
• Minimum information
* Merali Z. et al. Databases in peril. Nature 2005.
Where to find them?
Redundant data?
Utility of bioinformatics
[Figure: plot with axis "Scientific impact"; labelled regions: "Too little bioinformatics", "Too many databases", "Too diverse interfaces". Source: Tim Hubbard]
Standards
• Community agreed specification for how data
types should be represented and described.
• Standards facilitate:
– Portability
– Sharing
– Integration
– Interoperability
– Reusability
Standards
• Standards to consider in bioinformatics:
– Formats
– Schemas
– Minimum information guidelines
– Controlled vocabularies
– Identifiers
– Query interfaces
Standards
http://biosharing.org
PSI-MI
• Data format
– Standard format: PSI-MI XML, PSI-MITAB
– Tools: XML Java API, MITAB Java API, XMLMakerFlattener, Semantic Validator, RPsiXML (Bioconductor)
• Data distribution
– PSICQUIC: servers, registry, clients
• Controlled vocabulary
– PSI-MI CV
• Reporting guideline
– MIMIx
• Data submission
– Tools: PSI-MI XML files, PSI Excel Sheet, PSI Web Form
• PSISCORE
– Servers, registry, clients
PSI
• Proteomics Standards Initiative
• Work group of the Human Proteome Organization
• Defines community standards for data in proteomics
– … facilitating data comparison, exchange and verification
• MIAPE: The Minimum Information About a Proteomics Experiment
– Data and metadata from proteomics experiments
– Data: results
– Metadata: data about the data (where the samples came from, how the analyses were performed)
http://www.psidev.info/
PSI-MI (Molecular Interactions)
• Work group of the Proteomics Standards Initiative
• Community coordination to ensure deposition of data in public repositories
• Concentrating on …
– Annotation and representation of published MI data
– Accessibility of MI data to the user community
• Components
– Data format: PSI-MI XML, PSI-MITAB
– Data distribution: PSICQUIC
– Controlled vocabulary: PSI-MI CV
– Reporting guideline (MIAPE): MIMIx
– Scoring: PSISCORE
http://www.psidev.info/MI
PSI-MI format
• Community standard for Molecular Interactions
• Jointly developed by major data providers: BIND,
CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS,
Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• Collecting and combining data from different sources
has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained,
e.g. IntAct data in Cytoscape.
psi-mi/xml25 psi-mi/tab25
PSI-MITAB
• Aimed at users that are more comfortable with Excel
• Only provides binary interactions
psi-mi/tab25
Columns by version: v2.5 = 15, v2.6 = 36 (15 + 21), v2.7 = 40 (36 + 4)
Standard columns in v2.5 (15):
• ID(s) interactor A & B
• Alt. ID(s) interactor A & B
• Alias(es) interactor A & B
• Interaction detection method(s)
• Publication 1st author(s)
• Publication Identifier(s)
• Taxid interactor A & B
• Interaction type(s)
• Source database(s)
• Interaction identifier(s)
• Confidence value(s)
Columns added in v2.6 (21):
• Complex expansion
• Biological role A & B
• Experimental role A & B
• Interactor type A & B
• Xrefs A, B & Int.
• Annotations A, B & Int.
• Host organism
• Parameters Int.
• Created
• Updated
• CheckSum A, B & Int.
• Negative
Columns added in v2.7 (4):
• Binding feature A & B
• Stoichiometry A & B
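As a rough illustration of the 15 PSI-MITAB 2.5 columns listed above, the following Python sketch reads a tab-delimited record into a dictionary. It assumes the usual MITAB conventions: tab-separated columns, "|" between multiple values in one column, and "-" for an empty column. The snake_case column names and the sample record (accessions, author, identifiers) are made up for the example and do not come from any database. Splitting on "|" is a simplification that would break on a "|" inside a quoted description.

```python
import csv
import io

# Illustrative names for the 15 MITAB 2.5 columns, in the order listed above.
# These names are an assumption of this sketch, not part of the standard.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence_values",
]


def split_field(value):
    """Split one column into its '|'-separated entries; '-' means empty."""
    return [] if value == "-" else value.split("|")


def parse_mitab25(text):
    """Parse MITAB text into a list of dicts keyed by MITAB25_COLUMNS."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        if len(row) < len(MITAB25_COLUMNS):
            raise ValueError(f"expected at least 15 columns, got {len(row)}")
        # zip stops after 15 names, so extra 2.6/2.7 columns are ignored here.
        records.append(
            {name: split_field(value) for name, value in zip(MITAB25_COLUMNS, row)}
        )
    return records


# A made-up record, built field by field so the tabs are explicit.
sample_fields = [
    "uniprotkb:P00001", "uniprotkb:P00002", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2011)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)',
    'psi-mi:"MI:0469"(IntAct)', "intact:EBI-0000000", "-",
]
records = parse_mitab25("\t".join(sample_fields) + "\n")
print(records[0]["id_a"], records[0]["detection_methods"])
```

Because each record is a flat row, the same file also opens directly in a spreadsheet, which is the audience the format is aimed at.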
PSI-MI format: Tools
• XML Java API (PSI-MI XML 2.5 Java Parser)
– Parse “PSI-MI XML”
– Create “PSI-MI XML”
• MITAB Java API (PSI-MITAB 2.5 Java Parser)
– Parse “PSI-MITAB”
– Create “PSI-MITAB”
• XMLMakerFlattener
– “PSI-MI XML” to “Tab-delimited format”
– “Tab-delimited format” to “PSI-MI XML”
• XML Validator
– Semantic and syntactic consistency
• XML transformation:
– MIF25_view.xsl: “XML” to “HTML”
– MIF25_compact.xsl: PSI-MI XML “expanded” to “compact”
– MIF25_expand.xsl: PSI-MI XML “compact” to “expanded”
Controlled vocabulary: PSI-MI ontology
• Why do we use them?
– e.g. more than 20 ways to write: yeast two hybrid, Y2H, 2H, two-hybrid, …
• IntAct uses the PSI-MI ontology
• Over 1,500 terms, fully defined and cross-referenced
• Ontology browser: http://www.ebi.ac.uk/ontology-lookup
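To make the "many ways to write the same method" problem concrete, here is a small Python sketch that maps free-text labels onto a single controlled vocabulary term. The synonym table is hand-written for this example and is not taken from the PSI-MI ontology itself; MI:0018 is used here as the identifier for "two hybrid", which is worth checking in the ontology browser before relying on it.

```python
import re

# Hypothetical synonym table for illustration only; a real one would be
# derived from the synonyms stored in the PSI-MI ontology.
SYNONYMS = {
    "yeast two hybrid": "MI:0018",
    "y2h": "MI:0018",
    "2h": "MI:0018",
    "two hybrid": "MI:0018",
}


def normalise(label):
    """Return the CV identifier for a free-text label, or None if unknown."""
    # Lower-case and collapse spaces, hyphens and underscores into one space,
    # so "Two-Hybrid" and "two hybrid" share the same lookup key.
    key = re.sub(r"[\s\-_]+", " ", label.strip().lower())
    return SYNONYMS.get(key)


labels = ["yeast two hybrid", "Y2H", "2H", "two-hybrid", "pull down"]
for label in labels:
    print(label, "->", normalise(label))
```

All four spellings from the slide resolve to the same identifier, while a label missing from the table returns None, so it can be flagged for manual curation instead of being silently guessed.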
Minimum information guidelines: MIMIx
• MIAPE document guideline for molecular interactions
– 1. Manuscript information
– 2. Experiment
– 3. Interaction
– 4. Confidence
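The four MIMIx sections can be treated as a simple checklist. The sketch below, in Python, reports which sections a draft submission still lacks. The dictionary layout and the field names inside each section are hypothetical; MIMIx itself specifies what information to report, not this data structure.

```python
# The four MIMIx sections named on the slide, as illustrative dictionary keys.
MIMIX_SECTIONS = ("manuscript", "experiment", "interaction", "confidence")


def missing_sections(submission):
    """Return the MIMIx sections that are absent or empty in a submission."""
    return [name for name in MIMIX_SECTIONS if not submission.get(name)]


# A made-up draft submission with no confidence information yet.
draft = {
    "manuscript": {"pubmed": "00000000"},
    "experiment": {"detection_method": "MI:0018"},
    "interaction": {"participants": ["uniprotkb:P00001", "uniprotkb:P00002"]},
}
print(missing_sections(draft))  # ['confidence']
```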
Data submission tools
PSI-MI XML files
PSI Excel Sheet
PSI Web Form
Thank you!
Questions?
Proteomics Services Team

Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 

Data formats and ontologies

  • 1. PSIMEx Workshop Interactions and Pathways 6-7 October 2011 Rafael Jimenez rafael@ebi.ac.uk Data formats and ontologies
  • 2. Table of contents • Databases – Data collection – PPI databases – Issues – Utility of bioinformatics – Standards • PSI – PSI-MI format • PSI-MITAB • PSI-MI XML • Tools – PSI-MI ontology – MIMIx – Data submission tools
  • 3. Data collection: Ideally vs. Reality [Diagram legend: A = Annotator; DB = Database; GUI = Graphical User Interface; API = Application Programming Interface; WS = Web Services; User]
  • 5. Issues • Many data sources – Maintain and update – New appearing – Many vanishing* • Different query interfaces: data integration? • Variable results – Syntax – Semantics – Minimum information • Where to find them? • Redundant data? * Merali Z. et al. Databases in peril. Nature 2005.
  • 6. Utility of bioinformatics • Scientific impact • Too little bioinformatics • Too many databases • Too diverse interfaces (Tim Hubbard)
  • 7. Utility of bioinformatics • Scientific impact • Too little bioinformatics • Too many databases • Too diverse interfaces
  • 8. Standards • Community-agreed specification for how data types should be represented and described. • Standards facilitate: – Portability – Sharing – Integration – Interoperability – Reusability
  • 9. Standards • Standards to consider in bioinformatics: • Formats • Schemas • Minimum information guidelines • Controlled vocabularies • Identifiers • Query interfaces
  • 11. PSI-MI • Data format – Standard format: PSI-MI XML, PSI-MITAB – Tools: XML Java API, MITAB Java API, XMLMakerFlattener, Semantic Validator, RPsiXML (Bioconductor) • Data distribution – PSICQUIC: Servers, Registry, Clients – PSISCORE: Servers, Registry, Clients • Controlled vocabulary: PSI-MI CV • Reporting guideline: MIMIx • Data submission – Tools: PSI-MI XML files, PSI Excel Sheet, PSI Web Form
  • 12. PSI • Proteomics Standards Initiative • Work group of the Human Proteome Organization • Defines community standards for data in proteomics – … facilitating data comparison, exchange and verification http://www.psidev.info/
  • 13. PSI • Proteomics Standards Initiative • Work group of the Human Proteome Organization • Defines community standards for data in proteomics – … facilitating data comparison, exchange and verification • MIAPE: The Minimum Information About a Proteomics Experiment • Data and metadata from proteomics experiments – Data: results – Metadata: data about the data – Where the samples came from – How the analyses were performed http://www.psidev.info/
  • 14. PSI-MI (Molecular Interactions) • Work group of the Proteomics Standards Initiative • Community coordination to ensure deposition of data in public repositories • Concentrating on … – Annotation and representation of published MI data – Accessibility of MI data to the user community • Data format: PSI-MI XML, PSI-MITAB • Data distribution: PSICQUIC • Controlled vocabulary: PSI-MI CV • MIAPE reporting guideline: MIMIx • Scoring: PSISCORE http://www.psidev.info/MI
  • 15. PSI-MI format • Community standard for Molecular Interactions • Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others • Collecting and combining data from different sources has become easier • Standardized annotation through PSI-MI ontologies • Tools from different organizations can be chained, e.g. IntAct data in Cytoscape. • Formats: psi-mi/xml25, psi-mi/tab25
  • 16. PSI-MITAB • Aimed at users that are more comfortable with Excel • Only provides binary interactions psi-mi/tab25
  • 17. PSI-MITAB columns by version • v2.5, 15 standard columns: – ID(s) interactor A & B – Alt. ID(s) interactor A & B – Alias(es) interactor A & B – Interaction detection method(s) – Publication 1st author(s) – Publication Identifier(s) – Taxid interactor A & B – Interaction type(s) – Source database(s) – Interaction identifier(s) – Confidence value(s) • v2.6, 21 additional columns (36 in total): – Complex expansion – Biological role A & B – Experimental role A & B – Interactor type A & B – Xrefs A, B & Int. – Annotations A, B & Int. – Host organism – Parameters Int. – Created – Updated – CheckSum A, B & Int. – Negative • v2.7, 4 additional columns (40 in total): – Binding feature A & B – Stoichiometry A & B
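The 15 columns of PSI-MITAB 2.5 listed on this slide map naturally onto a tab-separated record. The following is a minimal sketch of reading one such line in Python, not the official MITAB Java API. It assumes the usual MITAB conventions that multiple values in a column are separated by "|" and that an empty column is written as "-". The field names and the sample line are invented for illustration, and the naive split would not handle quoted values that themselves contain "|".

```python
# Illustrative field names for the 15 MITAB 2.5 columns, in slide order.
# These names are assumptions made for this sketch, not official identifiers.
MITAB25_COLUMNS = [
    "id_a", "id_b",
    "alt_id_a", "alt_id_b",
    "alias_a", "alias_b",
    "detection_method",
    "first_author",
    "publication_id",
    "taxid_a", "taxid_b",
    "interaction_type",
    "source_db",
    "interaction_id",
    "confidence",
]


def parse_mitab25_line(line):
    """Split one tab-delimited line into a dict of value lists."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(
            "expected at least %d columns, got %d"
            % (len(MITAB25_COLUMNS), len(fields))
        )
    record = {}
    # zip() stops after 15 names, so extra v2.6/v2.7 columns are ignored.
    for name, raw in zip(MITAB25_COLUMNS, fields):
        # "-" marks an empty column; "|" separates multiple values.
        record[name] = [] if raw == "-" else raw.split("|")
    return record


# Invented placeholder record, not real interaction data.
SAMPLE = "\t".join([
    "uniprotkb:P00001", "uniprotkb:P00002",
    "-", "-",
    "-", "-",
    "two hybrid",
    "Doe J (2011)",
    "pubmed:0000000",
    "taxid:9606", "taxid:9606",
    "physical association",
    "exampledb",
    "exampledb:X-1|exampledb:X-2",
    "-",
])

if __name__ == "__main__":
    for key, values in parse_mitab25_line(SAMPLE).items():
        print(key, values)
```

Because only the first 15 columns are read, the same function also accepts the longer v2.6 and v2.7 lines and simply drops the additional columns.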
  • 18. PSI-MI format: Tools • XML Java API (PSI-MI XML 2.5 Java Parser) – Parse “PSI-MI XML” – Create “PSI-MI XML” • MITAB Java API (PSI-MITAB 2.5 Java Parser) – Parse “PSI-MITAB” – Create “PSI-MITAB” • XMLMakerFlattener – “PSI-MI XML” to “Tab-delimited format” – “Tab-delimited format” to “PSI-MI XML” • XML Validator – Semantic and syntactic consistency • XML transformation: – MIF25_view.xsl “XML” to “HTML” – MIF25_compact.xsl PSI-MI XML “expanded” to “compact” – MIF25_expand.xsl PSI-MI XML “compact” to “expanded”
  • 19. Controlled vocabulary: PSI-MI ontology • Why do we use them? E.g. there are more than 20 ways to write: yeast two hybrid, Y2H, 2H, two-hybrid, … • IntAct uses the PSI-MI ontology • Over 1,500 terms, fully defined and cross-referenced
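The point of this slide, many free-text spellings for one experimental method, can be illustrated with a small lookup. This is only a sketch: the synonym table is hand-written from the four spellings on the slide, the canonical label "two hybrid" is an assumed choice for this example, and a real curation pipeline would take both terms and synonyms from the PSI-MI ontology itself.

```python
# Hypothetical synonym table built from the spellings given on the slide.
# The canonical label is an assumption for illustration only.
SYNONYMS = {
    "yeast two hybrid": "two hybrid",
    "y2h": "two hybrid",
    "2h": "two hybrid",
    "two hybrid": "two hybrid",
}


def normalize_method(text):
    """Return the canonical term for a free-text method name, or None."""
    # Lower-case, treat hyphens as spaces, and collapse repeated whitespace
    # so that "Two-Hybrid" and "two  hybrid" produce the same lookup key.
    key = " ".join(text.lower().replace("-", " ").split())
    return SYNONYMS.get(key)


if __name__ == "__main__":
    for raw in ["Yeast Two-Hybrid", "Y2H", "2H", "two-hybrid", "co-ip"]:
        print(repr(raw), "->", normalize_method(raw))
```

Unknown spellings return None rather than a guess, so they can be flagged for manual curation.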
  • 20. Controlled vocabulary: PSI-MI ontology • Ontology browser: http://www.ebi.ac.uk/ontology-lookup
  • 21. MIMIx • MIAPE document guideline for molecular interactions • 1. Manuscript information • 2. Experiment • 3. Interaction • 4. Confidence
  • 23. Data submission tools PSI-MI XML files PSI Excel Sheet PSI Web Form