IntAct editor - an effective Interaction
curation tool.

Jyoti Khadake
Proteomic Services Team
Overview

    • What is the IntAct molecular interaction database?
    • What is a molecular interaction?
    • Data model
    • Standardized schema and controlled vocabularies (CVs)
    • Editor and management of entries




       25 April 2012
What is IntAct?

    • IntAct is a database of molecular interactions
    • It captures interactions from the literature
    • It is manually curated
    • It is extensively cross-referenced
    • It provides tools to query, analyze and visualize data
    • It is open source and open access



What is a molecular interaction?

    Physical interactions
    • Types
      • Self
      • Binary: homomeric or heteromeric
      • n-ary complexes
      • Co-localisations
      • Enzymatic assays
      • Purified protein interactions
Interaction details captured from literature

       • PubMed ID – source of the data
       • The interactors
       • Experimental conditions of the interactions
       • Type of interaction
       • Properties of the interactors



Details of an interaction
Is it a duck or a rabbit?




Data Model

[Diagram (hierarchy, left to right): Publication; Experiment1, Experiment2; Interaction1 to Interaction4; Participant1 to Participant3; Protein1, Protein2. A Participant has Roles, Features and Preparations.]
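
The hierarchy on this slide can be sketched in a few lines of Python. This is only an illustration, not the IntAct editor's actual code or schema: all class and field names are assumptions chosen to mirror the slide labels, and the PubMed ID is a placeholder. The example reuses the case from the Editor's Notes, where interactor P12345 appears as a participant with a GST tag and a mutated residue.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interactor:
    """A molecule as such, e.g. the protein with accession P12345."""
    accession: str
    kind: str = "protein"  # the notes also mention small molecule, DNA, RNA


@dataclass
class Participant:
    """One interactor as it was used in one particular interaction."""
    interactor: Interactor
    roles: List[str] = field(default_factory=list)         # e.g. bait, prey
    features: List[str] = field(default_factory=list)      # e.g. tag, mutation
    preparations: List[str] = field(default_factory=list)  # e.g. expression level


@dataclass
class Interaction:
    label: str
    participants: List[Participant] = field(default_factory=list)

    def arity(self) -> str:
        """Classify by participant count: self, binary or n-ary."""
        n = len(self.participants)
        if n == 0:
            raise ValueError("an interaction needs at least one participant")
        return "self" if n == 1 else "binary" if n == 2 else "n-ary"


@dataclass
class Experiment:
    label: str
    interactions: List[Interaction] = field(default_factory=list)


@dataclass
class Publication:
    pubmed_id: str
    experiments: List[Experiment] = field(default_factory=list)


# The same interactor used twice, each time with its own participant details.
p12345 = Interactor("P12345")
tagged = Participant(p12345, roles=["bait"], features=["GST tag", "mutated residue"])
plain = Participant(p12345, roles=["prey"])

homodimer = Interaction("Interaction1", [tagged, plain])
experiment = Experiment("Experiment1", [homodimer])
publication = Publication("00000000", [experiment])  # placeholder PubMed ID

print(homodimer.arity())
```

Keeping Participant separate from Interactor is what lets one protein appear in many interactions, each with its own role, features and preparation.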



How do databases store Interactions




www.ebi.ac.uk/ols for controlled vocabularies
How do we curate and manage data


     • We use the new Editor.




Publication & Experiment




Publication View




Experiment view




Interaction




Interaction View




Participant




Feature




Participant View




Interactor




Interactor View




IntAct Curation management
“Lifecycle of an Interaction”

[Workflow diagram. Recoverable labels: inputs Publication (full text), CVs and Curation manual; entry components exp, I, p1, p2; states In progress, Ready for rechecking, Publish; actions check, accept, reject, validator; roles curator, Super curator, Author.]
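
A curation lifecycle like this one can be modelled as a small state machine. The sketch below is an illustration only: the state names and the accept/reject actions come from the slide, but the wiring between them is an assumption, since the arrows of the diagram are not recoverable from the text. It is not the editor's actual workflow code.

```python
from enum import Enum


class State(Enum):
    IN_PROGRESS = "In progress"
    READY_FOR_RECHECKING = "Ready for rechecking"
    PUBLISHED = "Publish"


# (state, action) -> next state.
# ASSUMPTION: this wiring is a guess; only the labels appear on the slide.
TRANSITIONS = {
    (State.IN_PROGRESS, "accept"): State.PUBLISHED,
    (State.IN_PROGRESS, "reject"): State.READY_FOR_RECHECKING,
    (State.READY_FOR_RECHECKING, "accept"): State.PUBLISHED,
    (State.READY_FOR_RECHECKING, "reject"): State.READY_FOR_RECHECKING,
}


def step(state: State, action: str) -> State:
    """Apply one review action, refusing anything the table does not list."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"{action!r} is not allowed in state {state.value!r}"
        ) from None


def run(actions, start: State = State.IN_PROGRESS) -> State:
    """Replay a sequence of review actions and return the final state."""
    state = start
    for action in actions:
        state = step(state, action)
    return state


# An entry that is rejected once, corrected, then accepted.
print(run(["reject", "accept"]).value)
```

Keeping the transitions in one table means a published entry cannot be moved again by accident: any action not listed raises an error.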
Databases using Editor
Summary

     • The IntAct editor is based on the PSI-MI schema v2.5
     • This enables us to capture the details of interaction data
     • It allows us to manage the curation process
     • It is used by various other databases

Proteomic Services Team
                Thank you!
                Questions?


                Pablo   Dave
The five EMBL sites
The Wellcome Trust Genome Campus


[Image of the campus, labelled: Data centre; Sanger Institute Sulston building; Sanger labs / informatics; Cairns Pavilion (shared); EMBL-EBI. © John Freebury]
New types of data

    • Literature
    • Genomes
    • Nucleotide sequence
    • Protein sequence
    • Proteomes
    • Gene expression
    • Protein structure
    • Protein families, domains and motifs
    • Chemical entities
    • Molecular interactions
    • Pathways
    • Systems

Format for storage and exchange – PSI-MI XML 2.5

Main objects - Experiment

    • Literature references
    • Controlled by ontologies
    • Confidence measures

Main objects - Participant


    • Interactor
    • e.g. enzyme target
    • Building of Complex
    • e.g. bait, prey
    • Delivery method, expression level…
    • Interactor used experimentally


Editor's Notes

  • #9 An interaction can have one participant (e.g. auto-phosphorylation) or many (binary or n-ary). It can involve proteins, but also small molecules, DNA, RNA… A participant is the specific instance of an interactor (e.g. a protein) in the context of an interaction. For example, interactor: P12345; participant: P12345 with a GST tag and a mutated residue.
  • #23 Take Bind out. Tell them that we also export some data for text miners!
  • #29 We’re the second largest of the five EMBL sites; there is the main lab and administrative centre in Heidelberg; structural biology labs in Hamburg and Grenoble; mouse biology in Monterotondo, near Rome, and bioinformatics in Hinxton. There are around 1,500 staff within EMBL and about 500 of those work at the EBI.
  • #30 We’re based on the Wellcome Trust Genome Campus in Hinxton, south of Cambridge, UK, which we share with the Wellcome Trust Sanger Institute. This is a good strategic fit as the Sanger is a major sequencing centre (most famous for sequencing 1/3 of the human genome) with a strong programme in functional genomics.
  • #31 The EBI is probably unique in the world for its range of data resources and tools, spanning everything from DNA and protein sequence to complex pathways and networks. At the EBI, we distinguish resource development and provision, which we call services, from research, although the two are closely related. Both the research areas and the services follow the areas of focus shown on the slide. Some of these types of data are now being collected in a high-throughput way, presenting new challenges for how we organise and store them.