Interoperation between InterMines
Legume Federation, June 22, 2015
Vivek Krishnakumar
Chris Town
J. Craig Venter Institute
InterMine in a nutshell
• Open-source data warehouse software
• Integration of complex biological data
• Parsers for common biological data formats
• Extensible framework for custom data
• Cookie-cutter interface, highly customizable
• Interact using sophisticated web query tools
• Programmatic access using web-service API
Open-source Project
• Source code available online
• Distributed under the GNU LGPL license
• GitHub Repo:
https://github.com/intermine/intermine
• GitHub Organization:
https://github.com/intermine
intermine / intermine
> bio
> biotestmine
> config
> flymine
> humanmine
> imbuild
> intermine
> testmodel
.gitignore
.travis.yml
LICENSE
LICENSE.LIBS
README.md
RELEASE_NOTES
Richard N. Smith et al. Bioinformatics 2012;28:3163-3165
InterMine system architecture
InterMine system architecture
Web Application
• Java Server Pages (JSP), HTML, JS, CSS
• Interfaces with Java Servlets and IM web-services
Web Server
• Tomcat 7.0.x, serves the Web application ARchive (WAR) file
• Ant-based build system using the Java SDK
Database Server
• PostgreSQL 9.2 or above
• range query, btree, gist enabled (see the docs here)
http://intermine.readthedocs.org/en/latest/system-requirements/
Alex Kalderimis et al. Nucl. Acids Res. 2014;42:W468-W472
InterMine web services
http://iodocs.labs.intermine.org
JBrowse
Federated Authentication
• Apart from the standard login scheme (username/password), InterMine supports industry-standard OAuth2-based login flows, as implemented by Google, GitHub, Agave, etc.
• ThaleMine (Arabidopsis) relies on this infrastructure to authenticate users against the araport.org tenant registered within the Agave infrastructure
• Documentation available here:
http://intermine.readthedocs.org/en/latest/webapp/properties/web-properties/#openauth2-settings-aka-openid-connect
Interoperability?
• Ability of InterMine instances to communicate ‘automatically’ with each other
• By way of leveraging web services
• Questions to be answered:
  ▪ What do they say to each other?
  ▪ How do they say it?
  ▪ What mechanisms are used?
  ▪ Enabling these mechanisms…
Data Model
• Data Model === Schema of an InterMine instance
• Defined in XML format
• Core data model (based on the Sequence Ontology, SO) can be extended to suit requirements
• Access a mine's data model in JSON format:
http://MINE_URL/service/model/?format=json
• Compatibility of data models across mines ensures interoperability
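
A minimal sketch of how that compatibility could be checked: it builds the model URL from the pattern above and lists the classes two mines have in common. The helper names are hypothetical, and the `model` → `classes` layout of the JSON response is an assumption for illustration, not something stated on the slide.

```python
import json
from urllib.request import urlopen


def model_url(mine_url):
    """Build the model endpoint from the URL pattern on the slide."""
    return mine_url.rstrip("/") + "/service/model/?format=json"


def fetch_model(mine_url, timeout=30):
    """Download and decode a mine's data model (needs network access)."""
    with urlopen(model_url(mine_url), timeout=timeout) as response:
        return json.load(response)


def class_names(model_json):
    # Assumption: classes are keyed by name under model -> classes.
    return set(model_json["model"]["classes"])


def shared_classes(model_a, model_b):
    """Return the class names present in both models, sorted."""
    return sorted(class_names(model_a) & class_names(model_b))
```

Calling `shared_classes(fetch_model(url_a), fetch_model(url_b))` for two mines would give a rough first indication of how much of their models overlap; comparing attributes and references per class would be the natural next step.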
Advantages of a common data model
• Data mining scripts developed for one mine are immediately compatible with others
• Promotes crowdsourcing
  ▪ one/more groups write tools/widgets/parsers
  ▪ can be easily reused by others
• Enables cross-species analysis
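
To illustrate the first point, here is a hedged sketch of one query being prepared for several mines at once. It assumes that every mine shares the `genomic` model with a `Gene` class carrying `primaryIdentifier` and `symbol`, and that each mine exposes a `/service/query/results` endpoint accepting a PathQuery XML string. The function names and any mine URLs passed in are placeholders.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def gene_query_xml(symbol):
    """PathQuery XML selecting two Gene fields for a given gene symbol."""
    return (
        '<query model="genomic" view="Gene.primaryIdentifier Gene.symbol">'
        '<constraint path="Gene.symbol" op="=" value=%s/>'
        "</query>" % quoteattr(symbol)
    )


def query_urls(mine_urls, symbol):
    """Build the same query-results request for every mine in mine_urls."""
    params = urlencode({"query": gene_query_xml(symbol), "format": "json"})
    return {
        name: base.rstrip("/") + "/service/query/results?" + params
        for name, base in mine_urls.items()
    }
```

Because the query text is identical for every mine, only the base URL changes; that is the practical meaning of a script being "immediately compatible" across mines with a common model.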
Available tools
• Multi-mine search tool
https://github.com/alexkalderimis/multimine-search-tool
  ▪ Based on InterMine's Lucene-based search index
  ▪ Allows for interoperation when data models are different
• Integration based on Homologs:
  ▪ Ontology integration using `dagify`
https://github.com/intermine/dagify
  ▪ Pathway integration by way of collating shared pathways
• InterMine Staircase
  ▪ Powerful client-side interface enabling data analysis workflows and cross-mine integration via web services
http://staircase.herokuapp.com
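
The idea behind a cross-mine keyword search can be sketched as follows. This is not the code of the multimine-search-tool or of Staircase; it is an illustration that assumes each mine offers a `/service/search?q=...` endpoint whose JSON response holds a `results` list with `type`, `relevance` and `fields` entries. All function names are hypothetical.

```python
from urllib.parse import urlencode


def search_url(mine_url, term):
    """Build a keyword-search request for one mine."""
    return mine_url.rstrip("/") + "/service/search?" + urlencode({"q": term})


def merge_hits(responses):
    """Combine per-mine search responses into one list ranked by relevance.

    responses maps a mine name to its decoded JSON search response.
    """
    merged = []
    for mine, payload in responses.items():
        # Assumption: hits live under "results"; a mine with no hits is skipped.
        for hit in payload.get("results", []):
            merged.append(
                {
                    "mine": mine,
                    "type": hit.get("type"),
                    "relevance": hit.get("relevance", 0.0),
                    "fields": hit.get("fields", {}),
                }
            )
    return sorted(merged, key=lambda h: h["relevance"], reverse=True)
```

Since a keyword search returns free-form field values rather than model paths, merging hits this way does not require the mines to share a data model, which matches the note above about interoperation when models differ.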
InterMine Staircase
• Configure access to multiple mines
• Cross-mine search
• Filter results by facets
• Prepare and enrich lists
• Perform mine-to-mine list conversions
• App/tool compatibility
• Application model (figure labels: MedicMine, SoyMine, ....)
Available Reference Mines
• ThaleMine: https://github.com/Arabidopsis-Information-Portal/intermine/
  ▪ Integrates a variety of genomic datasets pertaining to Arabidopsis thaliana Col-0
  ▪ Leverages both data warehousing and federation methods
  ▪ Represents a wide variety of data: genes, proteins, function, expression, co-expression, interactions, pathways, homologs, alleles, polymorphism, stocks, germplasm, phenotypes
• MedicMine: https://github.com/jcvi-plant-genomics/intermine/
  ▪ Warehouse for Medicago truncatula A17 genomic data
  ▪ Houses a variety of data: genes, proteins, function, expression
• PhytoMine: https://github.com/JoeCarlson/intermine/
  ▪ Warehouse for 47 different Angiosperm genomes
  ▪ Developed on a Chado → InterMine migration path
  ▪ Houses a variety of data: genes, proteins, expression, homologs, protein families, variation
• FlyMine: https://github.com/intermine/intermine/
Recommendations and Challenges
• Recommendations:
  ▪ Develop a core plant InterMine model
  ▪ Follow InterMine guidelines
  ▪ Learn from prior initiatives, e.g. InterMOD
• Challenges:
  ▪ Users/developers are used to the current way of doing things
  ▪ Time taken to adapt to a common data model and/or software stack
  ▪ Difficult to arrive at consensus with a diverse group
Acknowledgments
• InterMine Team
  ▪ Gos Micklem
  ▪ Julie Sullivan
  ▪ Alex Kalderimis
  ▪ Richard Smith
  ▪ Sergio Contrino
  ▪ Josh Heimbach
  ▪ et al.
• Araport Team
  ▪ Chris Town
  ▪ Jason Miller
  ▪ Matt Vaughn
  ▪ Maria Kim
  ▪ Svetlana Karamycheva
  ▪ Erik Ferlanti
  ▪ Chia-Yi Cheng
  ▪ Benjamin Rosen
  ▪ Irina Belyaeva

Editor's Notes

  • #4
    bio: code to deal with biological data, including data sources
    flymine: config used to create FlyMine
    testmodel: non-biological test data model used for testing core InterMine
    imbuild: ant-based build system, do not edit anything
    intermine: the core (generic) InterMine code to work with any data model
  • #5
    ObjectStore: custom Java object/relational mapping system, optimized for read-only database performance
    Query optimizer: pre-computed tables joining connected data from different tables, improves PostgreSQL performance
  • #7 Summary of web services available through InterMine.