InterMine 
Integrated Data Warehouse 
Use Cases: Arabidopsis & Medicago Genome Projects 
Vivek Krishnakumar 
Plant Genomics Group (EUK) 
IFX Research WIPS Meeting, 03 October 2014
Overview 
• Introduction 
• InterMine 
 Integrated data warehouse, Extensible data model, 
Flexible query system 
 Web and Programmatic Interface 
 Other InterMine instances 
• Use cases 
 Arabidopsis Information Portal (AIP) 
 Medicago truncatula Genome Database (MTGD) 
• Summary 
 Advantages 
 Caveats
Introduction 
For genome projects that wish to expose their 
data via the web (query, visualize, warehouse) 
to foster scientific collaboration, there are 
several technologies available: 
• JCVI-developed software 
 Manatee (backed by an RDBMS) 
• Externally developed software 
 BioMart (federated from various databases) 
 Tripal (powered by Drupal, backed by a CHADO database) 
 InterMine
InterMine 
• Functions as a data warehouse for the integration of complex 
biological data. Integration across data types occurs based on 
a common identifier (e.g. gene primary ID) 
• Uses a flexible and extensible data model, controlled by XML 
files, driven by ontologies (Sequence [SO], Gene [GO], etc.) 
 Genomics, Proteomics, Interactions, Homology, 
Expression, Pathways (and more data types) 
 Parsers for commonly used biological data formats 
 Provides framework for adding your own data 
• Offers a flexible query system, optimized via precomputed 
tables (no need for schema denormalization) 
Smith, RN. et al. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data 
Bioinformatics (2012) 28 (23): 3163-3165
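
The integration step described on this slide can be illustrated with a minimal Python sketch. This is not InterMine code: the two source lists, the field names, and the `integrate` function are hypothetical, and the identifiers and values are made-up placeholders. It only shows the idea of merging records from different data types when they share a common identifier.

```python
from collections import defaultdict

# Hypothetical records from two independent sources (placeholder data).
# Both sources use the same gene primary ID as their key.
annotation_source = [
    {"primary_id": "GENE_A", "symbol": "symA"},
    {"primary_id": "GENE_B", "symbol": "symB"},
]
expression_source = [
    {"primary_id": "GENE_A", "tissue": "root", "level": 12.5},
    {"primary_id": "GENE_A", "tissue": "leaf", "level": 3.1},
    {"primary_id": "GENE_C", "tissue": "leaf", "level": 7.0},
]


def integrate(annotations, expression):
    """Merge records from both sources into one entry per primary ID."""
    genes = defaultdict(lambda: {"symbol": None, "expression": []})
    for rec in annotations:
        genes[rec["primary_id"]]["symbol"] = rec["symbol"]
    for rec in expression:
        genes[rec["primary_id"]]["expression"].append(
            (rec["tissue"], rec["level"])
        )
    return dict(genes)


warehouse = integrate(annotation_source, expression_source)
for primary_id, gene in sorted(warehouse.items()):
    print(primary_id, gene["symbol"], gene["expression"])
```

A gene seen in only one source (here `GENE_C`) still gets an entry, with the missing attributes left empty, which is one simple way to keep partially described objects in the merged result.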
InterMine (contd.) 
• Provides a user-friendly web interface exposing 
powerful features: 
 Analysis of lists (facilitate enrichment studies) 
 Full-featured report pages (one-stop shop) 
 Interactive result tables (sort, filter, summarize) 
 Visual query builder (no need to write SQL!) 
 Quick search and Region-based search 
• Fosters development of external applications 
using data hosted within InterMine via Application 
Programming Interfaces (API): 
 RESTful 
 Perl, Python, Ruby, Java, JavaScript 
Kalderimis, A. et al. InterMine: extensive web services for modern biology 
Nucl. Acids Res. (1 July 2014) 42 (W1): W468-W472
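
As a rough illustration of the RESTful interface, the Python sketch below builds an InterMine-style XML query and the URL that would request its results. Several things here are assumptions rather than facts from the slides: the `/service/query/results` path and the `query` and `format` parameters reflect the InterMine web services as commonly documented, the class and field names (`Gene.primaryIdentifier`, `Gene.symbol`) depend on the data model of the target mine, and the helper functions are hypothetical. The code only constructs the URL and sends no request, since the ThaleMine address shown in these slides may no longer resolve.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_path_query(view, constraints, model="genomic"):
    """Serialize a query as XML: output columns plus constraints."""
    query = ET.Element("query", {"model": model, "view": " ".join(view)})
    for path, op, value in constraints:
        ET.SubElement(
            query, "constraint", {"path": path, "op": op, "value": value}
        )
    return ET.tostring(query, encoding="unicode")


def build_results_url(base_url, query_xml, fmt="tab"):
    """Compose the results URL (endpoint path is an assumption)."""
    params = urlencode({"query": query_xml, "format": fmt})
    return base_url.rstrip("/") + "/service/query/results?" + params


# Base URL taken from the ThaleMine slide; it may be out of date.
THALEMINE = "https://apps.araport.org/thalemine"

xml_query = build_path_query(
    view=["Gene.primaryIdentifier", "Gene.symbol"],
    constraints=[("Gene.primaryIdentifier", "=", "AT1G01010")],
)
url = build_results_url(THALEMINE, xml_query)
print(xml_query)
print(url)
```

The language-specific client libraries listed above wrap this kind of request, so in practice a script would normally use one of them instead of assembling URLs by hand.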
Public “Mines” 
• InterMine supports querying across mines 
for cross-database integration 
• A large number of warehouses powered by 
InterMine already exist
Arabidopsis Information Portal (AIP) 
• AIP origins 
 Funded by NSF in response to community needs, following 
termination of funding to TAIR 
• AIP objectives 
 Develop a community web resource that… 
– is sustainable, fundable, and community-extensible 
– hosts analysis & visualization tools, user data spaces 
 Federation: integrate diverse data sets from distributed data 
sources; foster development of tools for and by the community 
 Maintenance of the Col-0 gold standard annotation 
• AIP methods 
 Assimilate TAIR data 
 Host an InterMine instance devoted to Arabidopsis (thale cress) 
 Offer and consume RESTful web services 
 Integrate and utilize iPlant resources
ThaleMine 
https://apps.araport.org/thalemine 
• An InterMine interface 
to Arabidopsis genomic 
data 
• Integrates a wide 
variety of data types 
(A-E, H), some of 
which are warehoused 
and others are 
federated via web 
services 
• Embedded elements 
visualizing gene 
structure (JBrowse, not 
shown), interaction 
networks (F), 
expression patterns (G)
Visual Query Builder 
Image created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
Interactive Result Tables and Region-based Search 
Images created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
MedicMine 
http://medicmine.jcvi.org 
• NSF funded project to 
assist with the curation 
of the Medicago 
truncatula Genome 
Assembly and 
Annotation (funding 
ended August 2014) 
• In order to warehouse the project data and prolong 
its availability, an InterMine interface for Medicago 
was implemented (backed by a CHADO database) 
• Provides functionality similar to that 
available via ThaleMine
Summary 
• Advantages 
 InterMine is a powerful biological data warehouse 
 Performs complex data integration 
 Allows fast and flexible querying 
 Well documented programmatic interface 
 Cookie-cutter, user-friendly web interface 
 Facilitates cross-talk between “mines” 
• Caveats 
 Adding more data requires a full database rebuild (incremental loading 
is not possible) because of the integration step 
• About InterMine: 
 Developed by the Micklem Lab at the University of Cambridge, UK 
 Written in Java, backed by a PostgreSQL database, deployed under Tomcat. 
Documentation and downloads available at http://www.intermine.org
Chris Town, PI 
Chris Nelson, PM 
Lisa McDonald, Education and Outreach Coordinator 
Jason Miller, Co-PI, Technical Lead 
Erik Ferlanti, SE 
Vivek Krishnakumar, BE 
Svetlana Karamycheva, BE 
Maria Kim, BE 
Ben Rosen, BA 
Gos Micklem, co-PI 
Sergio Contrino, Software Engineer 
Eva Huala, Project lead, TAIR 
Bob Muller, Technical lead, TAIR 
Matt Vaughn, co-PI 
Steve Mock, Advanced Computing Interfaces 
Rion Dooley, Web and Cloud Services 
Matt Hanlon, Web and Mobile Applications

8.Isolation of pure cultures and preservation of cultures.pdf
by6843629
 
Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting

  • 1. InterMine Integrated Data Warehouse Use Cases: Arabidopsis & Medicago Genome Projects Vivek Krishnakumar Plant Genomics Group (EUK) IFX Research WIPS Meeting, 03 October 2014
  • 2. Overview
      • Introduction
      • InterMine
          - Integrated data warehouse, Extensible data model, Flexible query system
          - Web and Programmatic Interface
          - Other InterMine instances
      • Use cases
          - Arabidopsis Information Portal (AIP)
          - Medicago truncatula Genome Database (MTGD)
      • Summary
          - Advantages
          - Caveats
  • 3. Introduction
      For genome projects that wish to expose their data via the web (query, visualize, warehouse) to foster scientific collaboration, several technologies are available:
      • JCVI-developed software
          - Manatee (backed by an RDBMS)
      • Externally developed software
          - BioMart (federated from various databases)
          - Tripal (powered by Drupal, backed by a CHADO database)
          - InterMine
  • 4. InterMine
      • Functions as a data warehouse for the integration of complex biological data. Integration across data types is based on a common identifier (e.g. gene primary ID)
      • Uses a flexible and extensible data model, controlled by XML files, driven by ontologies (Sequence [SO], Gene [GO], etc.)
          - Genomics, Proteomics, Interactions, Homology, Expression, Pathways (and more data types)
          - Parsers for commonly used biological data formats
          - Provides a framework for adding your own data
      • Offers a flexible query system, optimized via precomputed tables (no need for schema denormalization)
      Smith, RN. et al. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics (2012) 28 (23): 3163-3165
  • 5. InterMine (contd.)
      • Provides a user-friendly web interface exposing powerful features:
          - Analysis of lists (facilitate enrichment studies)
          - Full-featured report pages (one-stop shop)
          - Interactive result tables (sort, filter, summarize)
          - Visual query builder (no need to write SQL!)
          - Quick search and Region-based search
      • Fosters development of external applications using data hosted within InterMine via Application Programming Interfaces (API):
          - RESTful
          - Perl, Python, Ruby, Java, JavaScript
      Kalderimis, A. et al. InterMine: extensive web services for modern biology. Nucl. Acids Res. (1 July 2014) 42 (W1): W468-W472
  • 6. Public “Mines” • InterMine supports querying across mines for cross-database integration • Vast number of warehouses powered by InterMine already exist
  • 7. Arabidopsis Information Portal (AIP)
      • AIP origins
          - Funded by NSF in response to community needs, following termination of funding to TAIR
      • AIP objectives
          - Develop a community web resource that…
              – is sustainable and fundable and community-extensible
              – hosts analysis & visualization tools, user data spaces
          - Federation: integrate diverse data sets from distributed data sources; foster development of tools for and by the community
          - Maintenance of the Col-0 gold standard annotation
      • AIP methods
          - Assimilate TAIR data
          - Host an InterMine instance devoted to Arabidopsis (thale cress)
          - Offer and consume RESTful web services
          - Integrate and utilize iPlant resources
  • 8. ThaleMine https://apps.araport.org/thalemine • An InterMine interface to Arabidopsis genomic data • Integrates a wide variety of data types (A-E, H), some of which are warehoused and others are federated via web services • Embedded elements visualizing gene structure (JBrowse, not shown), interaction networks (F), expression patterns (G)
  • 9. Visual Query Builder Image created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
  • 10. Interactive Result Tables Region-based search Images created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
  • 11. MedicMine http://medicmine.jcvi.org • NSF funded project to assist with the curation of the Medicago truncatula Genome Assembly and Annotation (funding ended August 2014) • In order to warehouse and prolong the project data, an InterMine interface for Medicago was implemented (backed by a CHADO database) • Provides similar kind of functionality available via ThaleMine
  • 12. Summary
      • Advantages
          - InterMine is a powerful biological data warehouse
          - Performs complex data integration
          - Allows fast and flexible querying
          - Well-documented programmatic interface
          - Cookie-cutter, user-friendly web interface
          - Facilitates cross-talk between “mines”
      • Caveats
          - Adding more data requires a full database rebuild (incremental loading is not possible) because of the integration step
      • About InterMine:
          - Developed by the Micklem Lab at the University of Cambridge, UK
          - Written in Java, backed by PostgreSQL, deployed under Tomcat. Documentation and downloads available at http://www.intermine.org
  • 13. Chris Town, PI; Chris Nelson, PM; Lisa McDonald, Education and Outreach Coordinator; Jason Miller, Co-PI, Technical Lead; Erik Ferlanti, SE; Vivek Krishnakumar, BE; Svetlana Karamycheva, BE; Maria Kim, BE; Gos Micklem, co-PI; Sergio Contrino, Software Engineer; Eva Huala, Project lead, TAIR; Bob Muller, Technical lead, TAIR; Matt Vaughn, co-PI; Steve Mock, Advanced Computing Interfaces; Rion Dooley, Web and Cloud Services; Matt Hanlon, Web and Mobile Applications; Ben Rosen, BA
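
Slide 5 notes that InterMine data can be reached through a RESTful API as well as through language clients. As a rough illustration of what such a request might look like, the sketch below builds (but does not send) a query URL using only the Python standard library. Several things here are assumptions rather than details from the slides: that the service root is the ThaleMine URL from slide 8 with "/service" appended, that the data model is named "genomic", that results are served from a "/query/results" endpoint accepting an XML query, and the helper names build_path_query and build_results_url. The gene identifier AT1G01010 is only an example value.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Assumption: the slide 8 ThaleMine URL plus "/service" is the web service root.
SERVICE_ROOT = "https://apps.araport.org/thalemine/service"


def build_path_query(root_class, fields, constraints, model="genomic"):
    """Return query XML that selects `fields` of `root_class`.

    `constraints` is a list of (field, operator, value) tuples.
    The element and attribute names follow the assumed query format.
    """
    query = ET.Element("query", {
        "model": model,
        "view": " ".join(f"{root_class}.{field}" for field in fields),
    })
    for field, op, value in constraints:
        ET.SubElement(query, "constraint", {
            "path": f"{root_class}.{field}",
            "op": op,
            "value": value,
        })
    return ET.tostring(query, encoding="unicode")


def build_results_url(service_root, query_xml, fmt="json"):
    """Return a GET URL for the assumed results endpoint (nothing is fetched)."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{service_root}/query/results?{params}"


# Look up one gene by its primary identifier, the same kind of common
# identifier that slide 4 describes as the basis for integration.
xml = build_path_query(
    "Gene",
    ["primaryIdentifier", "symbol"],
    [("primaryIdentifier", "=", "AT1G01010")],
)
url = build_results_url(SERVICE_ROOT, xml)
print(url)
```

The URL is only printed so the sketch runs without network access; whether a given mine is still reachable at its 2014 address would need to be checked separately. In practice the language clients listed on slide 5 wrap this kind of request, so hand-built URLs are mainly useful for seeing what travels over the wire.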