Presentation regarding InterMine and its adoption by the AIP and MTGD projects, given at the Informatics Research WIPS meeting on 03 November 2014 at the J. Craig Venter Institute, Rockville, MD.
Presented by Vivek Krishnakumar
An On-line Collaborative Data Management System (Cameron Kiddle)
A presentation I prepared that was presented by Rob Simmonds at the Gateway Computing Environments 2010 Workshop in New Orleans on November 14, 2010. It provides an overview of a data management system that was developed for GeoChronos - an on-line collaborative platform for Earth observation scientists.
Data repositories -- Xiamen University 2012 06-08 (Jian Qin)
The document discusses data repositories and services. It begins by defining what a data repository is, noting that it is a logical and sometimes physical partitioning of data where multiple databases reside. It then outlines some key aspects of data repositories, including technical features like standards, software, and staffing requirements. The document also discusses functions of repositories like content management, archiving, dissemination and system maintenance. It provides examples of institutional repositories and data repositories, highlighting characteristics of each. Finally, it provides a case study on Dryad, an international repository for data and publications in biosciences.
Investigating plant systems using data integration and network analysis (Catherine Canevet)
The document discusses challenges in integrating plant data from multiple sources and proposes solutions. It notes that plant data is sparse, distributed across many databases in various formats, and focused primarily on the model plant Arabidopsis. Data integration is necessary to address key biological questions by consolidating information from pathway databases, gene annotations, protein interactions, and more. The document outlines approaches to data integration including controlled vocabularies, ontologies, data standards, and integration applications specifically designed to combine data sources like Ondex. Effective integration is important to fully leverage available plant data.
How Portable Are the Metadata Standards for Scientific Data? (Jian Qin)
The one-covers-all approach in current metadata standards for scientific data has serious limitations in keeping up with ever-growing data. This paper reports the findings of a survey of metadata standards in the scientific data domain and argues for the need for a metadata infrastructure. The survey collected 4400+ unique elements from 16 standards and categorized them into 9 categories. The highest counts of elements occurred in the descriptive category, and many of them overlapped with DC elements. The same pattern repeated among elements that co-occurred in different standards. A small number of semantically general elements appeared across the largest number of standards, while the rest of the element co-occurrences formed a long tail with a wide range of specific semantics. The paper discusses the implications of these findings for metadata portability and infrastructure, and points out that large, complex standards and widely varied naming practices are the major hurdles to building a metadata infrastructure.
Web Information Extraction for the DB Research Domain (liat_kakun)
A presentation describing my final project for an engineering degree at the Hebrew University of Jerusalem - a system for extracting information from web sites into instances of an XML schema, utilizing machine learning, structural analysis of documents and a divide & conquer strategy.
A presentation given by Manjula Patel (UKOLN) at the Repository Curation Environments (RECURSE) Workshop held at the 4th International Digital Curation Conference, Edinburgh, 1st December 2008:
http://www.dcc.ac.uk/events/dcc-2008/programme/
The document describes the eTRIKS Data Harmonization Service Platform, which aims to provide a common infrastructure and services to support cross-institutional translational research. It discusses challenges around data integration and harmonization. The platform utilizes standards and controlled vocabularies to syntactically and semantically harmonize data from various sources. It employs a metadata framework and modular workflow to structure, standardize, and integrate observational data into a harmonized repository for exploration and analysis. A demo of the platform's capabilities for project setup, data staging, exploration, export, and integration with tranSMART is also provided.
This document discusses Kno.e.sis' projects on a federated semantic services platform and material database knowledge discovery for material sciences. It proposes a federated architecture with provenance and access control to realize open digital data sharing while protecting private data. The architecture uses semantic mappings and query processing across distributed public, shared, and private material databases. Provenance metadata captured during experiments can improve reproducibility and trust in material products. Flexible access control policies allow custom sharing of semantic data at different granularities with public communities or collaborators.
The document introduces COBWEB, a research project that develops a crowdsourcing infrastructure for collecting and analyzing environmental data provided by citizens. The project aims to address data quality issues and support policy decisions. It has several pilot sites and partners, including UNESCO biosphere reserves. The framework includes mobile apps, QA processes, and a portal to view and analyze citizen-submitted data. It uses open standards and aims to be customizable for different use cases involving topics like biological monitoring and flooding.
Bioinformatics presentation to students University of Minho (introfini)
This document describes ProtoFilWW, a computational platform for analyzing relationships between microorganisms and environmental parameters in wastewater treatment plants. It has the following key components:
- A content management system for researchers to manage and analyze data from wastewater treatment plant samples.
- A text mining component to find additional information about microorganisms present in biological samples. It uses technologies like Lucene, Solr, and UIMA.
- User roles including visitors, collaborators, researchers, and administrators. Researchers can insert, analyze, and export data, while administrators manage users and data backups.
- Features like dynamic reporting, charting, geolocation of wastewater treatment plants, and
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus Poster (Globus)
This poster was presented at the 2019 NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium by Brigitte Raumann and Ian Foster of Globus, University of Chicago and Argonne National Lab.
Mica is a web platform that supports epidemiological studies through an integrated infrastructure called Maelstrom repository. It is a 3-step process: 1) Mica Catalogues document participating studies and their variables, 2) Mica Variables further identify variables for harmonization, 3) Processing, Integration and Dissemination harmonizes data across studies and allows dissemination through integration with other tools like Opal. The goal is to provide a comprehensive, customizable and open source software solution to support all stages of multi-study data harmonization and analysis.
Curation and Preservation of Crystallography Data (ManjulaPatel)
A presentation given by Manjula Patel (UKOLN) at "Chemistry in the Digital Age: A Workshop connecting research and education", June 11-12th 2009, Penn State University:
http://www.chem.psu.edu/cyberworkshop09
Images have an irrefutably central role in scientific discovery and discourse. However, the issues associated with knowledge management and utility operations unique to image data have only recently gained recognition. In our previous work, we developed the Yale Image Finder (YIF), a novel biomedical image search engine that indexes around two million biomedical images along with their associated metadata. While YIF is considered a veritable source of easily accessible biomedical images, a number of usability and interoperability challenges have yet to be addressed. To overcome these issues and to accelerate the adoption of YIF for next-generation biomedical applications, we have developed a publicly accessible semantic API for biomedical images with multiple modalities. The core API, called iCyrus, is powered by a dedicated semantic architecture that exposes the YIF content as linked data, permitting integration with related information resources and consumption by linked data-aware services. To facilitate the ad hoc integration of image data with other online data resources, we also built semantic web services for iCyrus, making it compatible with the SADI semantic web service framework. The utility of the combined infrastructure is illustrated with a number of compelling use cases and further extended through the incorporation of Domeo, a well-known tool for open annotation. Domeo facilitates enhanced search over the images using annotations provided through crowdsourcing. The iCyrus triplestore currently holds more than thirty-five million triples and can be accessed and operated through syntactic or semantic query interfaces. Core features of the iCyrus API, namely data reusability, system interoperability, semantic image search, automatic updates and a dedicated semantic infrastructure, make iCyrus a state-of-the-art resource for image data discovery and retrieval.
Web Information Extraction for the Database Research Domain (Michael Genkin)
A presentation describing my final project for an engineering degree at the Hebrew University of Jerusalem - a system for extracting information from web sites into instances of an XML schema, utilizing machine learning, structural analysis of documents and a divide & conquer strategy.
VIVO is a web resource created at Cornell University that provides a single point of access for information on scholarly activity. It uses an ontology and semantic web technologies like Jena to represent common university relationships and allows users to search for and browse faculty profiles and research activities. Current development focuses on improving import/export of OWL/RDF data, extending the unified data model, and using inference engines to classify and relate information in real-time without manual tagging. Future plans include leveraging SPARQL queries, integrating multiple ontologies, and using Vitro as a front-end for other data repositories and collections.
Integrated research data management in the Structural Sciences (ManjulaPatel)
A presentation given by Manjula Patel (UKOLN, University of Bath) at the I2S2 workshop "Scaling Up to Integrated Research Data Management", IDCC 2010, 6th December 2010, Chicago.
http://www.ukoln.ac.uk/projects/I2S2/events/IDCC-2010-ScalingUp-Wksp/
NADA is an open source web application for archiving, searching and browsing microdata using the Data Documentation Initiative (DDI).
Key features are:
- Supports DDI and RDF
- Search studies and variables
- Compare variables
- Provides data access for datasets using Public, Licensed, Direct and Data Enclave access modes
This document describes a design challenge to create a system for managing data flows and access within computational social science studies in a privacy-aware manner. The system should support multiple studies conducted by different researchers while reusing common functions like user management, informed consent processes, and data access controls. It should allow multiple users in different studies to continuously view collected data and manage their consent and authorizations. Privacy-aware approaches are needed as sensitive personal data is increasingly collected at scale, but current solutions are minimal; the goal is a simple yet effective system like Funf for data collection from phones.
Towards an Infrastructure for Mining Scientific Publications (petrknoth)
The document discusses increasing the discoverability and accessibility of open access content by adopting two principles for open repositories:
1) Providing dereferenceable identifiers in metadata that link to the full content files. This allows aggregation systems to accurately determine what content is truly open.
2) Ensuring universal access to repository content by machines, similar to human access. This enables services like text mining to reuse open content.
Validation tools are needed to help repositories comply with these principles and maximize the potential of open access by making content fully discoverable and accessible to both humans and machines.
Presentation about the agINFRA Germplasm Working Group (http://wiki.aginfra.eu/index.php/Germplasm_Working_Group). Presented during Session 1 of the 1st International e-Conference on Germplasm Data Interoperability (https://sites.google.com/site/germplasminteroperability/)
ETDs and Open Access for Research and Development: Issues and challenges (Bhojaraju Gunjal)
- ETDs (Electronic Theses and Dissertations) have grown enormously in recent years, with over 6 million items now available in open access repositories worldwide.
- Factors like knowledge organization systems (KOS) and discovery services have helped improve management and retrieval of ETDs, but issues around policies, metadata standards, and open access remain.
- Making ETDs openly accessible online can help research and development by increasing global awareness of universities' work, but many institutions still embargo access or do not make ETDs open at all.
- To address ongoing challenges, experts recommend developing uniform global policies modeled after the NDLTD, encouraging open access of scholarly works through institutional repositories, and providing training
SEEK is an open-source platform for scientists to store, share, and collaborate on heterogeneous data, models, and standard operating procedures. It was developed by researchers in the UK and Germany to facilitate data sharing across multi-group projects. SEEK allows scientists to organize experiments and data using ISA-TAB standards, interlink related assets, and control access to assets at various stages of research from private to public. Key features include hosting and simulating SBML models, exploring and annotating spreadsheets, and finding expertise and collaborators through people profiles.
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi... (sesrdm)
This document discusses the characteristics and challenges of managing life sciences data. It notes that bio-data lacks structure and grows rapidly in heterogeneous formats and file sizes. Data goes through multiple analysis stages and is associated with evolving metadata standards. Ensuring data is properly stored, shared and preserved requires significant effort in describing formats, preparing submissions to various specialized public repositories, and developing data management plans. Integrating data from different sources also poses major challenges.
The document discusses use cases for the ARCHIVER project at the European Bioinformatics Institute. It notes that the EBI's data is growing rapidly at around 40-50% per year and will likely continue doubling every two years. It aims to develop a hybrid multi-cloud storage solution to address this growth and enable cost-effective scaling, analysis in the cloud, and caching of frequently accessed data in public clouds. Key challenges include balancing cost and performance across on-premises and cloud storage as data and analysis needs increase.
The document discusses user experience (UX) design considerations for mobile and traditional web. It notes that mobile internet usage will surpass desktop by 2014. While not every site needs a mobile version, mobile UX faces different constraints like small screens and awkward input. The document explores these constraints and provides examples of mobile-optimized sites that reduce content, simplify layouts, and make the most important actions easily accessible. It emphasizes analyzing user context and building designs that are useful, usable and emotionally engaging for the mobile experience.
BizTalk is Microsoft's platform for enterprise application integration (EAI), business process management (BPM), and business-to-business (B2B) integration. It provides a development and runtime environment for integrating systems, applications, and services through its core components like ports, schemas, pipelines, orchestrations and adapters. BizTalk handles long-running business processes within and between businesses through its publish-subscribe architecture and supports transactions that can run for weeks or months.
Casmira Camilo was born in Marawi to parents Camilo and Baimonan Anongcar. Her father works hard in business to provide for their family, while her mother prays from the Qur'an for Casmira to have good health and a bright future. Casmira was named Casmira Camilo, with her name meaning "cash" and "beautiful girl" in reference to her father's belief that she will bring them luck and her mother's prayers from the Qur'an.
The document discusses persuasive design, which focuses on using emotional triggers to influence user behavior in a desired way. It involves understanding user emotions and acting on that information to create intriguing experiences that nudge users towards the desired actions. The research for persuasive design is more qualitative than traditional usability methods and aims to understand what drives user actions and when users are most receptive to influence.
This document tells the story of how a boy named Harlan Dave received his name. Fifteen years ago, a couple had a baby boy who was born on a starry morning in the Philippines. They found a piece of paper in their backyard with the name "Harlan" written on it, and decided to use that as the boy's first name. They added "Dave" to the end to honor the boy's grandfather's strong faith in David from the Bible, defining him as a soldier accompanied by God. The boy is now making his own history, accompanied by prayers to God.
Tutorial 1: Your First Science App - Araport Developer Workshop (Vivek Krishnakumar)
Slide deck pertaining to Tutorial 1 of the Araport Developer Workshop conducted at TACC, Austin TX on November 5, 2014.
Presented by Vivek Krishnakumar
This document is a resume for Victor Cassen summarizing his technical skills and work history as a software developer. It lists his proficiency with various programming languages, frameworks and databases. It then describes his two most recent roles developing network management software and biological research tools and databases. It provides examples of projects in each role involving web services, data processing pipelines, and user-facing applications. It concludes with his educational background of a Computer Science degree from the University of Washington.
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ... (Bonnie Hurwitz)
The document discusses extending the iPlant cyberinfrastructure to support microbes in addition to plants. It provides an overview of iPlant, including its funding from NSF, collaborations, resources like data storage and computing platforms, and applications for analysis. Future plans are outlined to build tools and streamline workflows for metagenomics and enable high-throughput computing for microbial data.
Tripal v3, the Collaborative Online Database Platform Supporting an Internati... (Bradford Condon)
Talk given by Dr. Bradford Condon at the NSRP10 session of the Plant and Animal Genomes conference (PAG) 2019. Covers the basics of the biological database toolkit Tripal, and how Tripal enables FAIR data.
New ICT Trends and Issues of Librarianship (Liaquat Rahoo)
The document summarizes a one-day workshop on new ICT trends and issues in librarianship. It will cover topics like the introduction of ICT in libraries, different types of libraries supported by ICT, necessary ICT infrastructure, software for library automation, digital repositories, and web applications. The workshop will be held at the Institute of Modern Sciences and Arts on April 17, 2016.
The document discusses the HarvestChoice project, which aims to improve agricultural productivity for the poor through policy evaluation and technology assessment. It outlines the development of an advanced web portal to integrate bibliographic, spatial, and other data through a Drupal/Solr platform integrated with a GeoNetwork platform. The portal architecture uses Solr for search and Drupal for content management, with GeoNetwork to access spatial and mapped data through metadata standards. The implementation process and next steps are also summarized.
ChemAxon’s consulting services have undergone substantial growth in the last several years. We now utilize talent from across the globe, servicing a wide range of project needs: from small short-term solutions to mega-pharma project management, and from small-scale software integration/migration to customized product development. This presentation will cover some of the range of projects we are engaged in, with some focus on a large web portal project we delivered for the European Lead Factory.
What is Data Commons and How Can Your Organization Build One? (Robert Grossman)
1. Data commons co-locate large biomedical datasets with cloud computing infrastructure and analysis tools to create shared resources for the research community.
2. The NCI Genomic Data Commons is an example of a data commons that makes over 2.5 petabytes of cancer genomics data available through web portals, APIs, and harmonized analysis pipelines.
3. The Gen3 platform is an open source software stack for building data commons that can interoperate through common APIs and data models to support reproducible, collaborative research across projects.
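As a concrete illustration of point 2, the GDC's harmonized data is reachable through a public REST API. The short sketch below queries its projects endpoint; the endpoint path and response fields follow the public GDC API documentation, but treat the details as a hedged example rather than an authoritative client.

```python
# Hedged sketch: listing projects from the NCI Genomic Data Commons public
# REST API (https://api.gdc.cancer.gov). Field names follow the public docs.
import requests

resp = requests.get("https://api.gdc.cancer.gov/projects", params={"size": 3})
resp.raise_for_status()
for project in resp.json()["data"]["hits"]:
    print(project["project_id"], "-", project["name"])
```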
The BlueBRIDGE approach to collaborative research (Blue BRIDGE)
Gianpaolo Coro, ISTI-CNR, at the BlueBRIDGE workshop on "Data Management services to support stock assessment", held during the Annual ICES Science Conference 2016
The document describes the DALICC Vocabulary, which was developed as part of the DALICC project to represent legal expressions from licenses in a machine-readable way. The vocabulary extends the ODRL and CCRel ontologies with additional properties needed to capture the full semantic spectrum of copyright statements. Examples are provided showing how the BSD 3.0, CC-BY, and Apache licenses can be represented using the DALICC vocabulary. The goal is to significantly reduce the costs of license clearance for derivative works by developing a framework that can understand and process license information.
Data commons bonazzi bd2 k fundamentals of science feb 2017 (Vivien Bonazzi)
Vivien Bonazzi leads the Data Commons efforts within NIH. She discussed how big data is characterized by volume, velocity, variety and veracity. She explained that data is becoming the central currency of a new digital economy and organizations must leverage their digital assets through platforms like the Data Commons to transform into digital enterprises. The Data Commons platform fosters development of a digital ecosystem by enabling interactions between producers and consumers of FAIR digital objects like data, software and publications.
Setting up an open access ICAR Institutional Repository: hardware, software, policies and personnel.
ICAR Initiatives
Under the NATP project, the Integrated National Agricultural Resources Information System (INARIS) was developed (Rai et al., 2007), and a Central Data Warehouse (CDW) of agricultural resources was established at IASRI. The project involved collaborations with 13 other ICAR organizations, for which 13 different data marts were designed. The project was available at http://agdw.iasri.res.in.
My outlook: the country should have an agri-search engine
An agri-search engine should be developed in the country to aggregate information from the internet and provide it to farmers in a meaningful manner using ICT tools. It should be coordinated with the Government of India's agricultural websites, monitoring each website daily.
The global need to securely derive (instant) insights has motivated data architectures ranging from distributed storage to data lakes, data warehouses and lakehouses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as domain centricity/ownership, data as products, and self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, and statistical and machine learning algorithms into decentralized data products that let users discover insights following FAIR principles. Researchers can use its point-and-click (no-code) system to instantly perform analysis and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook-based developer environments with individual workspaces.
Join us for a talk/demo session on the Tag.bio data mesh platform and learn how major pharma companies and university health systems are using this technology to promote value-based healthcare and precision healthcare, find cures for disease, and promote collaboration (without explicitly moving data around). The talk also outlines Tag.bio's secure data exchange features for real-world evidence datasets, privacy-centric data products (confidential computing), and integration with cloud services.
GBIF is an intergovernmental organization that facilitates open access to biodiversity data worldwide via the internet. It provides three main types of infrastructure: physical infrastructure including data workflows and datasets; information infrastructure such as data portals and products; and capability infrastructure like knowledge management and standard development. GBIF aims to make biodiversity data freely available under common standards to support scientific research, conservation, and sustainable development. It currently hosts over 400 million data records from more than 10,000 datasets contributed by its 52 member countries and 36 international organizations.
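To make the "information infrastructure" point concrete, GBIF's data portal is backed by a public web service. Below is a minimal hedged sketch of an occurrence search; the endpoint follows the public api.gbif.org documentation, and the species chosen is an arbitrary example.

```python
# Hedged sketch: searching occurrence records via GBIF's public API
# (api.gbif.org); parameters and response fields follow the public docs.
import requests

resp = requests.get(
    "https://api.gbif.org/v1/occurrence/search",
    params={"scientificName": "Arabidopsis thaliana", "limit": 3},
)
resp.raise_for_status()
for rec in resp.json()["results"]:
    print(rec.get("scientificName"), rec.get("country"), rec.get("eventDate"))
```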
The pulse of cloud computing with bioinformatics as an example (Enis Afgan)
The document discusses how cloud computing can enable large-scale genomic analysis by providing on-demand access to computational resources and petabytes of reference data. It describes how tools like Galaxy and CloudMan allow researchers to perform genomic analysis in the cloud through a web browser by automating the provisioning and configuration of cloud resources. This approach makes genomic research more accessible and enables the elastic scaling of analysis as needed.
Enabling knowledge management in the Agronomic Domain (Pierre Larmande)
This talk focuses mainly on ongoing projects at the Institute of Computational Biology:
Agronomic Linked Data (AgroLD): a Semantic Web knowledge base designed to integrate data from various publicly available plant-centric data sources.
GIGwA: a tool developed to manage large genomic, transcriptomic and genotyping data sets resulting from NGS analyses.
Similar to Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting (20)
Presented in the "New and Updated Bioinformatics Datasets, Tools and Resources" session at the 28th International Conference on Arabidopsis Research (ICAR 2017) held in St. Louis, MO.
Thursday, June 22nd, 2017
Lightning Talk about InterMine/JBrowse integration and extensions to Inter-"Mine" Communication, presented at the 2017 InterMine Developer Workshop and Hackathon (IMDEV 2017) held at the Joint Genome Institute (JGI) in Walnut Creek, CA
Thursday, March 30th
Integrate JBrowse REST API Framework with Adama Federation Architecture (Vivek Krishnakumar)
This presentation describes the work done to integrate the JBrowse REST API Framework with the Araport.org-developed Adama Federation Architecture, enabling community developers to package their published datasets and expose them in a manner which is compatible with JBrowse.
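For orientation, the sketch below shows the shape of a JBrowse-compatible REST feature endpoint of the kind such an adapter exposes, following the JBrowse REST store convention of GET <base>/features/<refseq> with start/end parameters returning a {"features": [...]} JSON body. Flask, the route layout and the sample feature are illustrative assumptions, not the actual Adama code.

```python
# Minimal sketch of a JBrowse REST feature-store endpoint (illustrative only).
# JBrowse queries GET <base>/features/<refseq> with start/end parameters and
# expects a {"features": [...]} JSON response.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory annotation set keyed by reference sequence name.
FEATURES = {
    "Chr1": [
        {"start": 3631, "end": 5899, "strand": 1, "type": "gene",
         "uniqueID": "AT1G01010", "name": "AT1G01010"},
    ],
}

@app.route("/features/<refseq>")
def features(refseq):
    start = int(request.args.get("start", 0))
    end = int(request.args.get("end", 2**31))
    # Return features overlapping the requested interval.
    hits = [f for f in FEATURES.get(refseq, [])
            if f["end"] > start and f["start"] < end]
    return jsonify({"features": hits})

if __name__ == "__main__":
    app.run(port=5000)
```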
Araport is an online resource for Arabidopsis research that integrates data from various sources through federation and warehousing. It provides updated gene annotations, including over 1,000 new protein-coding genes and 50k new splice variants identified from RNA-seq data. Araport's goal is to serve as a comprehensive "one-stop shop" for Arabidopsis genomic data, literature, and tools through its combination of software and state-of-the-art web technologies.
Overview of the InterMine infrastructure and its ability to interoperate with other InterMine instances via the IM 2.0 StairCase.
Presented at the LF Project Kickoff Meeting, 2015/06/22
InterMine is an open-source data warehouse software that allows for the integration of complex biological data. It provides parsers for common data formats and an extensible framework to customize data. The system uses a PostgreSQL database to store integrated data according to an object-oriented data model. It offers a customizable web interface for querying as well as programmatic access via a web service API. Building an InterMine instance involves configuring data sources, performing data integration and post-processing, and deploying the web application. InterMine facilitates data sharing across multiple biological "mines".
JBrowse within the Arabidopsis Information Portal - PAG XXIII (Vivek Krishnakumar)
Araport integrates the JBrowse visualization software from GMOD. In order to support diverse sets of locally and remotely sourced tracks, the "ComboTrackSelector" JBrowse plugin was developed, enabling metadata-rich tracks to be partitioned into the "Faceted" selector while the default "Hierarchical" selector is used for everything else.
A dynamic sequence viewer add-on, "SeqLighter", was developed using the BioJS framework (http://biojs.net/). It offers end-users the capability to view the genomic sequence underlying gene models (genic regions plus customizable flanking regions), highlight sub-features (like UTRs, exons, introns, start/stop codons) and export the annotated output in various formats (SVG, PNG, JPEG).
Tripal within the Arabidopsis Information Portal - PAG XXIII (Vivek Krishnakumar)
Araport plans to implement a Chado-backed data warehouse, fronted by Tripal, serving as our core database, used to track multiple versions of genome annotation (TAIR10, Araport11, etc.), evidentiary data (used by our annotation update pipeline), metadata such as publications collated from multiple sources like TAIR, NCBI PubMed and UniProtKB (curated and unreviewed), and stock/germplasm data linked to AGI loci via their associated polymorphisms.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste... (Sérgio Sacani)
Context. With a mass exceeding several 10⁴ M⊙ and a rich and dense population of massive stars, supermassive young star clusters represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars. The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically, the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec. Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a photon flux threshold of approximately 2 × 10⁻⁸ photons cm⁻² s⁻¹. The X-ray sources exhibit a highly concentrated spatial distribution, with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known massive stars of the cluster, and we have collected over 71,000 photons from the magnetar CXO J164710.20-455217.
The debris of the ‘last major merger’ is dynamically young (Sérgio Sacani)
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the ‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics is consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’ did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
This MS Word-generated PowerPoint presentation covers the major details of the micronucleus test: its significance and the assays used to conduct it. The test is used to detect micronuclei formation inside the cells of nearly every multicellular organism. Micronuclei form during chromosomal separation at metaphase.
ESR spectroscopy in liquid food and beverages.pptx (PRIYANKA PATEL)
With an increasing population, people need to rely on packaged foodstuffs. Packaging of food materials requires the preservation of food. There are various methods of treating food to preserve it, and irradiation treatment is one of them. It is the most common and most harmless method of food preservation, as it does not alter the essential micronutrients of food materials. Although irradiated food does not cause any harm to human health, quality assessment of food is still required to provide consumers with the necessary information about it. ESR spectroscopy is the most sophisticated way to investigate the quality of food and the free radicals induced during its processing. The ESR spin trapping technique is useful for detecting highly unstable radicals in food. The antioxidant capability of liquid foods and beverages is mainly assessed by the spin trapping technique.
8. Isolation of pure cultures and preservation of cultures.pdf
Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting
1. InterMine
Integrated Data Warehouse
Use Cases: Arabidopsis & Medicago Genome Projects
Vivek Krishnakumar
Plant Genomics Group (EUK)
IFX Research WIPS Meeting, 03 October 2014
2. Overview
• Introduction
• InterMine
  - Integrated data warehouse, extensible data model, flexible query system
  - Web and programmatic interfaces
  - Other InterMine instances
• Use cases
  - Arabidopsis Information Portal (AIP)
  - Medicago truncatula Genome Database (MTGD)
• Summary
  - Advantages
  - Caveats
3. Introduction
For genome projects that wish to expose their data via the web (query, visualize, warehouse) to foster scientific collaboration, there are several technologies available:
• JCVI-developed software
  - Manatee (backed by an RDBMS)
• Externally developed software
  - BioMart (federated from various databases)
  - Tripal (powered by Drupal, backed by a Chado database)
  - InterMine
4. InterMine
• Functions as a data warehouse for the integration of complex biological data. Integration across data types occurs based on a common identifier (e.g. gene primary ID)
• Uses a flexible and extensible data model, controlled by XML files and driven by ontologies (Sequence Ontology [SO], Gene Ontology [GO], etc.)
  - Genomics, proteomics, interactions, homology, expression, pathways (and more data types)
  - Parsers for commonly used biological data formats
  - Provides a framework for adding your own data
• Offers a flexible query system, optimized via precomputed tables (no need for schema denormalization)
Smith, RN. et al. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics (2012) 28(23): 3163-3165
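For illustration only: extensions to the data model are declared in small XML "additions" files that are merged into the core model when the mine is built. The snippet below is a hedged sketch in the style of InterMine's additions format; the extra attribute and reference on Gene are hypothetical, not part of the shipped model.

```xml
<!-- Hypothetical additions file in the style of InterMine model additions;
     the curatorNotes attribute and annotationVersion reference are made up. -->
<classes>
  <class name="Gene" is-interface="true">
    <attribute name="curatorNotes" type="java.lang.String"/>
    <reference name="annotationVersion" referenced-type="DataSet"/>
  </class>
</classes>
```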
5. InterMine (contd.)
• Provides a user-friendly web interface exposing powerful features:
  - Analysis of lists (facilitates enrichment studies)
  - Full-featured report pages (one-stop shop)
  - Interactive result tables (sort, filter, summarize)
  - Visual query builder (no need to write SQL!)
  - Quick search and region-based search
• Fosters development of external applications using data hosted within InterMine via Application Programming Interfaces (API): RESTful endpoints with client libraries for Perl, Python, Ruby, Java and JavaScript (a query sketch follows below)
Kalderimis, A. et al. InterMine: extensive web services for modern biology. Nucl. Acids Res. (1 July 2014) 42(W1): W468-W472
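To make the programmatic interface concrete, here is a minimal sketch using the InterMine Python client (the intermine package on PyPI) against the ThaleMine service URL shown later in the deck; the view fields and gene symbol queried are illustrative assumptions, not prescribed usage.

```python
# Minimal sketch using the InterMine Python client (pip install intermine).
# The service URL is ThaleMine's (see slide 8); the view fields and the
# example gene symbol are illustrative assumptions.
from intermine.webservice import Service

service = Service("https://apps.araport.org/thalemine/service")

query = service.new_query("Gene")
query.add_view("primaryIdentifier", "symbol", "briefDescription")
query.add_constraint("symbol", "=", "FLC")  # hypothetical example symbol

for row in query.rows():
    print(row["primaryIdentifier"], row["symbol"])
```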
6. Public “Mines”
• InterMine supports querying across mines for cross-database integration
• A vast number of warehouses powered by InterMine already exist
7. Arabidopsis Information Portal (AIP)
• AIP origins
  - Funded by NSF in response to community needs, following termination of funding to TAIR
• AIP objectives
  - Develop a community web resource that is sustainable, fundable and community-extensible, and that hosts analysis & visualization tools and user data spaces
  - Federation: integrate diverse data sets from distributed data sources; foster development of tools for and by the community
  - Maintenance of the Col-0 gold standard annotation
• AIP methods
  - Assimilate TAIR data
  - Host an InterMine instance devoted to Arabidopsis (thale cress)
  - Offer and consume RESTful web services
  - Integrate and utilize iPlant resources
8. ThaleMine
https://apps.araport.org/thalemine
• An InterMine interface to Arabidopsis genomic data
• Integrates a wide variety of data types (A-E, H), some of which are warehoused and others federated via web services
• Embedded elements visualizing gene structure (JBrowse, not shown), interaction networks (F), expression patterns (G)
9. Visual Query Builder
Image created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
10. Interactive Result Tables & Region-based Search
Images created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
11. MedicMine
http://medicmine.jcvi.org
• NSF-funded project to assist with the curation of the Medicago truncatula Genome Assembly and Annotation (funding ended August 2014)
• In order to warehouse and preserve the project data, an InterMine interface for Medicago was implemented (backed by a Chado database)
• Provides a similar kind of functionality to that available via ThaleMine
12. Summary
• Advantages
  - InterMine is a powerful biological data warehouse
  - Performs complex data integration
  - Allows fast and flexible querying
  - Well-documented programmatic interface
  - Cookie-cutter, user-friendly web interface
  - Facilitates cross-talk between “mines”
• Caveats
  - Adding more data requires a full database rebuild (incremental loading is not possible) because of the integration step
• About InterMine
  - Developed by the Micklem Lab at the University of Cambridge, UK
  - Written in Java, backed by a PostgreSQL database, deployed under Tomcat
  - Documentation and downloads available at http://www.intermine.org
13. Chris Town, PI
Chris Nelson, PM
Lisa McDonald, Education and Outreach Coordinator
Jason Miller, Co-PI, Technical Lead
Erik Ferlanti, SE
Vivek Krishnakumar, BE
Svetlana Karamycheva, BE
Maria Kim, BE
Ben Rosen, BA
Gos Micklem, co-PI
Sergio Contrino, Software Engineer
Eva Huala, Project lead, TAIR
Bob Muller, Technical lead, TAIR
Matt Vaughn, co-PI
Steve Mock, Advanced Computing Interfaces
Rion Dooley, Web and Cloud Services
Matt Hanlon, Web and Mobile Applications