araport.org@araport
Tripal within the Arabidopsis Information Portal
Vivek Krishnakumar
J. Craig Venter Institute
1/11/2015
Tripal Database Network and Initiatives
PAG XXIII, San Diego, CA
Overview
•  About Araport
•  Current architecture
•  Planned implementation
– Leverage Chado schema
– Accommodate inherited data
– Serve as point of integration
– Facilitate data sharing via web services
About Araport
•  Objectives
–  Develop community web interface
•  sustainable, fundable and community-extensible
•  hosts analysis modules, visualization tools, user data spaces
–  Practice data federation
•  integrate diverse data sets from distributed sources
•  consume and expose data via RESTful web services
–  Maintain “gold standard” Col-0 annotation
•  assemble tissue-specific transcripts from publicly available RNA-seq datasets
•  incorporate novel coding and non-coding genes
Araport
https://www.araport.org
•  Explore data
•  ThaleMine
•  JBrowse
•  Science Apps
•  Search data
•  Quick Search
•  BLAST
•  Raw data downloads
•  Community
•  News & Events
•  Ask a question
•  Job Postings
•  Useful Links
Araport Architecture
[Architecture diagram. Recoverable labels, roughly top to bottom:]
•  Clients: External programs; Portal (www.araport.org) with Science Apps
•  API (api.araport.org): authentication, metering, logging, versioning, HTTPS, CORS
•  Agave Core: metadata, user profile, apps, jobs, systems
•  ADAMA: service management, service enrollment (services a-f)
•  Service protocols: CGI, SOAP, REST
•  Backends: ThaleMine, JBrowse, InterMine, Tripal, others
•  Resources: computing, storage, databases
Current implementation
Araport warehouse
Combination of flat-files and databases
•  TAIR datasets
•  Ontologies (GO, PSI)
•  Interactions (BAR)
•  Orthologs (Panther)
Web services
InterMine loader live calls to…
•  UniProt web services
•  PubMed web services
(both publish to the data mart)
Araport data mart
•  InterMine schema, PostgreSQL DB
•  Indexed and flattened for speed
•  Rebuilt periodically
Outputs
•  ThaleMine WebApp
•  ThaleMine web services
Planned implementation
Araport warehouse
•  Chado schema, PostgreSQL DB
•  General purpose but slow
•  Permanent host for core genomic datasets (assembly, annotation, metadata, etc.)
Inputs
•  Genome annotation pipeline
•  Community curation data
(warehouse publishes to the data mart)
Araport data mart
•  InterMine schema, PostgreSQL DB
•  Indexed and flattened for speed
•  Rebuilt periodically
Outputs
•  ThaleMine WebApp
•  ThaleMine web services
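The warehouse-to-data-mart publish step can be sketched roughly as follows. This is a minimal illustration, not Araport's actual build: it assumes a heavily simplified stand-in for Chado's feature and featureprop tables (real Chado stores types as cvterm foreign keys; plain text is used here for brevity), SQLite stands in for PostgreSQL, and the names gene_mart and build_mart are hypothetical.

```python
import sqlite3

def build_mart(conn):
    """Rebuild a flat, indexed gene table from normalized warehouse tables."""
    cur = conn.cursor()
    # "Rebuilt periodically": drop and recreate rather than update in place.
    cur.execute("DROP TABLE IF EXISTS gene_mart")
    # One row per gene, with selected properties pivoted into columns.
    cur.execute("""
        CREATE TABLE gene_mart AS
        SELECT f.uniquename AS locus,
               MAX(CASE WHEN p.type = 'symbol' THEN p.value END) AS symbol,
               MAX(CASE WHEN p.type = 'description' THEN p.value END) AS description
        FROM feature f
        LEFT JOIN featureprop p ON p.feature_id = f.feature_id
        WHERE f.type = 'gene'
        GROUP BY f.feature_id, f.uniquename
    """)
    # "Indexed and flattened for speed": index the lookup column.
    cur.execute("CREATE INDEX gene_mart_locus ON gene_mart (locus)")
    conn.commit()

# Simplified warehouse tables with a few illustrative rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
    CREATE TABLE featureprop (feature_id INTEGER, type TEXT, value TEXT);
    INSERT INTO feature VALUES (1, 'AT1G01010', 'gene'), (2, 'AT1G01020', 'gene');
    INSERT INTO featureprop VALUES
        (1, 'symbol', 'NAC001'),
        (1, 'description', 'NAC domain containing protein 1'),
        (2, 'symbol', 'ARV1');
""")
build_mart(conn)
rows = conn.execute(
    "SELECT locus, symbol, description FROM gene_mart ORDER BY locus"
).fetchall()
```

The design point is the trade-off named on the slide: the normalized tables stay general purpose, while the derived table answers per-gene lookups without joins and can be thrown away and rebuilt at any time.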
Chado
•  Functions as our low-level (core) Araport data warehouse
–  Preserve legacy datasets with appropriate attributions
–  Track any new datasets generated (annotation updates, community contributions)
–  Serve as point of integration and de-duplication of certain data types
–  Integrate with planned community curation interface
•  Supports our pursuit of being open-source (and future-proof)
http://gmod.org/wiki/Chado
Tripal
•  Drupal CMS-based modular framework, exposing a user-friendly interface to Chado
–  provides standardized loaders for genomic datasets (FASTA, GFF3, GenBank, BLAST, GO, InterProScan, KEGG)
–  supports building custom templates and materialized views
–  exposes a well-documented API
http://tripal.info
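To make concrete what a GFF3 loader has to handle before anything reaches Chado, here is a minimal parsing sketch. It is not Tripal's loader (which is a Drupal/PHP module); the function name parse_gff3_line and the returned dict layout are assumptions made for illustration, and only the nine-column GFF3 line format itself is taken as given.

```python
from urllib.parse import unquote

def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments and blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    if attrs != ".":
        for pair in attrs.split(";"):
            if pair:
                key, _, value = pair.partition("=")
                # Values may be comma-separated lists and are URL-escaped.
                attributes[key] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),   # 1-based, inclusive
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": None if phase == "." else int(phase),
        "attributes": attributes,
    }

record = parse_gff3_line(
    "Chr1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010"
)
```

A real loader would additionally resolve the feature type against an ontology and follow Parent attributes to build the gene, mRNA and exon hierarchy; the sketch stops at the per-line record.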
Integrate data inherited from TAIR
•  Currently a combination of flat-files and TAIR’s Oracle database
–  Genome Assembly (TAIR9)
–  Genome Annotation (TAIR10): genes, pseudogenes, transposons, ncRNAs
–  Annotation properties: gene symbols, confidence ranking, functional descriptions, curator summary
–  GO Annotations (TAIR curated data at geneontology.org)
–  Publications (curated gene → publication relationships)
–  Variation data: Genetic markers, Polymorphisms (SNPs, TILLING) and T-DNA Insertions
–  Stock data (lines, clones, germplasm)
•  Chado backed Tripal will serve as the core repository for this data
Integrate with planned Community Curation Interface
Integrate publication data
•  Existing sources for publication data
–  TAIR locus to PubMed ID mapping
–  NCBI gene2pubmed mapping
–  UniProt curated Protein to PubMed ID mapping
–  Publications missing PMIDs and/or DOIs
•  Chado will act as point of integration
–  Combine and de-duplicate publication data from 3 sources (more in the future)
–  Collect and store metadata for publications with and without PMID and/or DOIs
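The combine-and-de-duplicate step can be sketched as below. This is a hedged illustration rather than the project's loader: the record fields (locus, pmid, doi, title), the functions pub_key and merge_publications, and every identifier in the sample data are made up, and it assumes each source's records have already been mapped to an AGI locus.

```python
def pub_key(record):
    """Pick the strongest available identifier for a publication record."""
    if record.get("pmid"):
        return ("pmid", str(record["pmid"]))
    if record.get("doi"):
        return ("doi", record["doi"].lower())
    # Fallback for records lacking both identifiers: normalized title.
    title = "".join(ch for ch in record["title"].lower() if ch.isalnum())
    return ("title", title)

def merge_publications(*sources):
    """Merge (source_name, records) pairs into one entry per publication key."""
    merged = {}
    for source_name, records in sources:
        for rec in records:
            entry = merged.setdefault(pub_key(rec), {"loci": set(), "sources": set()})
            entry["loci"].add(rec["locus"])
            entry["sources"].add(source_name)
    return merged

# Made-up sample records; none of these identifiers refer to real papers.
tair = [{"locus": "AT1G01010", "pmid": 10000001, "title": "Example paper A"}]
ncbi = [
    {"locus": "AT1G01010", "pmid": "10000001", "title": "Example paper A"},
    {"locus": "AT1G01020", "pmid": 10000002, "title": "Example paper B"},
]
uniprot = [
    {"locus": "AT1G01020", "doi": "10.1000/EXAMPLE", "title": "Example paper C"},
    {"locus": "AT1G01020", "title": "Example Paper D."},
]
merged = merge_publications(("TAIR", tair), ("NCBI", ncbi), ("UniProt", uniprot))
```

One limitation worth noting: a paper that appears with a PMID in one source and only a DOI in another ends up under two keys, so a fuller version would also need a PMID-to-DOI cross-reference before merging.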
Integrate Stock data
•  TAIR stock related tables mapped to corresponding Chado counterpart
•  Custom loaders developed to perform bulk update of Stock information, Phenotypes, Polymorphism data and mappings to AGI locus
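The table-mapping idea can be illustrated with a small sketch. The target column names (uniquename, name, description, type_id, organism_id, is_obsolete) are columns of Chado's stock table; everything on the source side, including the TAIR column names, the mapping dict and the sample row, is hypothetical and does not reflect TAIR's actual schema.

```python
# Hypothetical TAIR-side column names mapped to columns of Chado's stock table.
TAIR_TO_CHADO_STOCK = {
    "stock_number": "uniquename",
    "stock_name": "name",
    "stock_description": "description",
}

def to_chado_stock(tair_row, type_id, organism_id):
    """Translate one source row into a dict shaped like a Chado stock row."""
    missing = [col for col in TAIR_TO_CHADO_STOCK if col not in tair_row]
    if missing:
        raise KeyError(f"source row lacks columns: {missing}")
    row = {
        chado_col: tair_row[tair_col]
        for tair_col, chado_col in TAIR_TO_CHADO_STOCK.items()
    }
    row["type_id"] = type_id          # cvterm id for the stock type, resolved by the caller
    row["organism_id"] = organism_id  # organism row id, resolved by the caller
    row["is_obsolete"] = False
    return row

# Made-up source record for illustration only.
stock = to_chado_stock(
    {
        "stock_number": "EXAMPLE-0001",
        "stock_name": "example line",
        "stock_description": "made-up record",
    },
    type_id=1,
    organism_id=1,
)
```

Keeping the mapping as data rather than code means a bulk loader can apply the same translation to every row and the mapping itself can be reviewed separately.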
Role of Tripal within Araport
•  Tripal is under active development, with plans in place to begin developing rational web services (WS) as well as support interoperability
•  Araport plans to be involved in this working group to satisfy the following needs of our project:
–  Expose live data from future annotation update pipelines to the community directly via WS
–  Expose stock data via WS in a standardized manner to Arabidopsis stock centers (both ABRC and NASC) to aid data synchronization
–  Embrace and support other open-source initiatives
Araport on GitHub
•  GitHub organization:
https://www.github.com/Arabidopsis-Information-Portal
•  Relevant repositories:
–  tair-chado-batchflow
–  chado_pub_loader
–  pasa-chado-hook
–  GMOD/Apollo (fork)
Acknowledgements
•  JCVI Developers
–  Maria Kim
–  Irina Belyaeva
–  Svetlana Karamycheva
•  Tripal co-PI Stephen Ficklin and development community
•  TAIR/Phoenix Bio: assistance with data migration
•  Funding Agencies
Araport Team
•  Chris Town, PI
•  Lisa McDonald, Education and Outreach Coordinator
•  Chris Nelson, Project Manager
•  Jason Miller, Co-PI, JCVI Technical Lead
•  Erik Ferlanti, Software Engineer
•  Vivek Krishnakumar, Bioinf. Engineer
•  Svetlana Karamycheva, Bioinf. Engineer
•  Eva Huala, Project lead, TAIR
•  Bob Muller, Technical lead, TAIR
•  Gos Micklem, co-PI
•  Sergio Contrino, Software Engineer
•  Matt Vaughn, co-PI
•  Steve Mock, Advanced Computing Interfaces
•  Rion Dooley, Web and Cloud Services
•  Matt Hanlon, Web and Mobile Applications
•  Maria Kim, Bioinf. Engineer
•  Ben Rosen, Bioinf. Analyst
•  Joe Stubbs, API Developer, Platform
•  Walter Moreira, API Developer, Federation
•  Chris Jordan, Database Manager
•  Eleanor Pence, Intern
•  Chia-Yi Cheng, Bioinf. Analyst
•  Seth Schobel, Bioinf. Engineer
•  Irina Belyaeva, Software Engineer
THANK YOU!
Araport @ PAG XXIII
•  Tripal Database Network and Initiatives: Sunday, January 11, 2015, 5:30 PM-5:45 PM, California
–  W876: Tripal within the Arabidopsis Information Portal
–  Presenter: Vivek Krishnakumar
•  Arabidopsis Information Portal & IAIC Workshop: Monday, January 12, 2015, 12:50 PM-3:00 PM, Pacific Salon 6-7 (2nd Floor)
–  W059: Walkthrough the Araport Web Site
–  W061: Exposing Web Services for Araport
–  W062: Developing applications for Araport
–  Presenters: Chia-Yi Cheng, Jason Miller, Matt Vaughn
•  Computer Demo 2: Tuesday, January 13, 2015, 12:30 PM, California
–  C23: Using the Arabidopsis Information Portal
–  Presenter: Jason Miller
•  GMOD: Wednesday, January 14, 2015, 11:30 AM, Golden West
–  W410: JBrowse within the Arabidopsis Information Portal
–  Presenter: Vivek Krishnakumar
•  Poster Session – Even: Monday, January 12, 2015, 10:00 AM-11:30 AM, Grand Exhibit Hall
–  P0790: Data Integration for the Plant Research Community: Araport
–  P0792: Developing Content for the Arabidopsis Information Portal
–  Presenters: Chia-Yi Cheng, Matt Vaughn

NuGOweek 2024 Ghent - programme - final version
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
plant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptxplant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptx
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
Penicillin...........................pptx
Penicillin...........................pptxPenicillin...........................pptx
Penicillin...........................pptx
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 

Tripal within the Arabidopsis Information Portal - PAG XXIII

  • 1. Tripal within the Arabidopsis Information Portal
      Vivek Krishnakumar, J. Craig Venter Institute
      12/11/2015
      Tripal Database Network and Initiatives, PAG XXIII, San Diego, CA
      araport.org @araport
  • 2. Overview
      •  About Araport
      •  Current architecture
      •  Planned implementation
          –  Leverage Chado schema
          –  Accommodate inherited data
          –  Serve as point of integration
          –  Facilitate data sharing via web services
  • 3. About Araport
      •  Objectives
          –  Develop community web interface
              •  sustainable, fundable and community-extensible
              •  hosts analysis modules, visualization tools, user data spaces
          –  Practice data federation
              •  integrate diverse data sets from distributed sources
              •  consume and expose data via RESTful web services
          –  Maintain “gold standard” Col-0 annotation
              •  assemble tissue-specific transcripts from publicly available RNA-seq datasets
              •  incorporate novel coding and non-coding genes
  • 4. Araport (https://www.araport.org)
      •  Explore data
      •  ThaleMine
      •  JBrowse
      •  Science Apps
      •  Search data
      •  Quick Search
      •  BLAST
      •  Raw data downloads
      •  Community
      •  News & Events
      •  Ask a question
      •  Job Postings
      •  Useful Links
  • 5. Araport Architecture (diagram; labels only)
      •  External programs; Portal (www.araport.org); Science Apps
      •  API (api.araport.org): Agave Core (meta data, user profile), ADAMA (service manage, service enroll)
      •  Authentication, metering, logging, versioning, HTTPS, CORS
      •  Apps, Jobs, Systems
      •  Computing, Storage, Databases
      •  ThaleMine, JBrowse, InterMine, Tripal, Others
      •  CGI, SOAP, REST
  • 6. Current implementation (diagram labels: Araport warehouse, InterMine loader, Araport data mart, publish)
      Combination of flat-files and databases
      •  TAIR datasets
      •  Ontologies (GO, PSI)
      •  Interactions (BAR)
      •  Orthologs (Panther)
      Data Mart
      •  InterMine schema, PostgreSQL DB
      •  Indexed and flattened for speed
      •  Rebuilt periodically
      Outputs
      •  ThaleMine WebApp
      •  ThaleMine web services
      Web services: live calls to…
      •  UniProt web services
      •  PubMed web services
  • 7. Planned implementation (diagram labels: Araport warehouse, Araport data mart, publish)
      Warehouse
      •  Chado schema, PostgreSQL DB
      •  General purpose but slow
      •  Permanent host for core genomic datasets (assembly, annotation, metadata, etc.)
      Inputs
      •  Genome annotation pipeline
      •  Community curation data
      Outputs
      •  ThaleMine WebApp
      •  ThaleMine web services
      Data Mart
      •  InterMine schema, PostgreSQL DB
      •  Indexed and flattened for speed
      •  Rebuilt periodically
  • 8. Chado (http://gmod.org/wiki/Chado)
      •  Functions as our low-level (core) Araport data warehouse
          –  Preserve legacy datasets with appropriate attributions
          –  Track any new datasets generated (annotation updates, community contributions)
          –  Serve as point of integration and de-duplication of certain data types
          –  Integrate with planned community curation interface
      •  Supports our pursuit of being open-source (and future-proof)
  • 9. Tripal (http://tripal.info)
      •  Drupal CMS-based modularized framework, exposing a user-friendly interface to Chado
          –  provides standardized loaders for genomic datasets (FASTA, GFF3, GenBank, BLAST, GO, InterProScan, KEGG)
          –  supports building custom templates and materialized views
          –  exposes a well-documented API
  • 10. Integrate data inherited from TAIR
      •  Currently a combination of flat-files and TAIR’s Oracle database
          –  Genome Assembly (TAIR9)
          –  Genome Annotation (TAIR10): genes, pseudogenes, transposons, ncRNAs
          –  Annotation properties: gene symbols, confidence ranking, functional descriptions, curator summary
          –  GO Annotations (TAIR curated data at geneontology.org)
          –  Publications (curated gene → publication relationships)
          –  Variation data: Genetic markers, Polymorphisms (SNPs, TILLING) and T-DNA Insertions
          –  Stock data (lines, clones, germplasm)
      •  Chado-backed Tripal will serve as the core repository for this data
  • 11. Integrate with planned Community Curation Interface
  • 12. Integrate publication data
      •  Existing sources for publication data
          –  TAIR locus to PubMed ID mapping
          –  NCBI gene2pubmed mapping
          –  UniProt curated Protein to PubMed ID mapping
          –  Publications missing PMIDs and/or DOIs
      •  Chado will act as point of integration
          –  Combine and de-duplicate publication data from 3 sources (more in the future)
          –  Collect and store metadata for publications with and without PMID and/or DOIs
  • 13. Integrate Stock data
      •  TAIR stock-related tables mapped to their corresponding Chado counterparts
      •  Custom loaders developed to perform bulk updates of Stock information, Phenotypes, Polymorphism data and mappings to AGI loci
  • 14. Role of Tripal within Araport
      •  Tripal is under active development, with plans in place to begin developing rational web services (WS) as well as support interoperability
      •  Araport plans to be involved in this working group to satisfy the following needs of our project:
          –  Expose live data from future annotation update pipelines to the community directly via WS
          –  Expose stock data via WS in a standardized manner to Arabidopsis stock centers (both ABRC and NASC) to aid data synchronization
          –  Embrace and support other open-source initiatives
  • 15. Araport on GitHub
      •  GitHub organization: https://www.github.com/Arabidopsis-Information-Portal
      •  Relevant repositories:
          –  tair-chado-batchflow
          –  chado_pub_loader
          –  pasa-chado-hook
          –  GMOD/Apollo (fork)
  • 16. Acknowledgements
      •  JCVI Developers
          –  Maria Kim
          –  Irina Belyaeva
          –  Svetlana Karamycheva
      •  Tripal co-PI Stephen Ficklin and development community
      •  TAIR/Phoenix Bio: assistance with data migration
      •  Funding Agencies
  • 17. Araport Team
      Chris Town, PI; Lisa McDonald, Education and Outreach Coordinator; Chris Nelson, Project Manager; Jason Miller, Co-PI, JCVI Technical Lead; Erik Ferlanti, Software Engineer; Vivek Krishnakumar, Bioinf. Engineer; Svetlana Karamycheva, Bioinf. Engineer; Eva Huala, Project lead, TAIR; Bob Muller, Technical lead, TAIR; Gos Micklem, co-PI; Sergio Contrino, Software Engineer; Matt Vaughn, co-PI; Steve Mock, Advanced Computing Interfaces; Rion Dooley, Web and Cloud Services; Matt Hanlon, Web and Mobile Applications; Maria Kim, Bioinf. Engineer; Ben Rosen, Bioinf. Analyst; Joe Stubbs, API Developer Platform; Walter Moreira, API Developer Federation; Chris Jordan, Database Manager; Eleanor Pence, Intern; Chia-Yi Cheng, Bioinf. Analyst; Seth Schobel, Bioinf. Engineer; Irina Belyaeva, Software Engineer
  • 19. Araport @ PAG XXIII (session details, topics, presenters)
      •  Tripal Database Network and Initiatives: Sunday, January 11, 2015, 5:30 PM-5:45 PM, California
          –  W876: Tripal within the Arabidopsis Information Portal
          –  Presenter: Vivek Krishnakumar
      •  Arabidopsis Information Portal & IAIC Workshop: Monday, January 12, 2015, 12:50 PM-3:00 PM, Pacific Salon 6-7 (2nd Floor)
          –  W059: Walkthrough the Araport Web Site
          –  W061: Exposing Web Services for Araport
          –  W062: Developing applications for Araport
          –  Presenters: Chia-Yi Cheng, Jason Miller, Matt Vaughn
      •  Computer Demo 2: Tuesday, January 13, 2015, 12:30 PM, California
          –  C23: Using the Arabidopsis Information Portal
          –  Presenter: Jason Miller
      •  GMOD: Wednesday, January 14, 2015, 11:30 AM, Golden West
          –  W410: JBrowse within the Arabidopsis Information Portal
          –  Presenter: Vivek Krishnakumar
      •  Poster Session – Even: Monday, January 12, 2015, 10:00 AM-11:30 AM, Grand Exhibit Hall
          –  P0790: Data Integration for the Plant Research Community: Araport
          –  P0792: Developing Content for the Arabidopsis Information Portal
          –  Presenters: Chia-Yi Cheng, Matt Vaughn
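
Slide 12 describes combining and de-duplicating publication data from three sources (TAIR, NCBI gene2pubmed, UniProt), including records that lack a PMID and/or DOI. The slides do not show how this is done, so the following is only a minimal Python sketch of one possible approach, not the actual logic of the chado_pub_loader repository. The `PubRecord` class, the `merge_publications` function, the normalized-title fallback for records without identifiers, and all identifiers in the usage note are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PubRecord:
    """Hypothetical in-memory publication record (not a Chado table)."""
    title: str
    pmid: Optional[str] = None
    doi: Optional[str] = None
    sources: set = field(default_factory=set)  # e.g. {"TAIR", "NCBI"}
    loci: set = field(default_factory=set)     # linked locus identifiers


def _keys(rec):
    """Return the lookup keys for a record: PMID and/or DOI when present,
    otherwise a whitespace- and case-normalized title (an assumption)."""
    keys = []
    if rec.pmid:
        keys.append(("pmid", rec.pmid.strip()))
    if rec.doi:
        keys.append(("doi", rec.doi.strip().lower()))
    if not keys:
        keys.append(("title", " ".join(rec.title.lower().split())))
    return keys


def merge_publications(records):
    """Merge records that share a PMID, a DOI, or (lacking both) a title.

    Simplification: a record is merged into the first existing match only;
    bridging two previously separate entries is not handled.
    """
    merged = []
    index = {}  # lookup key -> position in `merged`
    for rec in records:
        pos = next((index[k] for k in _keys(rec) if k in index), None)
        if pos is None:
            merged.append(PubRecord(rec.title, rec.pmid, rec.doi,
                                    set(rec.sources), set(rec.loci)))
            pos = len(merged) - 1
        else:
            target = merged[pos]
            target.pmid = target.pmid or rec.pmid
            target.doi = target.doi or rec.doi
            target.sources |= rec.sources
            target.loci |= rec.loci
        # Register every key the merged entry now carries, including
        # identifiers it has just acquired from the incoming record.
        for k in _keys(merged[pos]):
            index[k] = pos
    return merged
```

As a usage illustration with made-up identifiers: passing a TAIR record with PMID "111", an NCBI record with PMID "111" and DOI "10.1/X", and a UniProt record with only DOI "10.1/x" yields a single merged entry that carries both identifiers, all three sources, and the union of the linked loci. Matching on the DOI case-insensitively and falling back to the title only when no identifier exists keeps the sketch conservative; a real loader would likely need fuzzier matching for records without PMIDs or DOIs.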