Building genomic data cyberinfrastructure with the online
database software Tripal and analysis workflows driven by
Galaxy
Meg Staton
University of Tennessee, Knoxville
mstaton1@utk.edu
@hardwoodgenomics
Cyberinfrastructure
Need to connect people to
• Computing systems
• Data storage systems
• Advanced instruments
• Data repositories
• Visualization environments
• Sensors
All distributed across the world
Wilkinson et al 2016
FAIR data principles
Findable
• Unique and persistent identifiers
Accessible
• Open and free method for retrieval
Interoperable
• Data are properly associated with other datasets
Re-usable
• Rich metadata (attributes for who, what, when, where, how)
The community (genome) database
Mission
• Collect data
• Curate data
• Integrate data
• Provide access to data
Difference from primary repositories
Why do we need community databases?
The “Community” Part
• Understand what is important for your users
• Respond to questions
• Attend community meetings
• Participate in grants
• Take data that doesn’t have a home anywhere else
• Manual curation
Challenges
• 2007, Clemson University
• We were writing all the database
and web code from scratch
• Starting to accumulate multiple
databases
• Would like to focus on biological
visualization, instead cobbling
together code modules to handle
• Usernames/passwords/permissions
• Front page news items
• Calendar of meetings
• There has to be an easier way!
Dorrie Main
Stephen Ficklin
A web framework for genetic and genomic data
Goals:
• Simplify construction of community genomics
websites
• Encourage high-quality, standards-based websites
for data sharing and collaboration
• Expand and reuse code
http://tripal.info
Content Management
System
Website construction toolkit
Open source
Globally utilized and
supported
Manages users
Module-based design
My Drupal Web Site
Calendar Module
Views
XML Sitemap
My Drupal Web Site
Calendar Module
Views
Organism
Sequence Feature
Genotype
Drupal Database
Why use Tripal?
Goals:
• Simpler construction
• Encourage high-quality, standards-based websites
for data sharing and collaboration
• Expand and reuse code
Open source
Friendly developers
Responsive mailing list
Modules
Core Modules
• Organisms
• Contact
• Controlled Vocabularies
• Stocks/Germplasm
• Phenotypes
• Genotypes
• Features
• Phylogenies
Bulk Data Loader
Jobs Management
Extension Modules
• Transcriptomes
• Functional annotation:
• BLAST
• KEGG
• InterPro
• GeneOntology
• BLAST server
• Breeding API
• Genetic Maps
• Libraries
• JBrowse
Elasticsearch
Extension Module
https://github.com/tripal/tripal_elasticsearch
What problem is being solved?
• Drupal internal search
• Easy to set up and customize (for normal Drupal data types)
• Slow to index, slow to return results
• Need a solution that will:
• Access Chado database
• Provide flexible and customizable indexing – index only what is
needed, not everything
• Scale to very large biological data sets
Elasticsearch Software
Distributed, open source search and analytics engine
• Massively distributed – can scale horizontally
• Multitenancy – a search cluster can manage many
individual indices that can be queried individually or as a
group
• Feature-rich - autocomplete, fuzzy searching, “did you
mean” suggestions
• Open source
• Widely adopted
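As an illustration of the fuzzy searching mentioned above, here is a minimal sketch of a request body in the Elasticsearch query DSL. It only builds the JSON; sending it to a server is left out. The field name `uniquename` and the search text are assumptions chosen for the example and are not taken from the Tripal Elasticsearch module.

```python
import json


def build_fuzzy_query(field, text, size=10):
    """Build a match query that tolerates small typos in the search text."""
    return {
        "size": size,  # maximum number of hits to return
        "query": {
            "match": {
                field: {
                    "query": text,
                    # "AUTO" lets Elasticsearch pick the allowed edit
                    # distance based on the length of each term.
                    "fuzziness": "AUTO",
                }
            }
        },
    }


# Hypothetical field and search text, with a deliberate misspelling.
body = build_fuzzy_query("uniquename", "chesnut transcript")
print(json.dumps(body, indent=2))
```

The resulting body would be sent to the `_search` endpoint of whichever index holds the data.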
Elasticsearch Module Implementation
Install
Elasticsearch
Install Tripal
Elasticsearch
Module
Connect to the
Elasticsearch
Server
Index Drupal
nodes
Site-wide
search
Index targeted
Chado or
Drupal tables
Customized
search
Index chado table or materialized view
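To make the indexing step more concrete, the sketch below turns rows from a table into the newline-delimited JSON payload that the Elasticsearch bulk API accepts. This is not the module's own code. The index name, the sample rows, and the helper `to_bulk_payload` are hypothetical; the column names follow the Chado `feature` table. Only the selected fields are indexed, in keeping with the goal of indexing only what is needed.

```python
import json


def to_bulk_payload(index_name, rows, id_field, fields):
    """Serialize rows as bulk-API lines: one action line, then one document."""
    lines = []
    for row in rows:
        action = {"index": {"_index": index_name, "_id": str(row[id_field])}}
        document = {name: row[name] for name in fields}  # keep chosen fields only
        lines.append(json.dumps(action))
        lines.append(json.dumps(document))
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical rows, shaped like records from the Chado feature table.
rows = [
    {"feature_id": 1, "name": "transcript_0001", "uniquename": "t0001", "residues": "ACGT"},
    {"feature_id": 2, "name": "transcript_0002", "uniquename": "t0002", "residues": "GGCA"},
]

payload = to_bulk_payload("transcripts", rows, "feature_id", ["name", "uniquename"])
print(payload)
```

Leaving `residues` out of the field list keeps long sequence strings out of the index, which is one way to hold index size down on large data sets.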
After indexing, build search block
The block is a normal Tripal
block that can be placed on
any or all pages.
Blocks can also be deleted
from the admin back end.
Alter form fields
Final Custom Transcript Search
Final Custom Transcript Search - Results
Elasticsearch Module
Faster indexing (if only due to multicore usage)
Faster searching
Future Development
• Multisite installs on a single web server – currently
working
• Port to Tripal 3.0
• Compare to new internal searching
Analysis Expression
Extension Module
https://github.com/tripal/tripal_analysis_expression
What problem is being solved?
Biological
Samples
RNA Libraries Gene Expression
Levels
Need a better way to store and visualize RNASeq differential gene
expression experiments.
Expression Module – Content Types
• Biomaterial
• Similar to NCBI BioSample and SRA
• We currently do not differentiate between samples and
libraries
• Expression Analysis
• User specifies protocol and array design if a microarray
was used
• Upload and display of gene expression values
Loading Data
• Import biomaterial
• BioSample data downloaded from NCBI (xml)
• Flat file format (based on NCBI biomaterial bulk load
form)
• Can associate ontology terms through flat file
• Create a new expression analysis
• Import expression values as text files
• (assumed to be normalized, features must already
exist)
• Individual file per sample
• Tab delimited file with gene rows, sample columns
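A minimal sketch of reading the tab-delimited layout described above (gene rows, sample columns) follows. It assumes the first row is a header whose first cell labels the gene column; the gene and sample names are invented for the example, and the module's actual loader may behave differently.

```python
import csv
import io


def read_expression_matrix(handle):
    """Parse a tab-delimited matrix into {gene: {sample: value}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first cell is the gene column label
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} values, got {len(values)}")
        matrix[gene] = dict(zip(samples, (float(v) for v in values)))
    return samples, matrix


# Hypothetical file content: values are assumed to be normalized already.
text = "gene\tleaf\troot\ngene_A\t12.5\t0.0\ngene_B\t3.25\t7.0\n"
samples, matrix = read_expression_matrix(io.StringIO(text))
print(samples)
print(matrix["gene_A"]["leaf"])
```

Because features must already exist before loading, a real import would also check each gene name against the database before storing its values.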
Visualization - Biomaterials
Visualization – Gene Expression
Visualization – Gene Expression
Hover over a library name for
a description
Some options to alter the
graphic
Expression Visualization Tool
• Paste in a list of genes to get a full heatmap across all
libraries.
• Plotly allows you to zoom, download, etc.
Future Work on Expression Module
• Transfer the list of all features from search results to
expression visualization tool
• Add significance/p-values from differential gene
expression test results
• Aid searching, e.g. limit results to genes that respond to cold
stress
• Interactive data filtering
• Tie into analysis engine
• Tie into a publication module
Galaxy
Extension Module and Analysis Engine
https://github.com/tripal/tripal_galaxy
Galaxy is an open, web-based platform for accessible,
reproducible, and transparent computational biomedical
research.
No need to use the command line to run NGS pipelines.
Use a website to upload data, build an analysis pipeline
and run it.
Tripal Galaxy Module
• Currently under development
• https://github.com/tripal/tripal_galaxy
• Tripal sites can provide Galaxy workflows to their users
• Ensures reproducibility of data analysis steps
• Decreases curator effort/time
• Provides the workflow within the look-and-feel of the site
• Can be installed by any Tripal site once completed.
Galaxy Workflows
Testing on Galaxy instances at Washington State
University, University of Connecticut, and University of
Tennessee
DNA Sequence Data
• Re-sequencing
alignment
• Variant discovery
(against the reference)
• Variant discovery
(between samples)
• Prediction of functional
genetic variants
https://github.com/statonlab/dibbs
Tripal Galaxy
• Expected release in April 2016 for first workflow on HWG
• Galaxy backend will be running at WSU
• Need to continue work on
• Selecting and filtering data to input to a workflow
• Monitoring workflow status
• Receiving meaningful error messages if problems occur
Going Mobile
Users produce messy data
Day          Collector  Color       Diseased?
11-14-16     Evan       Red         0
11-14-16     Evan       Pink        0
11-14-16     Evan       White       1
Nov 14 2016  Becky      Fuschia     True
Nov 14 2016  Becky      White       False
16-11-14     Miriam     Vermillion  Yes
Standardize Collection
• Create forms for data collection
• Serve through a flexible mobile app
• Currently prototyping as a citizen science app
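To show why standardized forms help, here is a rough sketch of the cleanup that free-form entries like those in the table would otherwise need. The list of date formats and the accepted true/false spellings are assumptions drawn only from the example rows; real field data would need a broader and more careful set of rules.

```python
from datetime import datetime

# Formats seen in the example rows. Some two-digit dates could match more
# than one format, which is one reason a form with a date picker is safer.
DATE_FORMATS = ("%m-%d-%y", "%b %d %Y", "%y-%m-%d")
TRUE_VALUES = {"1", "true", "yes"}
FALSE_VALUES = {"0", "false", "no"}


def normalize_date(text):
    """Return an ISO 8601 date string, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def normalize_flag(text):
    """Map the various yes/no spellings onto a real boolean."""
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"unrecognized flag: {text!r}")


for day, diseased in [("11-14-16", "0"), ("Nov 14 2016", "True"), ("16-11-14", "Yes")]:
    print(normalize_date(day), normalize_flag(diseased))
```

A collection form sidesteps this guessing by constraining each field at entry time, for example with a date picker and a checkbox.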
Mobile App
• Timeline
• Citizen Science app released
by July 2017
• Prototype of full phenotyping
app by Jan 2018
• Testing in multiple systems
Cyberinfrastructure
Access data
Find data
Visualize data
Analyze data
Collect data
Abdullah Almsaeed, Research Associate
Bradford Condon, Postdoc
Miriam Paya Milans, Postdoc
Ming Chen, Graduate Student
Fang Lui, Graduate Student
Stephen Ficklin
Dorrie Main
Jill Wegrzyn
Bert Abbott
Dana Nelson
Ellen Crocker
Building genomic data cyberinfrastructure with the online database software Tripal and analysis workflows driven by Galaxy

More Related Content

What's hot

Pieper NISO Virtual Conf Feb17
Pieper NISO Virtual Conf Feb17Pieper NISO Virtual Conf Feb17
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
Araport
 
Sept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the CloudSept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the Cloud
National Information Standards Organization (NISO)
 
247th ACS Meeting: The Eureka Research Workbench
247th ACS Meeting: The Eureka Research Workbench247th ACS Meeting: The Eureka Research Workbench
247th ACS Meeting: The Eureka Research Workbench
Stuart Chalk
 
Sept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the CloudSept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the Cloud
National Information Standards Organization (NISO)
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
Carole Goble
 
Arakno
AraknoArakno
Assessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsAssessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformatics
Peter van Heusden
 
Sept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the CloudSept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the Cloud
National Information Standards Organization (NISO)
 
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
ChemSpider – disseminating data and enabling an abundance of chemistry platformsChemSpider – disseminating data and enabling an abundance of chemistry platforms
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Cytoscape Network Visualization and Analysis
Cytoscape Network Visualization and AnalysisCytoscape Network Visualization and Analysis
Cytoscape Network Visualization and Analysis
bdemchak
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPedia
Paul Groth
 
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
Charlie Hull
 
Cytoscape basic features
Cytoscape basic featuresCytoscape basic features
Cytoscape basic features
Luay AL-Assadi
 
Crosslinks
Crosslinks Crosslinks
Crosslinks
ericmeeks
 
Sept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the CloudSept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the Cloud
National Information Standards Organization (NISO)
 
Sept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the CloudSept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the Cloud
National Information Standards Organization (NISO)
 
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataRepeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Leighton Pritchard
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
Pinar Alper
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
Stuart Chalk
 

What's hot (20)

Pieper NISO Virtual Conf Feb17
Pieper NISO Virtual Conf Feb17Pieper NISO Virtual Conf Feb17
Pieper NISO Virtual Conf Feb17
 
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
 
Sept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the CloudSept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the Cloud
 
247th ACS Meeting: The Eureka Research Workbench
247th ACS Meeting: The Eureka Research Workbench247th ACS Meeting: The Eureka Research Workbench
247th ACS Meeting: The Eureka Research Workbench
 
Sept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the CloudSept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the Cloud
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
Arakno
AraknoArakno
Arakno
 
Assessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsAssessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformatics
 
Sept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the CloudSept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the Cloud
 
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
ChemSpider – disseminating data and enabling an abundance of chemistry platformsChemSpider – disseminating data and enabling an abundance of chemistry platforms
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
 
Cytoscape Network Visualization and Analysis
Cytoscape Network Visualization and AnalysisCytoscape Network Visualization and Analysis
Cytoscape Network Visualization and Analysis
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPedia
 
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
 
Cytoscape basic features
Cytoscape basic featuresCytoscape basic features
Cytoscape basic features
 
Crosslinks
Crosslinks Crosslinks
Crosslinks
 
Sept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the CloudSept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the Cloud
 
Sept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the CloudSept 24 NISO Virtual Conference: Library Data in the Cloud
Sept 24 NISO Virtual Conference: Library Data in the Cloud
 
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataRepeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
 

Viewers also liked

Chestnut Resources via Hardwood Genomics Web
Chestnut Resources via Hardwood Genomics WebChestnut Resources via Hardwood Genomics Web
Chestnut Resources via Hardwood Genomics Web
mestato
 
Decorex Durban 2017 in pictures
Decorex Durban 2017 in picturesDecorex Durban 2017 in pictures
Decorex Durban 2017 in pictures
Fred Felton
 
Perché le aziende devono essere presenti su internet
Perché le aziende devono essere presenti su internetPerché le aziende devono essere presenti su internet
Perché le aziende devono essere presenti su internet
Clientecontento
 
New Declassified Report Exposes Hamas Human Shield Policy
New Declassified Report Exposes Hamas Human Shield PolicyNew Declassified Report Exposes Hamas Human Shield Policy
New Declassified Report Exposes Hamas Human Shield Policy
IsraelDefenseForces
 
Brand Storytelling - Miért használj a tartalomterjesztéshez fizetett hirdetés...
Brand Storytelling - Miért használj a tartalomterjesztéshez fizetett hirdetés...Brand Storytelling - Miért használj a tartalomterjesztéshez fizetett hirdetés...
Brand Storytelling - Miért használj a tartalomterjesztéshez fizetett hirdetés...
Péter Tóth-Czere
 
Propuestas de resolución 2017
Propuestas de resolución 2017Propuestas de resolución 2017
Propuestas de resolución 2017
Nueva Canarias-BC
 
Sportcongres noord nederland Jaap van Zessen
Sportcongres noord nederland Jaap van ZessenSportcongres noord nederland Jaap van Zessen
Sportcongres noord nederland Jaap van Zessen
Jaap van Zessen
 
Presentación
PresentaciónPresentación
Presentación
Jose Mateos
 
Style Framework - SXSW2015
Style Framework - SXSW2015Style Framework - SXSW2015
Style Framework - SXSW2015
Marti Gold
 
Webinar - Introducción a la ISO/IEC 29110-4-1:2011
Webinar - Introducción a la ISO/IEC 29110-4-1:2011Webinar - Introducción a la ISO/IEC 29110-4-1:2011
Webinar - Introducción a la ISO/IEC 29110-4-1:2011
jpalma200680
 
Regalos del Chavez a otros Paises
Regalos del Chavez a otros PaisesRegalos del Chavez a otros Paises
Regalos del Chavez a otros Paises
Paraulata Ilustrada
 
Muallim ul quran revised
Muallim ul quran revisedMuallim ul quran revised
Muallim ul quran revisedSikander Ghunio
 
Money Laundering Law Germany
Money Laundering Law GermanyMoney Laundering Law Germany
Money Laundering Law Germany
Lutz Hartmann
 
Don't Believe Trump's Hype: Regulations do Work for Business
Don't Believe Trump's Hype: Regulations do Work for BusinessDon't Believe Trump's Hype: Regulations do Work for Business
Don't Believe Trump's Hype: Regulations do Work for Business
American Sustainable Business Council
 
Foro activación del empleo 2017. 29 30 MARZO IFEMA
Foro activación del empleo 2017. 29 30 MARZO IFEMAForo activación del empleo 2017. 29 30 MARZO IFEMA
Foro activación del empleo 2017. 29 30 MARZO IFEMA
YOLANDA ROSCO GARCINUÑO
 
AngularJS - podstawy
AngularJS - podstawyAngularJS - podstawy
AngularJS - podstawy
Apptension
 
Dev Ops without the Ops
Dev Ops without the OpsDev Ops without the Ops
Dev Ops without the Ops
Konstantin Gredeskoul
 
博進空手部・学費免除の裏技
博進空手部・学費免除の裏技博進空手部・学費免除の裏技
博進空手部・学費免除の裏技
al_qrantz
 
IT技術者こそ覚えておきたい脳梗塞の症状
IT技術者こそ覚えておきたい脳梗塞の症状IT技術者こそ覚えておきたい脳梗塞の症状
IT技術者こそ覚えておきたい脳梗塞の症状
なおき おさだ
 
Collective navigation of complex networks: Participatory greedy routing
Collective navigation of complex networks: Participatory greedy routingCollective navigation of complex networks: Participatory greedy routing
Collective navigation of complex networks: Participatory greedy routing
Kolja Kleineberg
 

Viewers also liked (20)

Chestnut Resources via Hardwood Genomics Web
Chestnut Resources via Hardwood Genomics WebChestnut Resources via Hardwood Genomics Web
Chestnut Resources via Hardwood Genomics Web
 
Decorex Durban 2017 in pictures
Decorex Durban 2017 in picturesDecorex Durban 2017 in pictures
Decorex Durban 2017 in pictures
 
Perché le aziende devono essere presenti su internet
Perché le aziende devono essere presenti su internetPerché le aziende devono essere presenti su internet
Perché le aziende devono essere presenti su internet
 
New Declassified Report Exposes Hamas Human Shield Policy
New Declassified Report Exposes Hamas Human Shield PolicyNew Declassified Report Exposes Hamas Human Shield Policy
New Declassified Report Exposes Hamas Human Shield Policy
 
Brand Storytelling - Miért használj a tartalomterjesztéshez fizetett hirdetés...
Brand Storytelling - Miért használj a tartalomterjesztéshez fizetett hirdetés...Brand Storytelling - Miért használj a tartalomterjesztéshez fizetett hirdetés...
Brand Storytelling - Miért használj a tartalomterjesztéshez fizetett hirdetés...
 
Propuestas de resolución 2017
Propuestas de resolución 2017Propuestas de resolución 2017
Propuestas de resolución 2017
 
Sportcongres noord nederland Jaap van Zessen
Sportcongres noord nederland Jaap van ZessenSportcongres noord nederland Jaap van Zessen
Sportcongres noord nederland Jaap van Zessen
 
Presentación
PresentaciónPresentación
Presentación
 
Style Framework - SXSW2015
Style Framework - SXSW2015Style Framework - SXSW2015
Style Framework - SXSW2015
 
Webinar - Introducción a la ISO/IEC 29110-4-1:2011
Webinar - Introducción a la ISO/IEC 29110-4-1:2011Webinar - Introducción a la ISO/IEC 29110-4-1:2011
Webinar - Introducción a la ISO/IEC 29110-4-1:2011
 
Regalos del Chavez a otros Paises
Regalos del Chavez a otros PaisesRegalos del Chavez a otros Paises
Regalos del Chavez a otros Paises
 
Muallim ul quran revised
Muallim ul quran revisedMuallim ul quran revised
Muallim ul quran revised
 
Money Laundering Law Germany
Money Laundering Law GermanyMoney Laundering Law Germany
Money Laundering Law Germany
 
Don't Believe Trump's Hype: Regulations do Work for Business
Don't Believe Trump's Hype: Regulations do Work for BusinessDon't Believe Trump's Hype: Regulations do Work for Business
Don't Believe Trump's Hype: Regulations do Work for Business
 
Foro activación del empleo 2017. 29 30 MARZO IFEMA
Foro activación del empleo 2017. 29 30 MARZO IFEMAForo activación del empleo 2017. 29 30 MARZO IFEMA
Foro activación del empleo 2017. 29 30 MARZO IFEMA
 
AngularJS - podstawy
AngularJS - podstawyAngularJS - podstawy
AngularJS - podstawy
 
Dev Ops without the Ops
Dev Ops without the OpsDev Ops without the Ops
Dev Ops without the Ops
 
博進空手部・学費免除の裏技
博進空手部・学費免除の裏技博進空手部・学費免除の裏技
博進空手部・学費免除の裏技
 
IT技術者こそ覚えておきたい脳梗塞の症状
IT技術者こそ覚えておきたい脳梗塞の症状IT技術者こそ覚えておきたい脳梗塞の症状
IT技術者こそ覚えておきたい脳梗塞の症状
 
Collective navigation of complex networks: Participatory greedy routing
Collective navigation of complex networks: Participatory greedy routingCollective navigation of complex networks: Participatory greedy routing
Collective navigation of complex networks: Participatory greedy routing
 

Similar to Building genomic data cyberinfrastructure with the online database software Tripal and analysis workflows driven by Galaxy

Tripal v3, the Collaborative Online Database Platform Supporting an Internati...
Tripal v3, the Collaborative Online Database Platform Supporting an Internati...Tripal v3, the Collaborative Online Database Platform Supporting an Internati...
Tripal v3, the Collaborative Online Database Platform Supporting an Internati...
Bradford Condon
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
Enis Afgan
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
Ravi Madduri
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
Yannick Pouliot
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
Globus
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...
Ken Karapetyan
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
S. Diana Hu
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
datastack
 
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Amazon Web Services
 
Hadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesHadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciences
Uri Laserson
 
The Genopolis Microarray database
The Genopolis Microarray databaseThe Genopolis Microarray database
The Genopolis Microarray database
Novartis Institutes for BioMedical Research
 
Collaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna WorkflowsCollaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna Workflows
Andrea Wiggins
 
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
DuraSpace
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
Carole Goble
 
Enabling knowledge management in the Agronomic Domain
Enabling knowledge management in the Agronomic DomainEnabling knowledge management in the Agronomic Domain
Enabling knowledge management in the Agronomic Domain
Pierre Larmande
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Lucidworks
 
Applied semantic technology and linked data
Applied semantic technology and linked dataApplied semantic technology and linked data
Applied semantic technology and linked data
William Smith
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
Eric Rodriguez (Hiring in Lex)
 
GlobusWorld 2015
GlobusWorld 2015GlobusWorld 2015
GlobusWorld 2015
Tanu Malik
 

Similar to Building genomic data cyberinfrastructure with the online database software Tripal and analysis workflows driven by Galaxy (20)

Tripal v3, the Collaborative Online Database Platform Supporting an Internati...
Tripal v3, the Collaborative Online Database Platform Supporting an Internati...Tripal v3, the Collaborative Online Database Platform Supporting an Internati...
Tripal v3, the Collaborative Online Database Platform Supporting an Internati...
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Building genomic data cyberinfrastructure with the online database software Tripal and analysis workflows driven by Galaxy

  • 1. Building genomic data cyberinfrastructure with the online database software Tripal and analysis workflows driven by Galaxy Meg Staton University of Tennessee, Knoxville mstaton1@utk.edu @hardwoodgenomics
  • 2. Cyberinfrastructure Need to connect people to • Computing systems • Data storage systems • Advanced instruments • Data repositories • Visualization environments • Sensors All distributed across the world
  • 4. FAIR data principles Findable • Unique and persistent identifiers Accessible • Open and free method for retrieval Interoperable • Data are properly associated with other datasets Re-usable • Rich metadata (attributes for who, what, when, where, how)
  • 5. The community (genome) database Mission • Collect data • Curate data • Integrate data • Provide access to data
  • 6. Difference from primary repositories Why do we need community databases? The “Community” Part • Understand what is important for your users • Respond to questions • Attend community meetings • Participate in grants • Take data that doesn’t have a home anywhere else • Manual curation
  • 7. Challenges • 2007, Clemson University • We were writing all the database and web code from scratch • Starting to accumulate multiple databases • Would like to focus on biological visualization, instead cobbling together code modules to handle • Usernames/passwords/permissions • Front page news items • Calendar of meetings • There has to be an easier way! Dorrie Main Stephen Ficklin
  • 8. A web framework for genetic and genomic data Goals: • Simplify construction of community genomics websites • Encourage high-quality, standards-based websites for data sharing and collaboration • Expand and reuse code http://tripal.info
  • 9. Content Management System Website construction toolkit Open source Globally utilized and supported Manages users Module-based design My Drupal Web Site Calendar Module Views XML Sitemap
  • 10. My Drupal Web Site Calendar Module Views Organism Sequence Feature Genotype Drupal Database
  • 11. Why use Tripal? Goals: • Simpler construction • Encourage high-quality, standards-based websites for data sharing and collaboration • Expand and reuse code Open source Friendly developers Responsive mailing list
  • 18. Modules Core Modules • Organisms • Contact • Controlled Vocabularies • Stocks/Germplasm • Phenotypes • Genotypes • Features • Phylogenies Bulk Data Loader Jobs Management Extension Modules • Transcriptomes • Functional annotation: • BLAST • KEGG • InterPro • Gene Ontology • BLAST server • Breeding API • Genetic Maps • Libraries • JBrowse
  • 20. What problem is being solved? • Drupal internal search • Easy to set up and customize (for normal Drupal data types) • Slow to index, slow to return results • Need a solution that will: • Access Chado database • Provide flexible and customizable indexing – index only what is needed, not everything • Scale to very large biological data sets
  • 21. Elasticsearch Software Distributed, open source search and analytics engine • Massively distributed – can scale horizontally • Multitenancy – a search cluster can manage many individual indices that can be queried individually or as a group • Feature-rich - autocomplete, fuzzy searching, “did you mean” suggestions • Open source • Widely adopted
  • 22. Elasticsearch Module Implementation Install Elasticsearch Install Tripal Elasticsearch Module Connect to the Elasticsearch Server Index Drupal nodes Site-wide search Index targeted Chado or Drupal tables Customized search
  • 23. Index chado table or materialized view
  • 24. After indexing, build search block The block is a normal Tripal block that can be placed on any or all pages. Blocks can also be deleted from the admin back end.
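The indexing and search steps above can be sketched in Python. This is only an illustration of the request bodies Elasticsearch accepts, not the Tripal Elasticsearch module's own code: the index name `transcripts`, the row fields, and both helper functions are hypothetical. It assumes a recent Elasticsearch version, where a bulk action line needs only `_index` and `_id`. Nothing is sent over the network; the sketch only builds the payloads.

```python
import json

# Hypothetical rows, as they might be selected from a Chado table or
# materialized view. Only the columns worth searching are included.
ROWS = [
    {"uniquename": "TR0001", "organism": "Quercus rubra", "annotation": "heat shock protein"},
    {"uniquename": "TR0002", "organism": "Quercus rubra", "annotation": "chalcone synthase"},
]


def bulk_index_lines(index_name, rows, id_field):
    """Build the newline-delimited JSON body for the _bulk endpoint.

    Each document takes two lines: an action line naming the target index
    and document id, then the document itself.
    """
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": row[id_field]}}))
        lines.append(json.dumps(row))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def fuzzy_match_query(field, text, size=10):
    """Build a search body that tolerates small typos in the search text."""
    return {
        "size": size,
        "query": {"match": {field: {"query": text, "fuzziness": "AUTO"}}},
    }


bulk_body = bulk_index_lines("transcripts", ROWS, "uniquename")
search_body = fuzzy_match_query("annotation", "chalcone synthse")
print(bulk_body)
print(json.dumps(search_body, indent=2))
```

Indexing a narrow set of columns, rather than whole pages, is one way to keep the index small for large biological data sets.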
  • 27. Final Custom Transcript Search - Results
  • 28. Elasticsearch Module Faster indexing (if only due to multicore usage) Faster searching Future Development • Multisite installs on a single web server – currently working • Port to Tripal 3.0 • Compare to new internal searching
  • 30. What problem is being solved? Biological Samples RNA Libraries Gene Expression Levels Need a better way to store and visualize RNASeq differential gene expression experiments.
  • 31. Expression Module – Content Types • Biomaterial • Similar to NCBI BioSample and SRA • We currently do not differentiate between samples and libraries • Expression Analysis • User specifies protocol and array design if a microarray was used • Upload and display of gene expression values
  • 32. Loading Data • Import biomaterial • BioSample data downloaded from NCBI (xml) • Flat file format (based on NCBI biomaterial bulk load form) • Can associate ontology terms through flat file • Create a new expression analysis • Import expression values as text files • (assumed to be normalized, features must already exist) • Individual file per sample • Tab delimited file with gene rows, sample columns
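As a rough sketch of the matrix-style input described above (tab-delimited, gene rows, sample columns), the following Python reads such a file into one record per gene and sample. The function name, the sample names, and the gene identifiers are made up for illustration, and this is not the expression module's actual loader. As on the slide, values are assumed to be already normalized.

```python
import csv
import io


def read_expression_matrix(handle):
    """Turn a gene-by-sample matrix into (gene, sample, value) records."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column holds the gene name
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} values, got {len(values)}")
        for sample, value in zip(samples, values):
            records.append((gene, sample, float(value)))
    return records


# Hypothetical example file contents.
EXAMPLE = (
    "gene\tleaf_control\tleaf_cold\n"
    "Gene0001\t12.5\t48.0\n"
    "Gene0002\t0.0\t3.25\n"
)

records = read_expression_matrix(io.StringIO(EXAMPLE))
for record in records:
    print(record)
```

A real loader would also need to check that each gene already exists as a feature and that each sample matches a loaded biomaterial, as the slide notes.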
  • 35. Visualization – Gene Expression Hover over a library name for a description Some options to alter the graphic
  • 36. Expression Visualization Tool • Paste in a list of genes to get a full heatmap across all libraries. • Plotly allows you to zoom, download, etc.
  • 37. Future Work on Expression Module • Transfer the list of all features from search results to expression visualization tool • Add significance/p-values from differential gene expression test results • Aid searching – limit results only to genes that respond to cold stress • Interactive data filtering • Tie into analysis engine • Tie into a publication module
  • 38. Galaxy Extension Module and Analysis Engine https://github.com/tripal/tripal_galaxy
  • 39. Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research. No need to use the command line to run NGS pipelines. Use a website to upload data, build an analysis pipeline and run it.
  • 41. Tripal Galaxy Module • Currently under development • https://github.com/tripal/tripal_galaxy • Tripal sites can provide Galaxy workflows to their users • Ensures reproducibility of data analysis steps • Decreases curator effort/time • Provides the workflow within the look-and-feel of the site • Can be installed by any Tripal site once completed.
  • 43. Galaxy Workflows Testing on Galaxy instances at Washington State University, University of Connecticut, and University of Tennessee DNA Sequence Data • Re-sequencing alignment • Variant discovery (against the reference) • Variant discovery (between samples) • Prediction of functional genetic variants https://github.com/statonlab/dibbs
  • 44. Tripal Galaxy • Expected release in April 2016 for first workflow on HWG • Galaxy backend will be running at WSU • Need to continue work on • Selecting and filtering data to input to a workflow • Monitoring workflow status • Receiving meaningful error messages if problems occur
  • 46. Users produce messy data
        Day          Collector  Color       Diseased?
        11-14-16     Evan       Red         0
        11-14-16     Evan       Pink        0
        11-14-16     Evan       White       1
        Nov 14 2016  Becky      Fuschia     True
        Nov 14 2016  Becky      White       False
        16-11-14     Miriam     Vermillion  Yes
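To make the cost of this kind of data concrete, here is a small Python sketch that cleans the six example rows after the fact. The list of date formats and the true/false spellings are assumptions drawn only from this one table, and the helper names are hypothetical. A real dataset would need more cases.

```python
from datetime import date, datetime

# Formats are tried in order. Two-digit dates are ambiguous in general
# (for example "11-10-12"), so the order here is a guess that happens
# to fit this table.
DATE_FORMATS = ("%m-%d-%y", "%b %d %Y", "%y-%m-%d")
TRUTHY = {"1", "true", "yes"}
FALSY = {"0", "false", "no"}


def parse_day(text):
    """Return a date for any of the formats seen in the table."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def parse_diseased(text):
    """Map the mixed 0/1, True/False, Yes/No spellings onto a bool."""
    value = text.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"unrecognized yes/no value: {text!r}")


ROWS = [
    ("11-14-16", "Evan", "Red", "0"),
    ("11-14-16", "Evan", "Pink", "0"),
    ("11-14-16", "Evan", "White", "1"),
    ("Nov 14 2016", "Becky", "Fuschia", "True"),
    ("Nov 14 2016", "Becky", "White", "False"),
    ("16-11-14", "Miriam", "Vermillion", "Yes"),
]

cleaned = [
    (parse_day(day), collector, color, parse_diseased(diseased))
    for day, collector, color, diseased in ROWS
]
for row in cleaned:
    print(row)
```

Even this small cleaner relies on guessing which two-digit field is the year, which is the kind of ambiguity a fixed collection form avoids.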
  • 47. Standardize Collection • Create forms for data collection • Serve through a flexible mobile app • Currently prototyping as a citizen science app
  • 49. Mobile App • Timeline • Citizen Science app released by July 2017 • Prototype of full phenotyping app by Jan 2018 • Testing in multiple systems
  • 51. Abdullah Almsaeed (Research Associate), Bradford Condon (Postdoc), Miriam Paya Milans (Postdoc), Ming Chen (Graduate Student), Fang Lui (Graduate Student); Stephen Ficklin, Dorrie Main, Jill Wegrzyn, Bert Abbott, Dana Nelson, Ellen Crocker