Chantal Roth Bioinformatics Department TMRI, San Diego, CA
Acknowledgements: many people have contributed to this project, including: Jason Wu, Dana Alcivare, Hemant
Varma, E Li, Roman Rozenshteyn, Derek Guist, Don Hutchison, Darrell Ricke and Chris Martin
Abstract: BioSphere consists of several components: a data warehouse containing integrated data originating from various sources, an application server that executes most of the business logic, and a client for user interactions and data visualizations. The poster zooms into the various components to show the architecture, the software design and the datasets it contains.
Design
3-Tier Architecture
• Java Swing and JSP clients
• Weblogic application server
• Oracle 8i database (9i soon)
GoF and J2EE Design Patterns
• FastLaneReader using Data Access Objects
• Servlet/Session Bean as front controller
• Factories and abstract factories for
persistence layer components
• Code generator pattern
• Visitor, singleton, command, adapter,
delegate, observer and many other design
patterns
Software
Code
• Over 280,000 lines of Java code:
– Client: ca 108,000
– Server: ca 172,000
• 75,000 are automatically generated
• 210,000 lines are manually written.
Code Generator
• creates value objects (java beans)
• data access objects (persistence)
• xml objects (xml parsers and creators)
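As a rough illustration of what generating a value object might involve, the sketch below builds the source text of a Java bean from a table description. The names BeanGenerator and generate, and the example table Contig with its columns, are hypothetical; the poster does not show the generator's actual input format or output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: emits Java bean source for one table definition.
// Nothing here is taken from the BioSphere code generator itself.
class BeanGenerator {

    // Builds a bean class with one private field plus a getter and setter per column.
    // The map goes from column name to Java type and keeps insertion order.
    static String generate(String className, Map<String, String> columns) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        for (Map.Entry<String, String> col : columns.entrySet()) {
            String name = col.getKey();
            String type = col.getValue();
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            src.append("    private ").append(type).append(' ').append(name).append(";\n");
            src.append("    public ").append(type).append(" get").append(cap)
               .append("() { return ").append(name).append("; }\n");
            src.append("    public void set").append(cap).append('(').append(type)
               .append(" value) { this.").append(name).append(" = value; }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "long");
        columns.put("name", "String");
        System.out.print(generate("Contig", columns));
    }
}
```

A generator of this kind would presumably read the column list from the meta data definitions rather than from a hand-built map; that link is an assumption here.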
XML
• Meta data definitions (table graph, db definitions)
• Database and server configuration
• User preferences
• Application module definitions
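To make the idea of XML application module definitions concrete, here is a minimal sketch that reads module names from an XML string using the standard Java DOM parser. The element and attribute names (modules, module, name, class) and the class ModuleConfigSketch are invented for illustration; the poster does not show the real file format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: the XML layout below is an assumption, not BioSphere's format.
class ModuleConfigSketch {

    static final String EXAMPLE =
        "<modules>"
      + "<module name=\"ContigViewer\" class=\"example.ContigViewer\"/>"
      + "<module name=\"Pathfinder\" class=\"example.Pathfinder\"/>"
      + "</modules>";

    // Returns the name attribute of every module element, in document order.
    static List<String> moduleNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("module");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            // Parser setup, I/O and syntax errors are reported as one unchecked exception.
            throw new IllegalArgumentException("not a readable module definition", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(moduleNames(EXAMPLE));
    }
}
```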
Major Data Sets
• Rice: myriad contigs, traces (sequences), markers,
FPC’s, gene predictions, proprietary gene models,
motifs, rice affy chip, cDNA’s, various blast
analysis
• Cochliobolus, Fusarium, Botrytis: contigs,
traces, affy chips, blast analysis, gene predictions,
motifs, pfam
• Arabidopsis: entire chromosomes, gene models
(TAIR), GARLIC, markers, affy chip
• Drosophila: genbank, P-elements, various blast
analysis, EST’s
• Other genomes: ashbya, neurospora, human,
banana, c.elegans, phytophthora, barley, maize,
potato, rye, sorghum, vinca, wheat, aphid,
heliothis, manduca, helicoverpa armigera,
melincognita, silverleaf whitefly, pombe, yeast,
magnaporthe
• Analysis: various blast analysis such as wheat
EST’s, plant repeats, swissprot, genbank, phytoseq
EST’s, various gene predictions, pfam
Hardware
Application Server
• Weblogic Application Server runs on a Sun E220R
with 1.5 GB RAM and 2 cpus
Database
• Oracle Server: Sun E4500 with 8 cpus and 8 GB
RAM
Analysis
• AnalysisClient: 24 analysis daemons running on the
titan cluster (Linux)
Database
• Oracle 8i database (soon upgraded to 9i)
• More than 100 tables
• More than ½ Terabyte of data
• Data warehousing approach adapted to
bioinformatics
Data Integration
Data Sources
• Typically LIMS databases are specialized for one particular
data type, such as sequences or expression
• They usually allow a user to enter detailed information about
the sample (such as tissue)
• These systems are normally not linked and do not allow a
user to ask questions across datatypes
ETL
• The conclusions of the experiments, a very small subset of
the LIMS data, need to be integrated in order to allow
sophisticated data mining and visualizations (colored boxes)
• To do this, the results are extracted, transformed and then
loaded into a data warehouse
• Ideally a middleware would be used that handles the
extraction, transformation and loading
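The extract, transform and load steps described above can be sketched in a few lines. Everything in this example is an assumption made for illustration: the semicolon-separated row format, the FINAL status flag, the Fact class and the in-memory list standing in for the data warehouse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical ETL sketch; none of these types come from BioSphere.
class EtlSketch {

    // A warehouse entry in one common shape shared by all data types.
    static final class Fact {
        final String source;
        final String gene;
        final double value;
        Fact(String source, String gene, double value) {
            this.source = source;
            this.gene = gene;
            this.value = value;
        }
    }

    // Extract: keep only finished results, a small subset of the LIMS rows.
    // Assumed row format: sampleId;geneName;value;status
    static List<String[]> extract(List<String> limsRows) {
        List<String[]> results = new ArrayList<>();
        for (String row : limsRows) {
            String[] fields = row.split(";");
            if (fields.length == 4 && "FINAL".equals(fields[3])) {
                results.add(fields);
            }
        }
        return results;
    }

    // Transform: normalize gene names and parse values into the common shape.
    static List<Fact> transform(String source, List<String[]> rows) {
        List<Fact> facts = new ArrayList<>();
        for (String[] f : rows) {
            String gene = f[1].trim().toUpperCase(Locale.ROOT);
            facts.add(new Fact(source, gene, Double.parseDouble(f[2].trim())));
        }
        return facts;
    }

    // Load: append to the warehouse (a list here; a real system would write to tables).
    static void load(List<Fact> warehouse, List<Fact> facts) {
        warehouse.addAll(facts);
    }

    public static void main(String[] args) {
        List<String> expressionLims = Arrays.asList(
            "S1;geneA; 2.5 ;FINAL",
            "S2;geneB;1.0;DRAFT");
        List<Fact> warehouse = new ArrayList<>();
        load(warehouse, transform("expression", extract(expressionLims)));
        for (Fact fact : warehouse) {
            System.out.println(fact.source + " " + fact.gene + " " + fact.value);
        }
    }
}
```

Because every source ends up in the same Fact shape, a query across data types only has to look at one structure, which is the point the bullets make about integration.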
[Architecture diagrams: component labels recovered from the poster figures, in order of appearance]
Client PC
• Events
• Server Services: Request Handler, Save Service, Logging Manager, Login Component
• Utilities (several labelled GUI): Print Service, Undo Manager, Export Service, Preferences Manager, Import Service
• Frameworks/Components: Graphics Framework, Threading Framework, Object Pool, XML Framework, Graph Framework
• Applications (each labelled XML): Pathfinder, Contig Viewer, Analysis Client, Feature Viewer, Chromosome Viewer, Pathways, BioSphere Messenger, Statistics, News
• Communication: EJB Service, Servlet Service, Direct Request Service, XML Communic. Service
• Control: Application Registry, Event Manager
Application Server
• Analysis Layer: Results Processor, Compute Host Daemon, Compute Host Monitor, Blastdb Monitor
• Application Layer: Chromosome Retriever, Contig Assembler, Retrieve Service, Save Service, Feature Service, Pathway Service, Analysis Service, Email Service, User Service
• Communication (Request, Response): Ejb Request Handler, Servlet Request Handler, Direct Request Handler, XML Request Handler
• Service Layer: XML Startup Service
• Persistence Layer: Schema Service, DAL Objects, Loader (creates), Database Handler, Schema Service Factory, Verifier, XML Database Manager
• Connection labels: RMI, SER, MC, JDBC
Data Storage
• Data Warehouse, flatfiles, Other Databases
Data Integration
• Sources: Metabolomic LIMS, Mapping/Markers, Sequencing LIMS, Proteomic LIMS, Expression LIMS
• ETL (Extract, Transform, Load): results of experiments go into the BioSphere Data Warehouse
• BioSphere Data Warehouse, BioSphere Server, Client PC
Software Framework
BioSphere is not only a bioinformatics platform, but
also a software development framework that allows the
development of any 3-tier software:
Persistence Layer
• It handles the retrieval, updating and storage of any data
type to a database. For instance, it could handle the
access of customer and product data in an e-commerce
system
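A minimal sketch of such a persistence layer, using the data access object and factory patterns named in the Design section, might look as follows. The names Dao, InMemoryDao, DaoFactory and Customer are hypothetical, and the in-memory map is only a stand-in for a database-backed implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a data access object behind a factory; not BioSphere's API.
class PersistenceSketch {

    // Value object for the e-commerce example mentioned in the text.
    static final class Customer {
        final long id;
        final String name;
        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // The operations the text lists: retrieval, updating and storage.
    interface Dao<T> {
        void save(long id, T value); // stores a new value or replaces an existing one
        T findById(long id);         // returns null when nothing is stored under id
    }

    // In-memory stand-in; a database-backed version would use JDBC instead.
    static final class InMemoryDao<T> implements Dao<T> {
        private final Map<Long, T> rows = new HashMap<>();
        public void save(long id, T value) { rows.put(id, value); }
        public T findById(long id) { return rows.get(id); }
    }

    // Factory: callers ask for a DAO by data type and never name the implementation.
    static final class DaoFactory {
        private final Map<Class<?>, Dao<?>> daos = new HashMap<>();

        @SuppressWarnings("unchecked")
        <T> Dao<T> daoFor(Class<T> type) {
            // One DAO per type, created on first use and reused afterwards.
            return (Dao<T>) daos.computeIfAbsent(type, t -> new InMemoryDao<T>());
        }
    }

    public static void main(String[] args) {
        DaoFactory factory = new DaoFactory();
        Dao<Customer> customers = factory.daoFor(Customer.class);
        customers.save(1L, new Customer(1L, "Ada"));
        System.out.println(customers.findById(1L).name);
    }
}
```

Since client code depends only on Dao and DaoFactory, the same calls would work for contigs, features or customers, which is what makes the layer reusable outside bioinformatics.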
Client-Server Communication
• The framework handles any communication with a server,
whether the server is at a remote location or even in the
same virtual machine. It can use RMI, object
serialization (servlets), XML and direct calls
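One way to picture interchangeable communication mechanisms is a single transport interface with one implementation per mechanism. The sketch below shows two of the four: a direct call within the same virtual machine, and a round trip through Java object serialization that stands in for the bytes a servlet connection would carry. The names Transport, DirectTransport and SerializingTransport are hypothetical, and no real network, RMI or servlet code is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical sketch of swappable client-server transports; not BioSphere's API.
class TransportSketch {

    // The client codes against this interface only, so the mechanism can be swapped.
    interface Transport {
        Serializable send(Serializable request);
    }

    // Direct call: the server logic lives in the same virtual machine.
    static final class DirectTransport implements Transport {
        private final Function<Serializable, Serializable> server;
        DirectTransport(Function<Serializable, Serializable> server) { this.server = server; }
        public Serializable send(Serializable request) { return server.apply(request); }
    }

    // Object serialization: request and response are copied through byte arrays,
    // as they would be when written to and read from a connection.
    static final class SerializingTransport implements Transport {
        private final Function<Serializable, Serializable> server;
        SerializingTransport(Function<Serializable, Serializable> server) { this.server = server; }

        public Serializable send(Serializable request) {
            return copy(server.apply(copy(request)));
        }

        private static Serializable copy(Serializable value) {
            try {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                    out.writeObject(value);
                }
                try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                    return (Serializable) in.readObject();
                }
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException("serialization round trip failed", e);
            }
        }
    }

    public static void main(String[] args) {
        Function<Serializable, Serializable> server = request -> "echo:" + request;
        Transport[] transports = { new DirectTransport(server), new SerializingTransport(server) };
        for (Transport transport : transports) {
            System.out.println(transport.send("getContig 42"));
        }
    }
}
```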
Request processing
• For each request that goes to the server, the framework
automatically creates a new thread so that the client
never freezes (even if the processing takes a long time).
The requests are relayed to the corresponding business
logic component
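The thread-per-request idea can be sketched with the standard library alone. The poster does not say where the thread is created or how results are handed back; this example assumes the thread is started on the calling side and that the result is delivered through a CompletableFuture. The names RequestDispatcherSketch and dispatch are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch of one-thread-per-request dispatch; not BioSphere's API.
class RequestDispatcherSketch {

    // Starts the request on a freshly created thread and returns at once,
    // so the caller (for example a GUI thread) is not blocked by the processing.
    static <T> CompletableFuture<T> dispatch(Supplier<T> request) {
        CompletableFuture<T> result = new CompletableFuture<>();
        Thread worker = new Thread(() -> {
            try {
                result.complete(request.get());
            } catch (RuntimeException e) {
                // Failures are reported through the future instead of being lost.
                result.completeExceptionally(e);
            }
        }, "request-worker");
        worker.setDaemon(true);
        worker.start();
        return result;
    }

    public static void main(String[] args) {
        CompletableFuture<String> reply = dispatch(() -> {
            // Stand-in for slow business logic.
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "contig loaded";
        });
        System.out.println("caller continues while the request runs");
        System.out.println(reply.join());
    }
}
```

Creating a thread for every request is simple but unbounded; a thread pool is the usual alternative when many requests arrive at once, and the diagram's Object Pool and Threading Framework labels suggest something along those lines may exist, though the poster gives no detail.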
Other frameworks
• It also includes a graphics framework for drawing
objects, zooming, selection, moving objects and graph
layout optimization.
• A code generator creates value objects, persistent
objects and XML objects
