Kallio bosc2010 chipster-cloud

Connecting the Chipster genome browser to the cloud

Aleksi Kallio
CSC – IT Center for Science, Finland

Architecture of the Chipster platform
[Figure: Chipster architecture diagram showing clients, brokers (message broker and file broker), authentication service, management service and computing services]

▪ Loosely coupled, independent components
▪ Message-oriented communication
▪ Flexible, scalable, robust
▪ In other words, very cloud-like
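To make the message-oriented, loosely coupled design concrete, here is a minimal in-process sketch of topic-based publish/subscribe. It is only an illustration under assumptions: the `MessageBroker` class, the topic name `"jobs"` and the message fields are hypothetical, and a real deployment would use a networked message broker rather than Python queues. The point is that a publisher knows only a topic name, never which services are listening.

```python
import queue
from collections import defaultdict


class MessageBroker:
    """Hypothetical toy broker: publishers and subscribers never reference each other."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of subscriber queues

    def subscribe(self, topic):
        """Register a new inbox for a topic and return it."""
        inbox = queue.Queue()
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        """Deliver the message to every inbox subscribed to the topic."""
        for inbox in self._subscribers[topic]:
            inbox.put(message)


broker = MessageBroker()
# Two independent services listen to the same (hypothetical) topic.
compute_inbox = broker.subscribe("jobs")
management_inbox = broker.subscribe("jobs")

# The client publishes without knowing which services exist.
broker.publish("jobs", {"tool": "normalize", "input": "sample1.cel"})
print(compute_inbox.get_nowait())
print(management_inbox.get_nowait())
```

Adding or removing a service only changes who subscribes; the publishing client is untouched.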

Chipster in the cloud

▪ 1) Deploying compute nodes in the cloud
• Easy, because the architecture is already loosely coupled and based on message passing
▪ 2) Running large parallel jobs in the cloud
• The architecture makes this easy
• Cloud-compatible tools can be integrated quickly
▪ 3) Using the cloud as a back end for interactive visualisations
• Maybe not so obvious
• So let's dig into this further...
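Point 1 works because a compute node is just another consumer of the job queue. Below is a minimal sketch of this competing-consumers pattern, assuming a single shared work queue and using threads to stand in for cloud instances; `compute_node` and the squaring "analysis" are hypothetical placeholders, not Chipster code.

```python
import queue
import threading


def compute_node(name, jobs, results):
    """Hypothetical compute node: pulls jobs until it receives the None sentinel."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this node
            jobs.task_done()
            break
        # Stand-in for running an analysis tool: square the number.
        results.put((name, job, job * job))
        jobs.task_done()


jobs = queue.Queue()
results = queue.Queue()

# Adding capacity means starting more consumers on the same queue.
nodes = [threading.Thread(target=compute_node, args=(f"node-{i}", jobs, results))
         for i in range(3)]
for t in nodes:
    t.start()

for job in range(10):
    jobs.put(job)
for _ in nodes:
    jobs.put(None)  # one sentinel per node, queued after all real jobs
for t in nodes:
    t.join()

collected = []
while not results.empty():
    collected.append(results.get_nowait())
print(sorted(value for _, _, value in collected))
```

Scaling out only means starting more consumers; neither the clients nor the other nodes need to change.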

Background: Chipster Genome Browser

▪ Interactive Swing-based GUI
▪ Shows reads and analysis results in their genomic context
▪ Interactive zooming from chromosome level down to nucleotide level
▪ Ensembl annotations for genes and transcripts
▪ Integrated with the rest of Chipster
▪ Parallel and distributed to some extent

Basic idea

▪ Preprocess data with Hadoop / MapReduce
▪ Generate summaries of the data at power-of-two resolutions, as in Google Earth
• Roughly doubles the data size
▪ The current genome browser samples the data to produce summaries
▪ Now the summaries can be read directly
– Accurate results, significantly fewer disk seeks
▪ Distribute the data to scale to massive datasets
• Use messaging to query independent data providers
▪ Aggregate results in the visualiser as (and if) they arrive
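Storing summaries at power-of-two resolutions means level k has one bin per 2^k bp, so for N base-level values the pyramid holds about N + N/2 + N/4 + ... < 2N entries, which is why the data size roughly doubles. The sketch below assumes read coverage as the summarised quantity and does the map and reduce steps in plain Python instead of a Hadoop job; all function names are hypothetical. Each coarser level is built by merging pairs of sibling bins, and the viewer picks the coarsest level whose bins still fit within one pixel, so it reads precomputed values instead of sampling.

```python
import math
from collections import defaultdict


def base_coverage(reads):
    """Level 0: number of reads covering each base, computed map/reduce style."""
    coverage = defaultdict(int)
    for start, end in reads:           # map: each read emits the positions it covers
        for pos in range(start, end):  # reads are half-open intervals [start, end)
            coverage[pos] += 1         # reduce: sum the counts per position
    return dict(coverage)


def build_pyramid(level0, genome_length):
    """Level k holds summed coverage per 2**k bp bin (divide by 2**k for mean depth)."""
    levels = [level0]
    n_merges = max(1, math.ceil(math.log2(genome_length)))
    for _ in range(n_merges):
        parent = defaultdict(int)
        for b, value in levels[-1].items():
            parent[b // 2] += value  # sibling bins 2b and 2b+1 merge into bin b
        levels.append(dict(parent))
    return levels


def level_for_view(view_start, view_end, pixels, n_levels):
    """Coarsest level whose bins are no wider than one screen pixel."""
    bp_per_pixel = max(1, (view_end - view_start) // pixels)
    return min(n_levels - 1, bp_per_pixel.bit_length() - 1)  # floor(log2(bp_per_pixel))


reads = [(0, 4), (2, 6), (5, 8)]
pyramid = build_pyramid(base_coverage(reads), genome_length=8)
print(pyramid[-1])  # {0: 11}: total covered bases over the whole 8 bp "genome"
print(level_for_view(0, 8, pixels=2, n_levels=len(pyramid)))  # 2, i.e. 4 bp bins
```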

Work in progress...

▪ Genome browser up and running
▪ Hadoop-based data processing at a very early stage
▪ Currently trying to get it to scale well

What's the point?

▪ Besides items (e.g., reads), the visualiser can receive “superitems” (e.g., summaries of reads)
• Summarises coverage, quality, SNPs etc. of the original reads
▪ All kinds of advanced information can be generated in the preprocessing step
– Such as features that combine a large number of genomes
– Generators should be pluggable
▪ We spend resources on the server side to improve the user experience on the client side
• On the server side, CPU, memory and disk space are required
• But only for a short time (as in large batch jobs)
• Cheap commodity servers can be used
• And the experiment itself has already been expensive
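One way to picture superitems and pluggable generators is a registry of summary functions applied to the reads that fall into one genomic bin. This is a sketch under assumptions, not Chipster's data model: reads are taken as `(start, bases, qualities)` tuples assigned to a bin by their start position, the `generator` decorator and `SuperItem` class are hypothetical, and counting mismatches against the reference is only a crude stand-in for SNP calling.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Read = Tuple[int, str, List[int]]  # (start position, bases, per-base quality scores)

# Hypothetical registry of pluggable generators: name -> fn(reads, reference) -> value
GENERATORS: Dict[str, Callable[[List[Read], str], float]] = {}


def generator(name):
    """Decorator that registers a summary generator under the given name."""
    def register(fn):
        GENERATORS[name] = fn
        return fn
    return register


@generator("aligned_bases")
def aligned_bases(reads, reference):
    return sum(len(seq) for _, seq, _ in reads)


@generator("mean_quality")
def mean_quality(reads, reference):
    quals = [q for _, _, qs in reads for q in qs]
    return sum(quals) / len(quals) if quals else 0.0


@generator("mismatches")
def mismatches(reads, reference):
    # Crude stand-in for SNP evidence: bases that differ from the reference.
    return sum(1 for start, seq, _ in reads
               for i, base in enumerate(seq) if reference[start + i] != base)


@dataclass
class SuperItem:
    start: int
    end: int
    stats: Dict[str, float] = field(default_factory=dict)


def summarise(reads, reference, start, end):
    """Build one superitem for [start, end) by running every registered generator."""
    in_bin = [r for r in reads if start <= r[0] < end]  # binned by read start position
    return SuperItem(start, end, {name: gen(in_bin, reference)
                                  for name, gen in GENERATORS.items()})


reference = "ACGTACGTAC"
reads = [(0, "ACGA", [30, 30, 20, 20]),
         (2, "GTAC", [40, 40, 40, 40]),
         (6, "GTAC", [10, 10, 10, 10])]
print(summarise(reads, reference, 0, 5))
```

A new kind of summary can be added by registering another function, without touching `summarise` itself.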

Summary

▪ Use cheap server resources to enable a better user experience
▪ Goal: to make data analysis quicker (and more fun)
▪ Tackle server-side unreliability on the client side
▪ Future development
– If this works out, it could also be used in other Chipster visualisers
– Integrating HBase queries into interactive visualisations
– Optimising data summarisation for visual truthfulness
▪ For more info: aleksi.kallio@csc.fi,
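Tackling server-side unreliability on the client side, together with querying independent data providers and aggregating results as they arrive, could look roughly like the sketch below. The slides describe messaging between the visualiser and the providers; here a thread pool stands in for it, and `query_provider`, the region string and the simulated failure of provider 3 are hypothetical. Failed or slow providers are left out, and the view is drawn from whatever did arrive.

```python
import concurrent.futures as cf
import time


def query_provider(provider_id, region):
    """Hypothetical independent data provider holding one shard of the data.
    Provider 3 simulates a failed node."""
    time.sleep(0.01 * provider_id)  # simulated network/disk latency
    if provider_id == 3:
        raise ConnectionError(f"provider {provider_id} unavailable")
    return [(provider_id, region, provider_id * 10)]  # (shard, region, summary value)


def fetch_region(region, providers, deadline_s=1.0):
    """Query all providers in parallel and keep whatever arrives before the deadline."""
    results, failed = [], []
    pool = cf.ThreadPoolExecutor(max_workers=len(providers))
    futures = {pool.submit(query_provider, p, region): p for p in providers}
    try:
        for fut in cf.as_completed(futures, timeout=deadline_s):
            try:
                # In a real client each partial result would be drawn as it arrives.
                results.extend(fut.result())
            except Exception:
                failed.append(futures[fut])  # tolerate a failed provider
    except cf.TimeoutError:
        pass  # providers slower than the deadline are simply left out
    pool.shutdown(wait=False)
    return results, failed


results, failed = fetch_region("chr1:1-100000", providers=[1, 2, 3, 4])
print(sorted(value for _, _, value in results), failed)
```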
