Talk given at "Cloud Computing for Systems Biology" workshop

Usage Rights

© All Rights Reserved

Presentation Transcript

  • The role of cloud computing in big biology. Deepak Singh
  • Via Reavel under a CC-BY-NC-ND license
  • life science industry
  • Credit: Bosco Ho
  • By ~Prescott under a CC-BY-NC license
  • context
  • analysis methods
  • technology
  • ? ? technology ? ?
  • back of the room
  • technology technology technology technology
  • technology (repeated many times across the slide)
  • Image: Keith Allison under a CC-BY-SA license
  • inherent characteristics
  • data driven
  • multi-dimensional
  • collaborative
  • distributed
  • <amazon web services>
  • the cloud
  • has_many :definitions
  • infrastructure as a service
  • precursors
  • virtualization
  • service oriented architecture
  • distributed computing
  • Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB
  • Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Messaging: Amazon Simple Queue Service (SQS). Content Delivery: Amazon CloudFront
  • Isolated Networks: Amazon Virtual Private Cloud. Monitoring: Amazon CloudWatch. Management: AWS Management Console. Tools: AWS Toolkit for Eclipse
  • Your Custom Applications and Services
  • scalable
  • scalable cost effective
  • scalable cost effective (pay as you go)
  • scalable cost effective reliable
  • scalable cost effective reliable secure
  • Amazon EC2
  • servers on demand
  • highly scalable
  • 3000 CPU’s for one firm’s risk management application [Chart: Number of EC2 Instances by day, Wednesday 4/22/2009 to Tuesday 4/28/2009; axis marks at 3000 and 300; annotation: 300 CPU’s on weekends]
  • design for failure
  • “Everything fails, all the time” -- Werner Vogels
  • assume failure
  • assume failure design backwards
  • assume failure design backwards nothing fails
  • highly available systems
  • elastic block store
  • elastic IP
  • SQS
  • US East Region: Availability Zone A, Availability Zone B, Availability Zone C, Availability Zone D
  • data storage
  • one size does not fit all
  • Amazon S3
  • distributed object store
  • durable
  • available
  • scalable
  • fast
  • simple
  • structured data anyone?
  • Amazon SimpleDB
  • zero administration
  • highly available
  • schema less
  • key-value store
  • Amazon Relational Database Service
  • single API call
  • MySQL database
  • automatic backup
  • scale up with API call
  • futures
  • futures: master-slave replication, data center failover
  • what do people do?
  • solve problems
  • > 1PB of data in S3
  • provide platforms & services
  • Platform as a Service http://heroku.com
  • Computation as a Service http://cyclecomputing.com
  • Computational Platforms sudo gem install cloud-crowd http://cyclecomputing.com http://wiki.github.com/documentcloud/cloud-crowd
  • http://cyclecomputing.com http://wiki.github.com/documentcloud/cloud-crowd
  • they do science Image: Matt Wood
  • 3.7 million classifications in just over three days; ~15 million in less than a month; >2.6 million clicks in 100 hours
  • Image via image editor under a CC-BY license
  • Protein Docking @ Pfizer http://bioteam.net
  • http://aws.amazon.com/publicdatasets/
  • </amazon web services>
  • anecdote
  • collaborative project
  • 800 GB
  • Image: Wikipedia Commons
  • weeks to get started
  • Image: Matt Wood
  • Image: Chris Dagdigian
  • gigabytes
  • terabytes
  • petabytes
  • really fast
  • constant flux
  • Image: Chris Dagdigian
  • data management is not data storage
  • masterclass: Big data & Biology: The implications of petascale science. Tuesday November 17, 1:30PM - 3:00PM. Room: PB253-254-257-258
  • “science data platform”
  • deliver data to applications
  • deliver data to people
  • typical informatics workflow
  • Via Christolakis under a CC-BY-NC-ND license
  • Via Argonne National Labs under a CC-BY-SA license
  • killer app. Via Argonne National Labs under a CC-BY-SA license
  • Data Apps
  • Data Platform App Platform
  • Data Platform App Platform
  • Data Platform App Platform
  • Data Platform data services
  • application services App Platform
  • Scalable Data Platform Services APIs Getters Filters Savers WORK
  • must accommodate change
  • must scale
  • highly available
  • loosely coupled
  • dynamic
  • task-based resources
  • one project one set of resources
  • no waiting
  • Protein Docking @ Pfizer http://bioteam.net
  • distributed mindset
  • one approach
  • disk read/writes slow & expensive
  • data processing fast & cheap
  • distribute data parallelize reads
  • map/reduce
  • distributed data processing at scale
  • abstracting away hadoop
  • apache hive http://hadoop.apache.org/hive/
  • apache pig http://hadoop.apache.org/pig/
  • cascading http://www.cascading.org/
  • hosted hadoop service
  • hadoop easy & simple
  • Amazon Elastic MapReduce [Diagram: input data is uploaded to an input S3 bucket in Amazon S3; the application is deployed through the web console or command line tools; Elastic MapReduce runs Hadoop on Amazon EC2 instances over the input dataset; output results are written to an output S3 bucket; notify; get results; end]
  • developers develop & distribute
  • scientists/analysts consume
  • CloudBurst: catalog k-mers, collect seeds, end-to-end alignment
  • Mike Schatz, University of Maryland
  • Scalable Data Platform Services APIs Getters Filters Savers WORK
  • IN CONCLUSION
  • large scale biology
  • complex multidimensional data
  • whole lot of data
  • distributed collaborations
  • new computing and data architectures
  • a solution: cloud services
  • distributed
  • scalable
  • economical
  • here today
  • Thank you! deesingh@amazon.com Twitter: @mndoci. Presentation ideas from @mza, James Hamilton, and @lessig
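
The map/reduce slides above ("distribute data, parallelize reads") and the first CloudBurst stage ("catalog k-mers") can be illustrated with a small sketch. It is not CloudBurst's code and does not use Hadoop; it only shows the pattern on one machine, assuming a thread pool as a stand-in for a cluster. The function names (`kmers`, `map_chunk`, `merge_counts`, `count_kmers`) and the toy reads are invented for this example.

```python
from collections import Counter
from functools import reduce
from multiprocessing.dummy import Pool  # thread-backed pool; stands in for a cluster


def kmers(read, k):
    """Yield every length-k substring (k-mer) of a sequence read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def map_chunk(args):
    """Map step: count the k-mers in one chunk of reads, independently of other chunks."""
    reads, k = args
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial k-mer tallies into one."""
    return left + right


def count_kmers(reads, k, n_chunks=2):
    """Split the reads into chunks, map over them in parallel, then reduce."""
    chunks = [reads[i::n_chunks] for i in range(n_chunks)]
    with Pool(n_chunks) as pool:
        partials = pool.map(map_chunk, [(chunk, k) for chunk in chunks])
    return reduce(merge_counts, partials, Counter())


# Toy reads, invented for illustration only.
reads = ["ACGTAC", "GTACGT", "ACGT"]
totals = count_kmers(reads, k=3)
```

Because each chunk is counted without reference to the others, the map step can run wherever the data lives; only the small per-chunk tallies need to be brought together in the reduce step. That is the property the "distribute data, parallelize reads" slide relies on.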