Big Data and Biology: The implications of petascale science
Deepak Singh
Via Reavel under a CC-BY-NC-ND license
life science industry
Credit: Bosco Ho
By ~Prescott under a CC-BY-NC license
data
Image: Wikipedia
biology
big data
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Matt Wood
Human genome




Image: Matt Wood
not just sequencing
Image: Ricardipus
more data
Image: Matt Wood
all hell breaks loose
~100 TB/Week
~100 TB/Week



 >2 PB/Year
years
weeks
days
days
minutes
days        ?
gigabytes
terabytes
petabytes
exabytes?
really fast
Image: http://www.broadinstitute.org/~apleite/photos.html
single lab
Image: Chris Dagdigian
implications of scale
data management
data processing
data sharing
fundamental concepts
1. architecting for scale
“Everything fails, all the time”
                   -- Werner Vogels
“Things will crash. Deal with it”
                        -- Jeff Dean
“Remember everything fails”
                  -- Randy Shoup
fun with numbers
datacenter availability
Source: Uptime Institute
Tier I: 28.8 hours annual downtime (99.67% availability)
Tier II: 22.0 hrs annual downtime (99.75% availability)
Tier III: 1.6 hrs annual downtime (99.98% availability)
Tier IV: 0.8 hrs annual downtime (99.99% availability)
Source: Uptime Institute
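The downtime and availability figures for each tier are two views of the same number. A minimal sketch of the conversion, assuming a 365-day year (the function names are illustrative, not from the talk):

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap years are ignored in this sketch


def availability_pct(hours_down: float) -> float:
    """Availability in percent implied by a given annual downtime in hours."""
    return 100.0 * (1.0 - hours_down / HOURS_PER_YEAR)


def annual_downtime_hours(availability: float) -> float:
    """Annual downtime in hours implied by an availability in percent."""
    return HOURS_PER_YEAR * (1.0 - availability / 100.0)


# Downtime figures as quoted on the slide (Uptime Institute tiers).
TIER_DOWNTIME_HOURS = {"I": 28.8, "II": 22.0, "III": 1.6, "IV": 0.8}

for tier, hours in TIER_DOWNTIME_HOURS.items():
    print(f"Tier {tier}: {hours} h/year -> {availability_pct(hours):.2f}% available")
```

Going the other way, `annual_downtime_hours(99.99)` shows that even "four nines" still leaves close to an hour of downtime per year.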
cooling systems go down
power units fail
2-4% of servers will die annually



Source: Jeff Dean, LADIS 2009
1-5% of disk drives will die every year



Source: Jeff Dean, LADIS 2009
2.3% AFR in population of 13,250
                         3.3% AFR in population of 22,400
                         4.2% AFR in population of 246,000




Source: James Hamilton
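To make the annualized failure rates (AFR) concrete, the sketch below turns each rate and population from the slide into an expected number of failed drives per year and per day. The helper name is illustrative, and the result is a simple expectation rather than a reliability model:

```python
# (AFR, population) pairs as quoted on the slide.
OBSERVATIONS = [(0.023, 13_250), (0.033, 22_400), (0.042, 246_000)]


def expected_annual_failures(afr: float, population: int) -> float:
    """Expected failures per year: failure rate times population size."""
    return afr * population


for afr, population in OBSERVATIONS:
    per_year = expected_annual_failures(afr, population)
    print(f"{afr:.1%} of {population:,} drives: "
          f"about {per_year:,.0f} failures/year, {per_year / 365:.1f} per day")
```

At the largest population this works out to dozens of drive replacements every day, which is why failure handling has to be routine rather than exceptional.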
software breaks
human errors
~20% of admin issues have unintended consequences




Source: James Hamilton
achieving scalability and availability
partitioning
redundancy
recovery oriented computing



Source: http://perspectives.mvdirona.com/, http://roc.cs.berkeley.edu/
assume sw/hw failure
design apps to be resilient
automation
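One common way to design an application to be resilient is to assume any remote call can fail and retry it with exponential backoff. The sketch below is an illustration under that assumption; `call_with_retries`, its parameters, and the `flaky` operation are hypothetical names, not something prescribed in the talk:

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=0.1,
                      retry_on=(ConnectionError, TimeoutError),
                      sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            delay = base_delay * (2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))


# Hypothetical operation that fails twice before succeeding.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = call_with_retries(flaky, sleep=lambda seconds: None)  # no real waiting
print(result, "after", attempts["count"], "attempts")
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; production code would normally leave the default in place.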
Your Custom Applications and Services
Monitoring: Amazon CloudWatch
Management: AWS Management Console
Tools: AWS Toolkit for Eclipse
Isolated Networks: Amazon Virtual Private Cloud
Parallel Processing: Amazon Elastic MapReduce
Content Delivery: Amazon CloudFront
Messaging: Amazon Simple Queue Service (SQS)
Payments: Amazon Flexible Payments Service (FPS)
On-Demand Workforce: Amazon Mechanical Turk
Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling
Storage: Amazon Simple Storage Service (S3), AWS Import/Export
Database: Amazon RDS and SimpleDB
Amazon S3
durable
available
Amazon EC2
highly scalable
3000 CPU’s for one firm’s risk management application
[Chart: Number of EC2 Instances per day, Wednesday 4/22/2009 through Tuesday 4/28/2009; axis marks at 300 and 3000; annotation: 300 CPU’s on weekends]
highly available systems
dynamic
fault tolerant
US East Region: Availability Zone A, Availability Zone B, Availability Zone C, Availability Zone D
2. one size does not fit all
data
2. one size does not fit all
      ^
many data types
structured data
using the right data store
(a) feature first
RDBMS



Oracle, SQL Server, DB2, MySQL, Postgres
Source: http://www.bioinformaticszen.com/
use a bigger computer
remove joins
scaling limits
(b) scale first
scale is highest priority
single RDBMS incapable
solution 1: data sharding
10’s
100’s
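Sharding spreads rows across many database servers by routing each key to one of them. A minimal sketch of hash-based routing, assuming string keys and a fixed shard count (the function and key names are hypothetical):

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Python's built-in hash() is salted per process, so a digest is used
    # to get the same shard for the same key on every machine.
    return int.from_bytes(digest[:8], "big") % num_shards


# Route 1,000 hypothetical sample IDs across 10 shards and show the spread.
placement = Counter(shard_for(f"sample-{i}", 10) for i in range(1000))
print(sorted(placement.items()))
```

Simple modulo routing like this remaps most keys when the number of shards changes, which is one reason growing from 10's to 100's of shards is operationally painful; consistent hashing is a common alternative.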
solution 2: scalable key-value store
scale is design point


MongoDB, Project Voldemort, Cassandra, HBase,
    BigTable, Amazon SimpleDB, Dynamo
(c) simple structured storage
simple
               fast
          low ops cost

BerkeleyDB, Tokyo Cabinet, Amazon SimpleDB
(d) purpose optimized stores
data warehousing
      stream processing


Aster Data, Vertica, Netezza, Greenplum, VoltDB, StreamBase
what about files?
cluster file systems



     Lustre, GlusterFS
distributed file systems



         HDFS, GFS
distributed object store



      Amazon S3, Dynomite
Your Custom Applications and Services
Monitoring: Amazon CloudWatch
Management: AWS Management Console
Tools: AWS Toolkit for Eclipse
Isolated Networks: Amazon Virtual Private Cloud
Parallel Processing: Amazon Elastic MapReduce
Content Delivery: Amazon CloudFront
Messaging: Amazon Simple Queue Service (SQS)
Payments: Amazon Flexible Payments Service (FPS)
On-Demand Workforce: Amazon Mechanical Turk
Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling
Storage: Amazon Simple Storage Service (S3), AWS Import/Export
Database: Amazon RDS and SimpleDB
3. processing big data
disk read/writes
slow & expensive
data processing
 fast & cheap
distribute the data
   parallel reads
data processing for the cloud
distributed file system
        (HDFS)
map/reduce
Via Cloudera under a Creative Commons License
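The map/reduce model can be sketched in a few lines of single-process Python: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. The example counts k-mers in short reads; all names are illustrative, and this is a toy model of the idea, not Hadoop's API:

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read: str, k: int):
    """Map step: emit (k-mer, 1) for every k-length window in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine the values for one key into a total."""
    return key, sum(values)


def run_job(reads, k=3):
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return dict(reduce_counts(key, values)
                for key, values in shuffle(mapped).items())


print(run_job(["ACGT", "CGTA"], k=3))
```

Because each read is mapped independently and each key is reduced independently, a framework can spread both steps across many machines and rerun only the pieces that fail.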
fault tolerance
massive scalability
petabyte scale
hosted hadoop service
hadoop easy and simple
Amazon Elastic MapReduce
[Diagram: Deploy application (web console, command line tools) -> Elastic MapReduce -> Hadoop on Amazon EC2 instances (input dataset -> output results) -> Elastic MapReduce -> Notify -> End. Input data goes into an input S3 bucket and results are retrieved from an output S3 bucket, both in Amazon S3.]
back to the science
basic informatics workflow
Via Christolakis under a CC-BY-NC-ND license
Via Argonne National Labs under a CC-BY-SA license
killer app




Via Argonne National Labs under a CC-BY-SA license
getting the data
Register projects -> Register samples -> Sample prep -> Sequencing -> Analysis


These slides cover work presented by Matt Wood at various conferences
Image: Matt Wood
constant change
flexible data capture
virtual fields
no schema
specify at run time (bootstrapping)
Sample: Name, Organism, Concentration

Source: Matt Wood
key value pairs
change happens
V1 Sample: Name, Organism, Concentration
V2 Sample: Name, Organism, Concentration, Origin, Quality metric

Source: Matt Wood
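Storing each sample as key/value pairs is what lets V2 add fields without migrating V1 records. A small sketch of the idea, using the field names from the slides with made-up values (the helper names are hypothetical):

```python
# Field names come from the slides; the values are invented for illustration.
sample_v1 = {"Name": "sample-001", "Organism": "S. cerevisiae",
             "Concentration": 12.5}
sample_v2 = {"Name": "sample-002", "Organism": "C. elegans",
             "Concentration": 8.0, "Origin": "lab A", "Quality metric": 0.98}


def field_names(records):
    """Discover the fields at run time instead of declaring a schema."""
    return sorted({name for record in records for name in record})


def get_field(record, name, default=None):
    """Read a field that older records may not have."""
    return record.get(name, default)


records = [sample_v1, sample_v2]
print(field_names(records))
print([get_field(record, "Origin", "unknown") for record in records])
```

Old and new records live side by side, and readers decide how to treat a missing field instead of the store rejecting the record.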
high throughput
lots of pipelines
scaling projects/pipelines?
lots of apps
loosely coupled
automation
scale operationally
be agile
now what?
Via asklar under a CC-BY license
Via Argonne National Labs under a CC-BY-SA license
many data types
changing data types
Shaq Image: Keith Allison under a CC-BY-SA license
?
lots and lots and lots and lots
 and lots and lots of data and
  lots and lots of lots of data
By bitterlysweet under a CC-BY-NC-ND license
Source: http://bit.ly/anderson-bigdata
Chris Anderson doesn’t
  understand science
“more is different”
few data points
elaborate models
the unreasonable effectiveness of data



Source: “The Unreasonable Effectiveness of Data”, Alon Halevy, Peter Norvig, and Fernando Pereira
simple models
  lots of data
information platform
information platforms at scale
one organization
4 TB daily added
 (compressed)
135 TB data scanned daily
     (compressed)
15 PB data total capacity
???
Facebook data from Ashish Thusoo’s HadoopWorld 2009 talk
not always that big
can we learn any lessons?



Source: “Information Platforms and the Rise of the Data Scientist”, Jeff Hammerbacher in Beautiful Data
analytics platform
Data warehouse
Data warehouse is a repository of an
organization's electronically stored data.
Data warehouses are designed to
facilitate reporting and analysis
ETL
extract
transform
load
Via asklar under a CC-BY license
1 TB
MySQL --> Oracle
more data
more data types
changing data types
limit data warehouse
too limited
how do you scale and adapt?
100’s of TBs
1000’s of jobs
back to the science
back in the day
small data sets
flat files
../
  ../folder1/ ../folder2/ . . . ../folderN/
                     file1
                     file2
                       .
                       .
                     fileN
shared file system
RDBMS
Image: Wikimedia Commons
Image: Chris Dagdigian
need to process
need to analyze
100’s of TBs
1000’s of jobs
Facebook data from Ashish Thusoo’s HadoopWorld 2009 talk
ETL
Via asklar under a CC-BY license
data mining
     &
 analytics
Via Argonne National Labs under a CC-BY-SA license
analysts are not
 programmers
not savvy with map/reduce
apache hive



 http://hadoop.apache.org/hive/
manage & query data
manage & query data on top of Hadoop
work by @peteskomoroch
cascading



http://www.cascading.org/
apache pig



http://hadoop.apache.org/pig/
hadoop and bioinformatics
High Throughput Sequence Analysis
Mike Schatz, University of Maryland
Short Read Mapping
Seed & Extend
Good alignments must have significant exact alignment

Minimal exact alignment length = l/(k+1)



          Expensive to scale

  Need parallelization framework
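The bound l/(k+1) is a pigeonhole argument: cut a read of length l into k+1 pieces, and at most k differences cannot touch every piece, so at least one piece matches the reference exactly. A sketch assuming integer division and non-overlapping seeds (the function names are illustrative):

```python
def min_seed_length(read_length: int, max_differences: int) -> int:
    """Length of an exact match guaranteed by the pigeonhole argument."""
    return read_length // (max_differences + 1)


def seeds(read: str, max_differences: int):
    """Split a read into k+1 non-overlapping seeds of the guaranteed length."""
    size = min_seed_length(len(read), max_differences)
    return [(i * size, read[i * size:(i + 1) * size])
            for i in range(max_differences + 1)]


print(min_seed_length(36, 3))   # a 36 bp read with up to 3 differences
print(seeds("ACGTACGTAC", 1))   # two seeds of length 5
```

Looking up every seed exactly and extending only the hits is the "seed" and "extend" split; the cost grows with the number of reads and seeds, which motivates running it in parallel.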
CloudBurst




Catalog k-mers -> Collect seeds -> End-to-end alignment
http://cloudburst-bio.sourceforge.net; Bioinformatics 2009 25: 1363-1369
Bowtie: Ultrafast short read aligner

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10 (3): R25.
SOAPsnp: Consensus alignment and SNP calling

Ruiqiang Li, Yingrui Li, Xiaodong Fang, et al. (2009) "SNP detection for massively parallel whole-genome resequencing" Genome Res
Crossbow: Rapid whole genome SNP analysis
Ben Langmead
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
Preprocessed reads -> Map: Bowtie -> Sort: Bin and partition -> Reduce: SOAPsnp
Crossbow condenses over 1,000 hours of resequencing computation into a few hours without requiring the user to own or operate a computer cluster
Comparing Genomes
Estimating relative evolutionary rates from sequence comparisons:
Identification of probable orthologs
Admissible comparisons: A or B vs. D; C vs. E
Inadmissible comparisons: A or B vs. E; C vs. D
[Figure: species tree and gene tree for genes A, B, C (S. cerevisiae) and D, E (C. elegans)]
Estimating relative evolutionary rates from sequence comparisons:
1. Orthologs found using the Reciprocal smallest distance algorithm
2. Build alignment between two orthologs
>Sequence C
MSGRTILASTIAKPFQEEVTKAVKQLNFT-----PKLVGLLSNEDPAAKMYANWTGKTCESLGFKYEL-…
>Sequence E
MSGRTILASKVAETFNTEIINNVEEYKKTHNGQGPLLVGFLANNDPAAKMYATWTQKTSESMGFRYDL…
3. Estimate distance given a substitution matrix
[Figure: substitution matrix over Phe, Ala, Pro, Leu, Thr; species tree and gene tree for genes A, B, C (S. cerevisiae) and D, E (C. elegans)]
RSD algorithm summary
Genome I (sequence Ib), Genome J (sequence Jc)
Align sequences & calculate distances (H): b vs. a D=0.2; b vs. b D=0.3; b vs. c D=0.1
Align sequences & calculate distances (L): c vs. a D=1.2; c vs. b D=0.1; c vs. c D=0.9
Orthologs: ib - jc, D = 0.1
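The reciprocal check at the heart of RSD can be sketched with the distances shown on the slide. This assumes the distances are already computed and that the "b vs." table compares Ib against Genome J while the "c vs." table compares Jc back against Genome I, which is one reading of the flattened figure. The function name is hypothetical, and the alignment and distance-estimation steps are left out:

```python
def reciprocal_smallest_distance(dist_i_to_j, dist_j_to_i):
    """Return (gene_i, gene_j, distance) pairs that are mutual nearest hits.

    dist_i_to_j[gene_i][gene_j] holds the distance from a Genome I gene to a
    Genome J gene; dist_j_to_i holds the distances for the reverse search.
    """
    orthologs = []
    for gene_i, row in dist_i_to_j.items():
        best_j = min(row, key=row.get)            # closest gene in Genome J
        back = dist_j_to_i.get(best_j)
        if back and min(back, key=back.get) == gene_i:  # and it points back
            orthologs.append((gene_i, best_j, row[best_j]))
    return orthologs


# Distances as read from the slide.
ib_against_genome_j = {"b": {"a": 0.2, "b": 0.3, "c": 0.1}}
jc_against_genome_i = {"c": {"a": 1.2, "b": 0.1, "c": 0.9}}

print(reciprocal_smallest_distance(ib_against_genome_j, jc_against_genome_i))
```

The pair is reported only because each gene is the other's closest match, which matches the slide's result of ib - jc with D = 0.1.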
Prof. Dennis Wall
Harvard Medical School
Roundup is a database of orthologs
and their evolutionary distances.
To get started, click browse. Alternatively, you can
read our documentation here.
Good luck, researchers!
massive computational demand
1000 genomes = 5,994,000 processes = 23,976,000 hours
2737 years
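These numbers can be checked with a little arithmetic. The 12 processes per genome pair and 4 hours per process are not stated on the slide; they are the factors implied by its totals if every pair of genomes is compared, so treat them as assumptions:

```python
from math import comb

GENOMES = 1000
PROCESSES = 5_994_000      # from the slide
TOTAL_HOURS = 23_976_000   # from the slide

pairs = comb(GENOMES, 2)                      # all-against-all genome pairs
processes_per_pair = PROCESSES / pairs        # implied, not stated on the slide
hours_per_process = TOTAL_HOURS / PROCESSES   # implied, not stated on the slide
years = TOTAL_HOURS / (24 * 365)              # serial run time on one CPU

print(f"{pairs:,} pairs, {processes_per_pair:g} processes per pair, "
      f"{hours_per_process:g} h per process, about {years:,.0f} years")
```

The pair count grows quadratically with the number of genomes, so the total grows far faster than the data itself.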
compared 50+ genomes
trends in data sharing
data motion is hard
cloud services are a viable dataspace
share data
share applications
share results
http://aws.amazon.com/publicdatasets/
Data Platform




App Platform
Scalable Data Platform


                Services


                  APIs


Getters         Filters            Savers




            WORK
to conclude
big data
change thinking
data management
 data processing
   data sharing
think distributed
new software architectures
new computing paradigms
cloud services
the cloud works
Thank you!

deesingh@amazon.com
Twitter: @mndoci
Presentation ideas from @mza, James Hamilton, and @lessig

More Related Content

What's hot

Best Practices in Architecting for the Cloud Webinar - Jinesh Varia
Best Practices in Architecting for the Cloud Webinar - Jinesh VariaBest Practices in Architecting for the Cloud Webinar - Jinesh Varia
Best Practices in Architecting for the Cloud Webinar - Jinesh VariaAmazon Web Services
 
Cloud Computing for the Enterprise, Dr Werner Vogels, CTO Amazon.com
Cloud Computing for the Enterprise, Dr Werner Vogels, CTO Amazon.comCloud Computing for the Enterprise, Dr Werner Vogels, CTO Amazon.com
Cloud Computing for the Enterprise, Dr Werner Vogels, CTO Amazon.comAmazon Web Services
 
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...Amazon Web Services
 
Getting started in the AWS Cloud, Glen Robinson, Solutions Architect, AWS
Getting started in the AWS Cloud, Glen Robinson, Solutions Architect, AWSGetting started in the AWS Cloud, Glen Robinson, Solutions Architect, AWS
Getting started in the AWS Cloud, Glen Robinson, Solutions Architect, AWSAmazon Web Services
 
Aws overview (Amazon Web Services)
Aws overview (Amazon Web Services)Aws overview (Amazon Web Services)
Aws overview (Amazon Web Services)Jatinder Randhawa
 
2009.11.20 BPstudy#27 Amazon Web Service
2009.11.20 BPstudy#27 Amazon Web Service2009.11.20 BPstudy#27 Amazon Web Service
2009.11.20 BPstudy#27 Amazon Web ServiceHiro Fukami
 
MED303 Addressing Security in Media Workflows - AWS re: Invent 2012
MED303 Addressing Security in Media Workflows - AWS re: Invent 2012MED303 Addressing Security in Media Workflows - AWS re: Invent 2012
MED303 Addressing Security in Media Workflows - AWS re: Invent 2012Amazon Web Services
 
CloudStack-Development-Story
CloudStack-Development-StoryCloudStack-Development-Story
CloudStack-Development-StoryKimihiko Kitase
 
Scalable Database Options on AWS
Scalable Database Options on AWSScalable Database Options on AWS
Scalable Database Options on AWSAmazon Web Services
 
Survey of International and Thai Cloud Providers and Cloud Software Projects
Survey of International and Thai Cloud Providers and Cloud Software ProjectsSurvey of International and Thai Cloud Providers and Cloud Software Projects
Survey of International and Thai Cloud Providers and Cloud Software Projectst b
 
2011 State of the Cloud: A Year's Worth of Innovation in 30 Minutes - Jinesh...
2011 State of the Cloud:  A Year's Worth of Innovation in 30 Minutes - Jinesh...2011 State of the Cloud:  A Year's Worth of Innovation in 30 Minutes - Jinesh...
2011 State of the Cloud: A Year's Worth of Innovation in 30 Minutes - Jinesh...Amazon Web Services
 
Journey Through the AWS Cloud; Application Services
Journey Through the AWS Cloud; Application ServicesJourney Through the AWS Cloud; Application Services
Journey Through the AWS Cloud; Application ServicesAmazon Web Services
 
Bootstrapping Session 4 - Building Web Scale Applications
Bootstrapping Session 4 - Building Web Scale ApplicationsBootstrapping Session 4 - Building Web Scale Applications
Bootstrapping Session 4 - Building Web Scale ApplicationsAmazon Web Services
 
Programming Amazon Web Services for Beginners (1)
Programming Amazon Web Services for Beginners (1)Programming Amazon Web Services for Beginners (1)
Programming Amazon Web Services for Beginners (1)Markus Klems
 
Aws for Start-ups - Introduction & AWS Overview
Aws for Start-ups  - Introduction & AWS OverviewAws for Start-ups  - Introduction & AWS Overview
Aws for Start-ups - Introduction & AWS OverviewAmazon Web Services
 

What's hot (20)

Best Practices in Architecting for the Cloud Webinar - Jinesh Varia
Best Practices in Architecting for the Cloud Webinar - Jinesh VariaBest Practices in Architecting for the Cloud Webinar - Jinesh Varia
Best Practices in Architecting for the Cloud Webinar - Jinesh Varia
 
Keynote - Werner Vogels
Keynote - Werner Vogels Keynote - Werner Vogels
Keynote - Werner Vogels
 
Cloud Computing for the Enterprise, Dr Werner Vogels, CTO Amazon.com
Cloud Computing for the Enterprise, Dr Werner Vogels, CTO Amazon.comCloud Computing for the Enterprise, Dr Werner Vogels, CTO Amazon.com
Cloud Computing for the Enterprise, Dr Werner Vogels, CTO Amazon.com
 
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
 
Getting started in the AWS Cloud, Glen Robinson, Solutions Architect, AWS
Getting started in the AWS Cloud, Glen Robinson, Solutions Architect, AWSGetting started in the AWS Cloud, Glen Robinson, Solutions Architect, AWS
Getting started in the AWS Cloud, Glen Robinson, Solutions Architect, AWS
 
Aws overview (Amazon Web Services)
Aws overview (Amazon Web Services)Aws overview (Amazon Web Services)
Aws overview (Amazon Web Services)
 
2009.11.20 BPstudy#27 Amazon Web Service
2009.11.20 BPstudy#27 Amazon Web Service2009.11.20 BPstudy#27 Amazon Web Service
2009.11.20 BPstudy#27 Amazon Web Service
 
MED303 Addressing Security in Media Workflows - AWS re: Invent 2012
MED303 Addressing Security in Media Workflows - AWS re: Invent 2012MED303 Addressing Security in Media Workflows - AWS re: Invent 2012
MED303 Addressing Security in Media Workflows - AWS re: Invent 2012
 
CloudStack-Development-Story
CloudStack-Development-StoryCloudStack-Development-Story
CloudStack-Development-Story
 
Scalable Database Options on AWS
Scalable Database Options on AWSScalable Database Options on AWS
Scalable Database Options on AWS
 
Survey of International and Thai Cloud Providers and Cloud Software Projects
Survey of International and Thai Cloud Providers and Cloud Software ProjectsSurvey of International and Thai Cloud Providers and Cloud Software Projects
Survey of International and Thai Cloud Providers and Cloud Software Projects
 
2011 State of the Cloud: A Year's Worth of Innovation in 30 Minutes - Jinesh...
2011 State of the Cloud:  A Year's Worth of Innovation in 30 Minutes - Jinesh...2011 State of the Cloud:  A Year's Worth of Innovation in 30 Minutes - Jinesh...
2011 State of the Cloud: A Year's Worth of Innovation in 30 Minutes - Jinesh...
 
AWS 101 Event - 16 July 2013
AWS 101 Event - 16 July 2013AWS 101 Event - 16 July 2013
AWS 101 Event - 16 July 2013
 
Keynote from Werner Vogels
Keynote from Werner VogelsKeynote from Werner Vogels
Keynote from Werner Vogels
 
Masterclass Webinar: Amazon S3
Masterclass Webinar: Amazon S3Masterclass Webinar: Amazon S3
Masterclass Webinar: Amazon S3
 
Journey Through the AWS Cloud; Application Services
Journey Through the AWS Cloud; Application ServicesJourney Through the AWS Cloud; Application Services
Journey Through the AWS Cloud; Application Services
 
Bootstrapping Session 4 - Building Web Scale Applications
Bootstrapping Session 4 - Building Web Scale ApplicationsBootstrapping Session 4 - Building Web Scale Applications
Bootstrapping Session 4 - Building Web Scale Applications
 
AWS 101 Event London - Feb 2014
AWS 101 Event London - Feb 2014AWS 101 Event London - Feb 2014
AWS 101 Event London - Feb 2014
 
Programming Amazon Web Services for Beginners (1)
Programming Amazon Web Services for Beginners (1)Programming Amazon Web Services for Beginners (1)
Programming Amazon Web Services for Beginners (1)
 
Aws for Start-ups - Introduction & AWS Overview
Aws for Start-ups  - Introduction & AWS OverviewAws for Start-ups  - Introduction & AWS Overview
Aws for Start-ups - Introduction & AWS Overview
 

Similar to Masterworks talk on Big Data and the implications of petascale science

Getting Started in the AWS Cloud, Glen Robinson, Solutions Architect, AWS
Getting Started in the AWS Cloud, Glen Robinson, Solutions Architect, AWSGetting Started in the AWS Cloud, Glen Robinson, Solutions Architect, AWS
Getting Started in the AWS Cloud, Glen Robinson, Solutions Architect, AWSAmazon Web Services
 
NHGRI Cloud Computing talk
NHGRI Cloud Computing talkNHGRI Cloud Computing talk
NHGRI Cloud Computing talkDeepak Singh
 
13h00 aws 2012-fault_tolerant_applications
13h00   aws 2012-fault_tolerant_applications13h00   aws 2012-fault_tolerant_applications






Masterworks talk on Big Data and the implications of petascale science