Data without Limits
      Dr. Werner Vogels
      CTO, Amazon.com
Human Genome Project

Collaborative project to sequence every single letter of the human genetic code.

13 years and billions of dollars to complete.

Gigabyte-scale datasets (transferred between sites on iPods!)
Beyond the Human Genome

45+ species sequenced: mouse, rat, gorilla, rabbit, platypus, nematode, zebrafish...

Compare genomes between species to identify biologically interesting areas of the genome.

100 GB-scale datasets. Increased computational requirements.
The Next Generation

New sequencing instruments lead to a dramatic drop in the cost and time required to sequence a genome.

Sequence and compare the genetic code of individuals to find areas of variation. Much more interesting.

Terabyte-scale datasets. Significant computational requirements.
The 1000 Genomes Project

Public/private consortium to build the world's largest collection of human genetic variation.

Hugely important dataset to drive new insight into known genetic traits and the identification of new ones.

Vast, complex data and the computational resources required are beyond the reach of most research groups and hospitals.
1000 Genomes in the Cloud

The 1000 Genomes data made available to all on AWS.

Stored for free as part of the Public Datasets program. Updated regularly.

200 TB. 1,700 individual genomes. As much compute and storage as required, available to all.
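
The dataset is open to anyone. As a small illustration (not from the talk), a sketch of browsing it with today's boto3 SDK, assuming the data still lives in a public bucket named 1000genomes that allows unsigned access:

    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Public Datasets can be read without AWS credentials: skip request signing.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

    # List a handful of objects from the (assumed) 1000genomes bucket.
    resp = s3.list_objects_v2(Bucket="1000genomes", MaxKeys=10)
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])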
The Cloud
Helps us do the science we are capable of
50,000-core CycleCloud supercomputer running on the Amazon Cloud
How big is 50,000 cores?
Why does it matter?
(Source: W.H.O./Globocan 2008)
Every day is crucial and costly
Find matches in millions of keys
Challenge: to run a virtual screen with a higher-accuracy algorithm & 21 million compounds
Metric: Count
Compute hours of work: 109,927 hours
Compute days of work: 4,580 days
Compute years of work: 12.55 years
Ligand count: ~21 million ligands

Using CycleCloud & Amazon Cloud,
the impossible run finished in...
3 hours
for $4,828.85/hr
instead of $20+ million in infrastructure
Every day is crucial and costly
Big Data powered by AWS




BIG DATA
The collection and analysis of large amounts of data to create a competitive advantage
Big Data powered by AWS
Big Data Verticals

Media/Advertising: targeted advertising, image and video processing
Oil & Gas: seismic analysis
Retail: recommendations, transaction analysis
Life Sciences: genome analysis
Financial Services: Monte Carlo simulations, risk analysis
Security: anti-virus, fraud detection, image recognition
Social Network/Gaming: user demographics, usage analysis, in-game metrics
Big Data powered by AWS




Storage | Big Data | Compute

Challenges start at relatively small volumes: from 100 GB to 1,000 PB.
Big Data powered by AWS




Storage | Big Data | Compute

When data sets and data analytics need to scale to the point that you have to start innovating around how to collect, store, organize, analyze and share it.
Big Data powered by AWS




Storage | Innovation | Compute

Storage: S3, DynamoDB, Glacier
Compute: HPC, Spot instances, Elastic MapReduce
Storage | Big Data | Compute
Unconstrained data growth

[Chart: data volumes growing from GB through TB, PB and EB to ZB]

95% of the 1.2 zettabytes of data in the digital universe is unstructured.
70% of this is user-generated content.
Unstructured data growth is explosive, with an estimated compound annual growth rate (CAGR) of 62% from 2008–2012.
Source: IDC
Storage | Big Data | Compute
Why now?

Web sites: blogs, reviews, emails, pictures
Social graphs: Facebook, LinkedIn, contacts
Application server logs: web sites, games
Sensor data: weather, water, smart grids
Images/videos: traffic, security cameras
Twitter: 50M tweets/day, 1,400% growth per year
Storage | Big Data | Compute
Why now?

Mobile connected world (more people using, easier to collect)
Storage | Big Data | Compute
Why now?

More aspects of data (variety, depth, location, frequency)
Storage | Big Data | Compute
Why now?

Possible to understand (not just answer specific questions)
Storage | Big Data | Compute
Why now?

Who is your consumer really?
What do people really like?
What is happening socially with your products?
How do people really use your product?
Storage | Big Data | Compute
Why now?

More data => better results
BIGGER IS BETTER
UNCERTAINTY
Big Data requires NO LIMITS
Storage | Big Data | Compute
From one instance... to thousands... and back again
Big Data Pipeline
Collect | Store | Organize | Analyze | Share
Storage | Big Data | Compute
Where do you put your slice of it?

Collection - Ingestion

AWS Direct Connect: dedicated bandwidth between your site and AWS
AWS Import/Export: physical transfer of media into and out of AWS
Queuing (SQS): reliable messaging for task distribution & collection
Amazon Storage Gateway: shrink-wrapped gateway for volume synchronization
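
As a sketch of the queuing pattern for task distribution and collection, a minimal producer/worker pair using boto3 (a newer SDK than this talk); the queue name and the S3 path in the message are hypothetical:

    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")
    queue_url = sqs.create_queue(QueueName="ingest-tasks")["QueueUrl"]

    # Producer: enqueue a pointer to an object that needs processing.
    sqs.send_message(QueueUrl=queue_url,
                     MessageBody="s3://my-ingest-bucket/logs/2012-06-01.gz")

    # Worker: pull a task, process it, then delete the message.
    msgs = sqs.receive_message(QueueUrl=queue_url,
                               MaxNumberOfMessages=1, WaitTimeSeconds=10)
    for m in msgs.get("Messages", []):
        print("processing", m["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])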
Storage | Big Data | Compute
Where do you put your slice of it?

Relational Database Service: fully managed database (MySQL, Oracle, MSSQL)
DynamoDB: NoSQL, schemaless, provisioned-throughput database
Simple Storage Service (S3): object datastore, up to 5 TB per object, 99.999999999% durability
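
To make "provisioned throughput" concrete, a sketch of creating a DynamoDB table with boto3 (which postdates this talk); the table name, key schema, and capacity figures are illustrative assumptions:

    import boto3

    ddb = boto3.client("dynamodb", region_name="us-east-1")

    # You declare read/write capacity up front and pay for it whether used or not.
    ddb.create_table(
        TableName="clicks",
        AttributeDefinitions=[
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "ts", "AttributeType": "N"},
        ],
        KeySchema=[
            {"AttributeName": "user_id", "KeyType": "HASH"},   # partition key
            {"AttributeName": "ts", "KeyType": "RANGE"},       # sort key
        ],
        ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 500},
    )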
Storage | Big Data | Compute
Where do you put your slice of it?

Glacier: long-term cold storage, from $0.01 per GB/month, 99.999999999% durability
Storage | Big Data | Compute
Glacier - full lifecycle big data management

Data import: physical shipping of devices for creation of data in AWS (e.g. 50 TB of seismic data created as EBS volumes in a Gluster file system)
Computation & visualization: HPC & EMR cluster jobs of many thousands of cores (e.g. 200 TB of visualization data generated from cluster processing)
Long-term archive: once data analysis is complete, the entire resultant dataset is placed in cold storage rather than tape (e.g. cost-effective compared to tape; retrieval in 3–5 hours if required)
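
The "cold storage rather than tape" step can be automated with an S3 lifecycle rule; a sketch using boto3 (added here for illustration, not part of the talk), in which the bucket name, prefix, and 30-day window are assumptions:

    import boto3

    s3 = boto3.client("s3")

    # Transition finished result sets to Glacier once they are 30 days old.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-results-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-results",
                "Filter": {"Prefix": "results/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }]
        },
    )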
Storage | Big Data | Compute
How quickly do you need to read it?

DynamoDB (single-digit ms): social-scale applications, provisioned throughput performance, flexible consistency models
S3 (10s–100s ms): any object, any app; 99.999999999% durability; objects up to 5 TB in size
Glacier (<5 hours): media & asset archives, extremely low cost, S3 levels of durability

[Diagram: Performance / Scale / Price trade-off triangle]
Storage | Big Data | Compute
Operate at any scale

Unlimited data

[Diagram: Performance / Scale / Price trade-off triangle]
Storage | Big Data | Compute
Pay for only what you use

Provisioned IOPS: provisioned read/write performance per DynamoDB table or EBS volume; pay for a given provisioned capacity whether used or not
Volume used: pay for volume stored per month plus puts/gets; no capacity planning required to maintain unlimited storage

[Diagram: Performance / Scale / Price trade-off triangle]
Storage | Big Data | Compute
"Big data" changes the dynamics of computation and data sharing

Collection: How do I acquire it? Where do I put it?
Computation: What horsepower can I apply to it?
Collaboration: How do I work with others on it?
Storage | Big Data | Compute
"Big data" changes the dynamics of computation and data sharing

Collection: Direct Connect, Import/Export, S3, DynamoDB
Computation: EC2, GPUs, Elastic MapReduce
Collaboration: CloudFormation, Simple Workflow, S3
Amazon Elastic MapReduce
Storage | Big Data | Compute
Hadoop-as-a-Service – Elastic MapReduce

Elastic MapReduce
Managed, elastic Hadoop cluster
Integrates with S3 & DynamoDB
Leverage Hive & Pig analytics scripts
Integrates with instance types such as Spot




Scalable: use as many or as few compute instances running Hadoop as you want; modify the number of instances while your job flow is running
Integrated with other services: works seamlessly with S3 as origin and output; integrates with DynamoDB
Comprehensive: supports languages such as Hive and Pig for defining analytics, and allows complex definitions in Cascading, Java, Ruby, Perl, Python, PHP, R, or C++
Cost effective: works with Spot instance types
Monitoring: monitor job flows from within the management console
But what is it?
A framework: splits data into pieces, lets processing occur, gathers the results
Input data lives in S3 + DynamoDB
Code is submitted to Elastic MapReduce
Elastic MapReduce starts a name node that manages an elastic cluster running HDFS
Queries + BI run against the cluster via JDBC, Pig, Hive
Output is written back to S3 + DynamoDB
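
A hedged sketch of launching such a job flow with boto3's run_job_flow (a later SDK than the talk used); the instance types, counts, release label, IAM roles, and every S3 path below are hypothetical:

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    resp = emr.run_job_flow(
        Name="click-log-analysis",
        ReleaseLabel="emr-5.36.0",            # pick a current release
        LogUri="s3://my-emr-logs/",
        Instances={
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 10,              # grow or shrink to match the job
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "count-user-actions",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hadoop-streaming",
                         "-files", "s3://my-code/mapper.py,s3://my-code/reducer.py",
                         "-mapper", "mapper.py", "-reducer", "reducer.py",
                         "-input", "s3://my-clicklogs/raw/",
                         "-output", "s3://my-clicklogs/summary/"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    print("started cluster:", resp["JobFlowId"])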
Very large click log (e.g. TBs) with lots of actions by John Smith
Split the log into many small pieces
Process in an EMR cluster
Aggregate the results from all the nodes
Result: what John Smith did
Insight in a fraction of the time
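
The split/process/aggregate steps map directly onto a streaming mapper and reducer. A minimal Python sketch (not from the deck) that assumes a tab-separated log whose first field is the user id:

    #!/usr/bin/env python
    # mapper.py -- runs on each log split; emits "user<TAB>1" per click.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")   # assumed: user_id, url, timestamp
        if fields and fields[0]:
            print("%s\t1" % fields[0])

and, in a separate file:

    #!/usr/bin/env python
    # reducer.py -- input arrives sorted by key; sum each user's clicks.
    import sys

    current, total = None, 0
    for line in sys.stdin:
        user, count = line.rstrip("\n").split("\t")
        if user != current:
            if current is not None:
                print("%s\t%d" % (current, total))   # flush previous user
            current, total = user, 0
        total += int(count)
    if current is not None:
        print("%s\t%d" % (current, total))           # flush the last user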
1 instance for 100 hours
            =
100 instances for 1 hour
Small instance = $8
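
The arithmetic behind that equivalence, assuming an illustrative $0.08/hr small-instance rate (so the whole job costs the quoted $8 either way):

    rate = 0.08                 # $/instance-hour (assumed)
    print(1 * 100 * rate)       # 8.0 -- one instance for 100 hours
    print(100 * 1 * rate)       # 8.0 -- a hundred instances for one hour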
Operated 2 million+ Hadoop clusters last year
Features powered by Amazon Elastic MapReduce:
People Who Viewed This Also Viewed
Review highlights
Auto-complete as you type on search
Search spelling suggestions
Top searches
Ads

200 Elastic MapReduce jobs per day
Processing 3 TB of data
Features driven by MapReduce
Storage | Big Data | Compute
Hadoop-as-a-Service – Elastic MapReduce

"With Amazon Elastic MapReduce, there was no upfront investment in hardware, no hardware procurement delay, and no need to hire additional operations staff. Because of the flexibility of the platform, our first new online advertising campaign experienced a 500% increase in return on ad spend from a similar campaign a year before."
Data Analytics

3.5 billion records
71 million unique cookies
1.7 million targeted ads required per day

Execute batch-processing data sets ranging in size from dozens of gigabytes to terabytes. Building in-house infrastructure to analyze these click-stream datasets requires investment in expensive "headroom" to handle peak demand.

"Our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before."

Example: a user recently purchased a sports movie and is searching for video games → targeted ad (1.7 million per day)
"AWS gave us the flexibility to bring a massive amount of capacity online in a short period of time and allowed us to do so in an operationally straightforward way. AWS is now Shazam's cloud provider of choice."
Jason Titus, CTO

DynamoDB: over 500,000 writes per second
Amazon EMR: more than 1 million writes per second
Step 1: Tracking. We've created a unique tracking application. It keeps track of all websites visited, software used, and/or ads seen.
Step 2: Panel. We invite members of a research panel to install it. We know not only their digital habits, but also their offline demographics and behavior.
Step 3: Dashboard. Usage data now begins to pour into the Wakoopa dashboard in real time. Log in, and create beautiful visualizations and useful reports.
Technology

Panel activity flows into AWS: Activity → SQS → EMR → RDS → Data, with S3 alongside
Results feed Kamek* and surface as metrics in the Wakoopa dashboard
Rediff uses Amazon EMR along with Amazon S3 to perform data mining, log processing and analytics for their online business. The insights gained are used to power a better user experience on their portal.

Rediff needed 12–15 hours to run this on a 10–12 node cluster on premises. AWS gave the choice and flexibility of an on-demand model that can be scaled up and down, and shortened the time required to process the data.
More than 25 million streaming members

50 billion events per day

~1 PB of data stored in Amazon S3

[Chart: users over time]
Leader in the 2011 Gartner IaaS Magic Quadrant
Cloud enables big data collection
Cloud enables big data processing
Cloud enables big data collaboration
aws.amazon.com
 get started with the free tier

  • 5. The 1000 Genomes Project Public/private consortium to build the world’s largest collection of human genetic variation. Hugely important dataset to drive new insight into known genetic traits, and the identification of new ones. Vast, complex data and computational resources required, beyond reach of most research groups and hospitals.
  • 6. 1000 Genomes in the Cloud The 1000 Genomes data made available to all on AWS. Stored for free as part of the Public Datasets program. Updated regularly. 200Tb. 1700 individual genomes. As much compute and storage as required available to all.
  • 7. The Cloud Helps do the science we are capable of
  • 8. 50,000 core CycleCloud Super Computer running on the Amazon Cloud
  • 9. How big is 50,000 cores? Why does it matter?
  • 11. Every day is crucial and costly
  • 16. Find matches in millions of keys
  • 17. Challenge: To run a virtual screen with a higher accuracy algorithm & 21 million compounds
  • 19. Metric | Count. Compute Hours of Work: 109,927 hours. Compute Days of Work: 4,580 days. Compute Years of Work: 12.55 years. Ligand Count: ~21 million ligands. Using CycleCloud & Amazon Cloud, the impossible run finished in...
  • 21. Instead of $20+ Million in Infrastructure
  • 22. Every day is crucial and costly
  • 32. Big Data powered by AWS. BIG DATA: the collection and analysis of large amounts of data to create a competitive advantage.
  • 33. Big Data powered by AWS. Big Data verticals. Media/Advertising: targeted advertising, image and video analysis, image processing. Oil & Gas: seismic analysis. Retail: recommendations, transaction analysis. Life Sciences: genome analysis. Financial Services: Monte Carlo simulations, risk analysis, fraud detection. Security: anti-virus, fraud detection, image recognition. Social Network/Gaming: user demographics, usage analysis, in-game metrics.
  • 34. Big Data powered by AWS. Storage | Big Data | Compute. Challenges start at relatively small volumes (from 100 GB up to 1,000 PB).
  • 35. Big Data powered by AWS. Storage | Big Data | Compute. When data sets and data analytics need to scale to the point that you have to start innovating around how to collect, store, organize, analyze and share them.
  • 37. Big Data powered by AWS. Innovation between Storage (S3, DynamoDB, Glacier) and Compute (HPC, EMR, Spot).
  • 38. Storage | Big Data | Compute. Unconstrained data growth: 95% of the 1.2 zettabytes of data in the digital universe is unstructured, and 70% of this is user-generated content. Unstructured data growth is explosive, with estimates of compound annual growth (CAGR) at 62% from 2008-2012. Source: IDC.
  • 39. Storage | Big Data | Compute. Why now? Web sites: blogs, reviews, emails, pictures. Social graphs: Facebook, LinkedIn, contacts. Application server logs: web sites, games. Sensor data: weather, water, smart grids. Images/videos: traffic, security cameras. Twitter: 50m tweets/day, 1,400% growth per year.
  • 40. Why now? A mobile connected world (more people using, easier to collect); same data sources as above.
  • 41. Why now? More aspects of data (variety, depth, location, frequency).
  • 42. Why now? Possible to understand (not just answer specific questions).
  • 43. Storage | Big Data | Compute. Why now? Who is your consumer really? What do people really like? What is happening socially with your products? How do people really use your product?
  • 44. Why now? More data => better results.
  • 49. Storage | Big Data | Compute. From one instance…
  • 50. Storage | Big Data | Compute. …to thousands
  • 51. Storage | Big Data | Compute. and back again…
  • 52. Big Data Pipeline Collect | Store | Organize | Analyze | Share
  • 53. Storage | Big Data | Compute. Where do you put your slice of it? Collection/ingestion: AWS Direct Connect (dedicated bandwidth between your site and AWS), AWS Import/Export (physical transfer of media into and out of AWS), queuing (reliable messaging for task distribution & collection), Amazon Storage Gateway (shrink-wrapped gateway for volume synchronization).
  • 54. Storage | Big Data | Compute. Where do you put your slice of it? Relational Database Service: fully managed database (MySQL, Oracle, MSSQL). DynamoDB: NoSQL, schemaless, provisioned-throughput database. Simple Storage Service (S3): object datastore with up to 5TB per object and 99.999999999% durability. (A brief boto sketch follows.)
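To make these storage options concrete, here is a minimal sketch using boto, the Python AWS library of this era; the bucket, key, and file names are hypothetical placeholders.

# Minimal sketch: store and retrieve an object in S3 with boto.
# The bucket and key names below are hypothetical.
import boto

conn = boto.connect_s3()                                  # credentials from the environment
bucket = conn.create_bucket('example-clickstream-logs')   # bucket names are globally unique

# Upload: individual S3 objects can be up to 5TB
key = bucket.new_key('logs/2012-06-01/clickstream.gz')
key.set_contents_from_filename('clickstream.gz')

# Download the same object later
key.get_contents_to_filename('clickstream-copy.gz')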
  • 55. Storage Big Data Compute Where do you put your slice of it? Glacier Long term cold storage From $0.01 per GB/Month 99.999999999% durability
  • 56. Storage | Big Data | Compute. Glacier: full lifecycle big data management. Data import: physical shipping of devices for creation of data in AWS (e.g. 50TB of seismic data created as EBS volumes in a Gluster file system). Computation & visualization: HPC & EMR cluster jobs of many thousands of cores (e.g. 200TB of visualization data generated from cluster processing). Long term archive: once data analysis is complete, the entire resultant dataset is placed in cold storage rather than tape (cost effective when compared to tape, retrieval in 3-5 hours if required).
  • 57. Storage | Big Data | Compute. How quick do you need to read it? DynamoDB (single-digit ms): social scale applications, provisioned throughput performance, flexible consistency models. S3 (10s-100s ms): any object, any app; 99.999999999% durability; objects up to 5TB in size. Glacier (<5 hours): media & asset archives, extremely low cost, S3 levels of durability. Performance | Scale | Price.
  • 58. Storage Big Data Compute Operate at any scale Unlimited data Performance Scale Price
  • 59. Storage | Big Data | Compute. Pay for only what you use. Provisioned IOPS: provisioned read/write performance per DynamoDB table/EBS volume; pay for a given provisioned capacity whether used or not. Volume used: pay for volume stored per month & puts/gets; no capacity planning required to maintain unlimited storage. Performance | Scale | Price.
  • 60. Storage | Big Data | Compute. “Big data” changes the dynamics of computation and data sharing. Collection: how do I acquire it? Where do I put it? Computation: what horsepower can I apply to it? Collaboration: how do I work with others on it?
  • 61. Collection: Direct Connect, Import/Export, S3, DynamoDB. Computation: EC2, GPUs, Elastic MapReduce. Collaboration: CloudFormation, Simple Workflow, S3.
  • 64. Storage | Big Data | Compute. Hadoop-as-a-Service: Elastic MapReduce. Managed, elastic Hadoop cluster; integrates with S3 & DynamoDB; leverage Hive & Pig analytics scripts; integrates with instance types such as spot.
  • 65. Elastic MapReduce features in detail. Scalable: use as many or as few compute instances running Hadoop as you want; modify the number of instances while your job flow is running. Integrated with other services: works seamlessly with S3 as origin and output; integrates with DynamoDB. Comprehensive: supports languages such as Hive and Pig for defining analytics, and allows complex definitions in Cascading, Java, Ruby, Perl, Python, PHP, R, or C++. Cost effective: works with spot instance types. Monitoring: monitor job flows from within the management console. (See the launch sketch below.)
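As a hedged illustration of the “managed, elastic Hadoop cluster” above, the sketch below launches a streaming job flow through the classic boto EMR API; every name, path, and instance setting is a made-up placeholder, not a recommendation.

# Minimal sketch: launch an EMR streaming job flow with boto.
# All S3 paths and instance settings are hypothetical.
from boto.emr.connection import EmrConnection
from boto.emr.step import StreamingStep

conn = EmrConnection()  # credentials from the environment

step = StreamingStep(
    name='Aggregate click logs',
    mapper='s3n://example-bucket/scripts/mapper.py',
    reducer='s3n://example-bucket/scripts/reducer.py',
    input='s3n://example-bucket/input/',
    output='s3n://example-bucket/output/')

jobflow_id = conn.run_jobflow(
    name='Click log analysis',
    log_uri='s3n://example-bucket/emr-logs/',
    steps=[step],
    num_instances=4,                     # can be resized while the job flow runs
    master_instance_type='m1.large',
    slave_instance_type='m1.large')

print(jobflow_id)  # poll progress with conn.describe_jobflow(jobflow_id)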
  • 66. But what is it?
  • 67. A framework: splits data into pieces, lets processing occur, gathers the results. (A toy sketch follows.)
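A toy, in-memory sketch of that split/process/gather pattern in plain Python (no Hadoop involved; the record format is invented for illustration):

# Toy illustration of MapReduce: split data into pieces,
# process each piece, gather the results.
from collections import defaultdict

def mapper(record):              # emit (key, value) pairs per record
    user, action = record.split(',')
    yield user, 1

def reducer(user, counts):       # combine all values for one key
    return user, sum(counts)

records = ['john,click', 'jane,click', 'john,search', 'john,buy']

groups = defaultdict(list)       # the shuffle phase: group values by key
for record in records:           # in Hadoop, mapping runs in parallel
    for key, value in mapper(record):
        groups[key].append(value)

print(dict(reducer(k, v) for k, v in groups.items()))
# {'john': 3, 'jane': 1}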
  • 69-75. (Architecture build-up) Input data and code live in S3 + DynamoDB and are submitted to Elastic MapReduce. EMR provisions a name node and an elastic cluster backed by HDFS; queries and BI tools connect via JDBC, Pig, and Hive; output is written back to S3 + DynamoDB.
  • 76-82. (Worked example) A very large click log (e.g. TBs) contains lots of actions by John Smith. Split the log into many small pieces, process them in parallel in an EMR cluster, then aggregate the results from all the nodes to answer “What did John Smith do?”: insight in a fraction of the time. (A streaming sketch of this job follows.)
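That walkthrough maps directly onto Hadoop streaming, which EMR runs with plain Python scripts reading stdin and writing stdout. The log layout below ("timestamp,user,action") is an assumed example, not an actual format from the talk.

# mapper.py: emit "user<TAB>1" for every well-formed log line.
import sys

for line in sys.stdin:
    fields = line.strip().split(',')   # assumed layout: timestamp,user,action
    if len(fields) == 3:
        print('%s\t1' % fields[1])

# reducer.py: sum counts per user; Hadoop delivers lines sorted by key.
import sys

current_user, count = None, 0
for line in sys.stdin:
    user, value = line.strip().split('\t')
    if user != current_user:
        if current_user is not None:
            print('%s\t%d' % (current_user, count))
        current_user, count = user, 0
    count += int(value)
if current_user is not None:
    print('%s\t%d' % (current_user, count))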
  • 83. 1 instance for 100 hours = 100 instances for 1 hour
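At a hypothetical on-demand rate of $0.32 per instance-hour, the two shapes cost the same: 1 instance × 100 hours × $0.32 = 100 instances × 1 hour × $0.32 = $32. The elastic shape simply returns the answer 100× sooner.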
  • 85. Operated 2 million+ Hadoop clusters last year
  • 86. Features powered by Amazon Elastic MapReduce: People Who Viewed this Also Viewed; Review highlights; Auto complete as you type on search; Search spelling suggestions; Top searches; Ads. 200 Elastic MapReduce jobs per day, processing 3TB of data.
  • 87. Features driven by MapReduce
  • 89. Storage Big Data Compute Hadoop-as-a-Service – Elastic MapReduce "With Amazon Elastic MapReduce, there was no upfront investment in hardware, no hardware procurement delay, and no need to hire additional operations staff. Because of the flexibility of the platform, our first new online advertising campaign experienced a 500% increase in return on ad spend from a similar campaign a year before.”
  • 90. Data Analytics: 3.5 billion records; 71 million unique cookies; 1.7 million targeted ads required per day. Execute batch processing on data sets ranging in size from dozens of Gigabytes to Terabytes. Building in-house infrastructure to analyze these click stream datasets requires investment in expensive “headroom” to handle peak demand. “Our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before.” Targeted ad (1.7 million per day): e.g. a user recently purchased a sports movie and is searching for video games.
  • 92. “AWS gave us the flexibility to bring a massive amount of capacity online in a short period of time and allowed us to do so in an operationally straightforward way. AWS is now Shazam’s cloud provider of choice.” Jason Titus, CTO. DynamoDB: over 500,000 writes per second. Amazon EMR: more than 1 million writes per second.
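As a rough sketch of what a single DynamoDB write looks like with the classic boto API (table name, keys, and attributes are all hypothetical; sustained rates like Shazam’s come from batching and massive parallelism, not from a loop like this):

# Minimal sketch: write one item to a DynamoDB table with boto.
import boto

conn = boto.connect_dynamodb()
table = conn.get_table('tag-events')    # assumed schema: hash key user_id, range key timestamp

item = table.new_item(
    hash_key='user-123',
    range_key='2012-06-01T12:00:00Z',
    attrs={'track': 'example-song', 'location': 'IN'})
item.put()                              # consumes provisioned write throughput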
  • 97. Step 1: Tracking. We’ve created a unique tracking application. It keeps track of all websites visited, software used, and/or ads seen. Step 2: Panel. We invite members of a research panel to install it. We know not only their digital habits, but also their offline demographics and behavior. Step 3: Dashboard. Usage data now begins to pour into the Wakoopa dashboard in real time. Log in, and create beautiful visualizations and useful reports.
  • 98. (Architecture diagram) Technology: panel activity enters AWS via SQS; EMR and Kamek* process the data into metrics, stored in RDS and S3 and surfaced in the Wakoopa dashboard.
  • 100. Rediff uses Amazon EMR along with Amazon S3 to perform data mining, log processing and analytics for their online business. The insights gained are used to power a better user experience on their portal. Rediff needed 12-15 hours to run this on a 10-12 node cluster on premises. AWS gave the choice and flexibility of an on-demand model that can be scaled up and down, and shortened the time required to process data.
  • 102. More than 25 Million Streaming Members 50 Billion Events Per Day
  • 103. S3 ~1 PB of data stored in Amazon S3
  • 117. Leader in 2011 Gartner IaaS Magic Quadrant
  • 118. Cloud enables big data collection
  • 119. Cloud enables big data processing
  • 120. Cloud enables big data collaboration
  • 121. aws.amazon.com get started with the free tier

Editor's Notes

  1. Elasticity works from just 1 EC2 instance to many thousands. Just dial up and down as required.
  2-5. Horizontal scaling on commodity hardware. Perfect for Hadoop.
  6. We’ve been operating the service for over 3 years now and in the last year alone we’ve operated over 2 MILLION Hadoop clusters
  7. Yelp was founded in 2004 with the main goal of helping people connect with great local businesses. The Yelp community is best known for sharing in-depth reviews and insights on local businesses of every sort. In their six years of operation Yelp went from a one-city wonder (San Francisco) to an international phenomenon spanning 8 countries and nearly 50 cities. As of November 2010, Yelp had more than 39 million unique visitors to the site, and in total more than 14 million reviews have been posted by yelpers. Yelp has established a loyal consumer following, due in large part to the fact that they are vigilant in protecting the user from shill or suspect content. Yelp uses an automated review filter to identify suspicious content and minimize exposure to the consumer. The site also offers a wide range of other features that help people discover new businesses (lists, special offers, and events) and communicate with each other. Additionally, business owners and managers are able to set up free accounts to post special offers, upload photos, and message customers. The company has also been focused on developing mobile apps and was recently voted into the iTunes Apps Hall of Fame. Yelp apps are also available for Android, Blackberry, Windows 7, Palm Pre and WAP. Local search advertising makes up the majority of Yelp’s revenue stream. The search ads are colored light orange and clearly labeled “Sponsored Results.” Paying advertisers are not allowed to change or re-order their reviews.

Yelp originally depended upon giant RAIDs to store their logs, along with a single local instance of Hadoop. When Yelp made the move to Amazon Elastic MapReduce, they replaced the RAIDs with Amazon Simple Storage Service (Amazon S3) and immediately transferred all Hadoop jobs to Amazon Elastic MapReduce. “We were running out of hard drive space and capacity on our Hadoop cluster,” says Yelp search and data-mining engineer Dave Marin. Yelp uses Amazon S3 to store daily logs and photos, generating around 100GB of logs per day. The company also uses Amazon Elastic MapReduce to power approximately 20 separate batch scripts, most of those processing the logs. Features powered by Amazon Elastic MapReduce include: People Who Viewed this Also Viewed; Review highlights; Auto complete as you type on search; Search spelling suggestions; Top searches. Their jobs are written exclusively in Python, while Yelp uses their own open-source library, mrjob, to run their Hadoop streaming jobs on Amazon Elastic MapReduce, with boto to talk to Amazon S3. Yelp also uses s3cmd and the Ruby Elastic MapReduce utility for monitoring. Yelp developers advise others working with AWS to use the boto API as well as mrjob to ensure full utilization of Amazon Elastic MapReduce job flows. Yelp runs approximately 200 Elastic MapReduce jobs per day, processing 3TB of data, and is grateful for AWS technical support that helped with their Hadoop application development.

Using Amazon Elastic MapReduce, Yelp was able to save $55,000 in upfront hardware costs and get up and running in a matter of days, not months. However, most important to Yelp is the opportunity cost. “With AWS, our developers can now do things they couldn’t before,” says Marin. “Our systems team can focus their energies on other challenges.”
  8. The more misspelled words you collect from your customers, the better the spellcheck application you can create. Yelp is using AWS services to regularly process customer-generated data to improve spell check on their web site.
  9. The more searches you collect, the better the recommendations you can provide. Yelp is using AWS services to deliver features such as hotel or restaurant recommendations, review highlights and search hints.
  10. AWS Case Study: Razorfish. Razorfish, a digital advertising and marketing firm, segments users and customers based on the collection and analysis of non-personally identifiable data from browsing sessions. Doing so requires applying data mining methods across historical click streams to identify effective segmentation and categorization algorithms and techniques. These click streams are generated when a visitor navigates a web site or catalog, leaving behind patterns that can indicate a user’s interests. Algorithms are then implemented on systems that can batch execute at the appropriate scale against current data sets ranging in size from dozens of Gigabytes to Terabytes. The algorithms are also customized on a client-by-client basis to observe online/offline sales and customer loyalty data. Results of the analysis are loaded into ad-serving and cross-selling systems that in turn deliver the segmentation results in real time.

A common issue Razorfish has found with customer segmentation is the need to process gigantic click stream data sets. These large data sets are often the result of holiday shopping traffic on a retail website, or sudden dramatic growth on the data network of a media or social networking site. Building in-house infrastructure to analyze these click stream datasets requires investment in expensive “headroom” to handle peak demand. Without the expensive computing resources, Razorfish risks losing clients that require Razorfish to have sufficient resources at hand during critical moments. In addition, applications that can’t scale to handle increasingly large datasets can cause delays in identifying and applying algorithms that could drive additional revenue. As the sample data set grows (i.e. more users, more pages, more clicks), fewer applications are available that can handle the load and provide a timely response. Meanwhile, as the number of clients that utilize targeted advertising grows, access to on-demand compute and storage resources becomes a requirement. It was thus imperative for Razorfish to implement customer segmentation algorithms in a way that could be applied and executed independently of the scale of the incoming data and supporting infrastructure.

Prior to implementing the AWS based solution, Razorfish relied on a traditional hosting environment that utilized high-cost SAN equipment for storage, a proprietary distributed log processing cluster of 30 servers, and several high-end SQL servers. In preparation for the 2009 holiday season, demand for targeted advertising increased. To support this need, Razorfish faced a potential cost of over $500,000 in additional hardware expenses, a procurement time frame of about two months, and the need for an additional senior operations/database administrator. Furthermore, due to downstream dependencies, they needed their daily processing cycle to complete within 18 hours. However, given the increased data volume, Razorfish expected their processing cycle to extend past two days for each run even after the potential investment in human and computing resources. To deal with the combination of huge datasets and custom segmentation targeting activities, coupled with price sensitive clients, Razorfish decided to move away from their rigid data infrastructure status quo. This migration helped Razorfish process vast amounts of data to handle the need for rapid scaling at both the application and infrastructure levels.
Razorfish selected ad serving integration, Amazon Web Services (AWS), Amazon Elastic MapReduce (a hosted Apache Hadoop service), Cascading, and a variety of chosen applications to power their targeted advertising system based on these benefits:
Efficient: Elastic infrastructure from AWS allows capacity to be provisioned as needed based on load, reducing cost and the risk of processing delays. Amazon Elastic MapReduce and Cascading let Razorfish focus on application development without having to worry about time-consuming set-up, management, or tuning of Hadoop clusters or the compute capacity upon which they sit.
Ease of integration: Amazon Elastic MapReduce with Cascading allows data processing in the cloud without any changes to the underlying algorithms.
Flexible: Hadoop with Cascading is flexible enough to allow “agile” implementation and unit testing of sophisticated algorithms.
Adaptable: Cascading simplifies the integration of Hadoop with external ad systems.
Scalable: AWS infrastructure helps Razorfish reliably store and process huge (Petabytes) data sets.
The AWS elastic infrastructure platform allows Razorfish to manage wide variability in load by provisioning and removing capacity as needed. Mark Taylor, Program Director at Razorfish, said, “With our implementation of Amazon Elastic MapReduce and Cascading, there was no upfront investment in hardware, no hardware procurement delay, and no additional operations staff was hired. We completed development and testing of our first client project in six weeks. Our process is completely automated. Total cost of the infrastructure averages around $13,000 per month. Because of the richness of the algorithm and the flexibility of the platform to support it at scale, our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before.”
  11. Big data, the term for scanning loads of information for possibly profitable patterns, is a growing sector of corporate technology. Mostly people think in terms of online behavior, like mouse clicks, LinkedIn affiliations and Amazon shopping choices. But other big databases in the real world, lying around for years, are there to exploit.

A company called the Climate Corporation was formed in 2006 by two former Google employees who wanted to make use of the vast amount of free data published by the National Weather Service on heat and precipitation patterns around the country. At first they called the company WeatherBill, and used the data to sell insurance to businesses that depended heavily on the weather, from ski resorts and miniature golf courses to house painters and farmers. It did pretty well, raising more than $50 million from the likes of Google Ventures, Khosla Ventures, and Allen & Company. The problem was, it was hard to sell insurance policies to so many little businesses, even using an online shopping model. People like having their insurance explained. The answer was to get even more data, and focus on the agriculture market through the same sales force that sells federal crop insurance.

“We took 60 years of crop yield data, and 14 terabytes of information on soil types, every two square miles for the United States, from the Department of Agriculture,” says David Friedberg, chief executive of the Climate Corporation, a name WeatherBill started using Tuesday. “We match that with the weather information for one million points the government scans with Doppler radar, this huge national infrastructure for storm warnings, and make predictions for the effect on corn, soybeans and winter wheat.”

The product, insurance against things like drought, too much rain at the planting or the harvest, or an early freeze, is sold through 10,000 agents nationwide. The Climate Corporation, which also added Byron Dorgan, the former senator from North Dakota, to its board on Tuesday, will very likely get into insurance for specialty crops like tomatoes and grapes, which do not have federal insurance. Like the weather information, the data on soils was free for the taking. The hard and expensive part is turning the data into a product.

Mr. Friedberg was an early member of the corporate development team at Google. The co-founder, Siraj Khaliq, worked in distributed computing, which involves apportioning big data computing problems across multiple machines. He works as the Climate Corporation’s chief technical officer. Out of the staff of 60 in the company’s San Francisco office (another 30 work in the field), about 12 have doctorates, in areas like environmental science and applied mathematics. “They like that this is a real-world problem, not just clicks on a Web site,” Mr. Friedberg says. He figures that the Climate Corporation is one of the world’s largest users of MapReduce, an increasingly popular software technique for making sense of very large data systems. The number crunching is performed on Amazon Web Services computers.

The Climate Corporation is working with data intended to judge how different crops will react to certain soils, water and heat. It might be valuable to commodities traders as well, but Mr. Friedberg figures the better business is to expand in farming. Besides the other crops, he is looking at offering the service in Canada and Brazil, or anywhere else that he can get decent long-term data.
It’s unlikely he’ll get the quality he got from the federal government, for a price anywhere near “free.”

The Climate Corporation: key takeaways. Cascading provides data scientists at The Climate Corporation a solid foundation to develop advanced machine learning applications in Cascalog that get deployed directly onto Amazon EMR clusters consisting of 2000+ cores. This results in significantly improved productivity with lower operating costs.

Solution: Data scientists at The Climate Corporation chose to create their algorithms in Cascalog, which is a high-level Clojure-based machine learning language built on Cascading. Cascading is an advanced Java application framework that abstracts the MapReduce APIs in Apache Hadoop and provides developers with a simplified way to create powerful data processing workflows. Programming in Cascalog, data scientists create compact expressions that represent complex batch-oriented AI and machine learning workflows. This results in improved productivity for the data scientists, many of whom are mathematicians rather than computer scientists. It also gives them the ability to quickly analyze complex data sets without having to create large complicated programs in MapReduce. Furthermore, programmers at The Climate Corporation also use Cascading directly for creating jobs inside Hadoop streaming to process additional batch-oriented data workflows. All these workflows and data processing jobs are deployed directly onto Amazon Elastic MapReduce into their own dedicated clusters. Depending on the size of data sets and the complexity of the algorithms, clusters consisting of up to 200 processor cores are utilized for data normalization workflows, and clusters consisting of over 2000 processor cores are utilized for risk analysis and climate modeling workflows.

Benefits: By utilizing Amazon Elastic MapReduce and Cascalog, data scientists at The Climate Corporation are able to focus on solving business challenges rather than worrying about setting up a complex infrastructure or trying to figure out how to use it to process the vast amounts of complex data. The Climate Corporation is able to effectively manage its costs by using Amazon Elastic MapReduce and using dedicated cluster resources for each workflow individually. This allows them to utilize the resources only when they are needed, and not have to invest in hardware resources and systems administrators to manage their own private shared cluster where they’d have to optimize their workflows and schedule them to avoid resource contention. Furthermore, Cascading provides data scientists at The Climate Corporation a common foundation for creating both their batch-oriented machine learning workflows in Cascalog, and Hadoop streaming workflows directly in Cascading. These applications are developed locally on the developers’ desktops, and then get instantly deployed onto dedicated Amazon Elastic MapReduce clusters for testing and production use. This minimizes the amount of iterative utilization of the cluster resources, thus allowing The Climate Corporation to manage its costs by utilizing the infrastructure for productive data processing only.
  12. In 2009, the company acquired Adtuitive, a startup Internet advertising company. Adtuitive’s ad server was completely hosted on Amazon Web Services and served targeted retail ads at a rate of over 100 million requests per month. Adtuitive’s configuration included 50 Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Block Store (Amazon EBS) volumes, Amazon CloudFront, Amazon Simple Storage Service (Amazon S3), and a data warehouse pipeline built on Amazon Elastic MapReduce. Amazon Elastic MapReduce runs on a custom domain-specific language that uses the Cascading application programming interface. Today, Etsy uses Amazon Elastic MapReduce for web log analysis and recommendation algorithms. Because AWS easily and economically processes enormous amounts of data, it’s ideal for the type of processing that Etsy performs. Etsy copies its HTTP server logs every hour to Amazon S3, and syncs snapshots of the production database on a nightly basis. The combination of Amazon’s products and Etsy’s syncing/storage operation provides substantial benefits for Etsy. As Dr. Jason Davis, lead scientist at Etsy, explains, “the computing power available with [Amazon Elastic MapReduce] allows us to run these operations over dozens or even hundreds of machines without the need for owning the hardware.”
To learn more, visit http://www.yelp.com/. To learn about the mrjob Python library, visit http://engineeringblog.yelp.com/2010/10/mrjob-distributed-computing-for-everybody.html
  13. “Wakoopa understands what people do in their digital lives. In a privacy-conscious way, our technology tracks what websites they visit, what ads they see, or what apps they use. By using our online research dashboard, you can optimize your digital strategy accordingly. Our clients range from research firms such as TNS and Synovate to companies like Google and Sanoma. Essentially, we’re the Lonely Planet of the digital world.”
  14. Kamek is a server created by Wakoopa that makes metrics (such as bounce-rate or pageviews) out of millions of visits and visitors, all in a couple of seconds, all in real-time.
  15. Netflix has more than 25 million streaming members and is growing rapidly. Their end users stream movies and TV shows from smart TVs, laptops, phones, and tablets, resulting in over 50 billion events per day.
  16. Netflix stores all of this data in Amazon S3, approximately 1 Petabyte.
  17. AWS Case Study: Ticketmaster and MarketShare. The business challenge: The Pricemaster application is a web-based tool designed to optimize live event ticket pricing, improve yield management and generate incremental revenue. The tool takes a holistic approach to maximizing ticket revenue: it optimizes pre-sale and initial pricing all the way through dynamic pricing post on-sale. However, before development could begin, MarketShare had to find an infrastructure that could support the application’s dual challenges: limited upfront capital and managing the fluctuating nature of analytic workloads.

Amazon Web Services: After examining their options, MarketShare decided to power Pricemaster using Amazon Web Services (AWS). The AWS feature stack provides the scalability, usability, and on-demand pricing required to support the application’s intricate cluster architecture and complex MATLAB simulations. Pricemaster’s AWS environment includes four large and extra large Amazon EC2 instances supporting a variety of nodes. The pricing application’s Amazon EC2 instances are connected to a central database within Amazon RDS. In addition, Pricemaster’s AWS infrastructure includes Amazon ELB for traffic distribution, Amazon SimpleDB for non-relational data storage, Amazon Elastic MapReduce for large-scale data processing, as well as Amazon SES. The Pricemaster team monitors all of these resources with Amazon CloudWatch.

The business benefits: The Pricemaster team credits AWS’s ease of use, specifically that of Amazon Elastic MapReduce and Amazon RDS, with reducing its developers’ infrastructure management time by three hours per day, valuable hours the developers can now spend expanding the capabilities of the Pricemaster solution. With AWS’s on-demand pricing, MarketShare also estimates that it reduces costs by over 80% annually, compared to fixed service costs. As the Pricemaster tool continues to grow, the company anticipates even further savings with Amazon Web Services. MarketShare continues to expand its use of AWS for partners such as Ticketmaster, saving time and money and providing a superior solution that is flexible, secure and scalable.
  18. For example, one of our customers, Foursquare, has built this visualization of customer sign-ups from November 2008 to June 2011. This visualization helps understand global service adoption over time. You can create similar visualizations with packages such as ggplot or the base R graphics package.