AWS Summit 2011: Big Data Analytics in the AWS cloud
Analytics in the Cloud
       Peter Sirota, GM Elastic MapReduce
Data-Driven Decision Making

Data is the new raw material for any business, on par with capital, people, and labor.
What is Big Data?
  Terabytes of semi-structured log data in which businesses want to:
   find correlations/perform pattern matching
   generate recommendations
   calculate advanced statistics (e.g., TP99, the 99th percentile of a metric such as latency)

  Twitter “Firehose”
   50 million tweets per day
   1,400% growth per year
   How can advertisers drink from it?

  Social graphs
   Value increases with exponential growth in data connections

Big Data is full of valuable, unanswered questions!
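TP99, one of the "advanced statistics" listed above, is the value below which 99% of the samples fall. The sketch below computes it with the nearest-rank method, which is one common percentile definition among several; the latency values are made up for illustration. At log-data scale the sort would be replaced by a distributed or approximate algorithm, but the definition stays the same.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Nearest rank: the smallest sample such that at least p% of samples are <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, as if parsed from log lines.
latencies_ms = list(range(1, 101))  # 1, 2, ..., 100
tp99 = percentile_nearest_rank(latencies_ms, 99)
print(f"TP99 = {tp99} ms")
```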
Why is Big Data Hard (and Getting Harder)?
  Today’s Data Warehouses
     Need to consolidate data from multiple sources in multiple formats across multiple businesses
     Unconstrained growth of this business-critical information

  Today’s Users
     Expect faster response times on fresher data
     Sampling is not good enough and history is important
     Demand inexpensive experimentation with new data
     Are becoming increasingly sophisticated data scientists

  Current systems don’t scale (and weren’t meant to)
     Long time to provision more infrastructure
     Specialized DB expertise required
     Expensive and inelastic solutions


We need tools built specifically for Big Data!
What is this thing called Hadoop?
Dealing with Big Data requires two things:
  Distributed, scalable storage
  Inexpensive, flexible analytics

Apache Hadoop is an open source software platform that addresses both of these needs
  Includes a fault-tolerant, distributed storage system (HDFS) developed for commodity servers
  Uses a technique called MapReduce to carry out exhaustive analysis over huge distributed data sets
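To make the MapReduce model concrete, here is a single-process sketch of its three phases applied to a word count over log lines. It only illustrates the programming model: in Hadoop the map and reduce functions run as tasks spread across the cluster (ideally near the HDFS blocks they read), and the framework performs the shuffle over the network. The function names and sample log lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine every value seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


logs = ["GET /index.html", "GET /about.html", "POST /index.html"]
print(word_count(logs))
```

Because each map call sees one record and each reduce call sees one key, both phases can be parallelized independently, which is what lets the same job scale out by adding nodes.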

Key benefits
 Affordable – Cost / TB is a fraction of traditional options
 Proven at scale – Numerous petabyte implementations in production; linear scalability
 Flexible – Data can be stored with or without schema
RDBMS vs. MapReduce/Hadoop

RDBMS
  Predefined schema
  Strategic data placement for query tuning
  Exploits indexes for fast retrieval
  SQL only
  Doesn’t scale linearly

MapReduce/Hadoop
  No schema required
  Random data placement
  Fast scan of the entire dataset
  Uniform query performance
  Scales linearly for reads and writes
  Supports many languages, including SQL

            Complementary technologies
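The contrast between "exploits indexes" and "fast scan of the entire dataset" can be sketched in a few lines. Here an in-memory hash index stands in for an RDBMS index (real databases typically use B-trees and other structures), and the click records are hypothetical. Both access paths return the same answer; the difference is cost. The scan touches every row for every query, which is why its performance is uniform regardless of the query, while the index pays an up-front build cost to answer selective queries by touching only matching entries.

```python
from collections import defaultdict

# Hypothetical click records: (user_id, page)
records = [(1, "home"), (2, "cart"), (1, "checkout"), (3, "home")]


def full_scan(rows, user_id):
    """MapReduce-style access: read every row; cost grows with the dataset size."""
    return [page for uid, page in rows if uid == user_id]


def build_index(rows):
    """RDBMS-style access: pay once to build an index keyed on user_id."""
    index = defaultdict(list)
    for uid, page in rows:
        index[uid].append(page)
    return index


def indexed_lookup(idx, user_id):
    """Answer the same query by touching only the matching entries."""
    # .get() avoids inserting an empty entry into the defaultdict on a miss.
    return list(idx.get(user_id, []))


index = build_index(records)
print(full_scan(records, 1), indexed_lookup(index, 1))
```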
Why Amazon Elastic MapReduce?
Managed Apache Hadoop Web Service
  Monitor thousands of clusters per day
  Use cases range from university students to Fortune 50 companies

Reduces complexity of Hadoop management
  Handles node provisioning, customization, and shutdown
  Tunes Hadoop to your hardware and network
  Provides tools to debug and monitor your Hadoop clusters

Provides tight integration with AWS services
    Improved performance working with S3
    Automatic re-provisioning on node failure
    Dynamic expanding/shrinking of cluster size
    Spot integration
Elastic MapReduce Key Features
Simplified Cluster Configuration/Management
    Resize running job flows
    Support for EIP/IAM/Tagging
    Workload-specific configurations
    Bootstrap Actions

Enhanced Monitoring/Debugging
  Free CloudWatch Metrics / Alarms
  Hadoop Metrics in Console
  Ganglia Support

Improved Performance
  S3 Multipart Upload
  Cluster Compute Instances
Analytics Use Cases
Targeted advertising / Clickstream analysis
Data warehousing applications
Bio-informatics (Genome analysis)
Financial simulation (Monte Carlo simulation)
File processing (e.g., resizing JPEGs)
Web indexing
Data mining and BI
APACHE HIVE
DATA WAREHOUSE FOR HADOOP
 Open source project started at Facebook
 Turns data on Hadoop into a virtually limitless data warehouse
 Provides data summarization, ad hoc querying, and analysis
 Enables SQL-like queries on structured and unstructured data
   e.g., arbitrary field separators are possible, such as “,” in CSV files
 Inherits linear scalability of Hadoop
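In Hive, the field separator is declared once in the table definition (for example with `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','`), after which queries are written in SQL-like syntax. The Python sketch below is only an analogue of what such a query does, not Hive itself: it parses delimited text with a configurable separator and computes the equivalent of `SELECT col, COUNT(*) ... GROUP BY col`. The log format and column positions are hypothetical.

```python
import csv
import io
from collections import Counter


def group_count(text, delimiter, column):
    """Roughly: SELECT <column>, COUNT(*) FROM t GROUP BY <column>, over delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return Counter(row[column] for row in reader if row)


# Hypothetical CSV log with fields: date, country, url
csv_log = "2011-04-19,US,/a\n2011-04-19,DE,/b\n2011-04-20,US,/c\n"
# The same rows with a tab separator; only the delimiter argument changes.
tsv_log = csv_log.replace(",", "\t")

print(group_count(csv_log, ",", 1))
print(group_count(tsv_log, "\t", 1))
```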
AWS Data Warehousing Architecture
Elastic Data Warehouse
 Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)
 Reduce costs by increasing server utilization
 Improve performance during high usage periods

[Figure: Data Warehouse (Steady State) → expand to 25 instances → Data Warehouse (Batch Processing) → shrink to 9 instances → Data Warehouse (Steady State)]
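The expand/shrink cycle in the diagram corresponds to resizing a running cluster. The sketch below builds the keyword arguments for the EMR `modify_instance_groups` call in boto3, the current AWS SDK for Python, which postdates this talk. It assumes the elastic capacity lives in a single task instance group; the group ID is a hypothetical placeholder, and only the 25- and 9-instance sizes come from the slide. Building the request as plain data keeps the sizing logic testable without AWS credentials.

```python
def resize_request(instance_group_id, target_count):
    """Build keyword arguments for boto3's EMR modify_instance_groups call."""
    if target_count < 0:
        raise ValueError("target_count must be non-negative")
    return {
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": target_count}
        ]
    }


# Hypothetical task instance group ID; sizes follow the slide's diagram.
TASK_GROUP = "ig-EXAMPLE"
expand_for_batch = resize_request(TASK_GROUP, 25)  # before the overnight batch window
shrink_for_day = resize_request(TASK_GROUP, 9)     # back to steady-state querying

# With credentials configured, each request could be sent as:
#   import boto3
#   boto3.client("emr").modify_instance_groups(**expand_for_batch)
```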
Reducing Costs with Spot Instances
Mix Spot and On-Demand instances to reduce cost and accelerate computation while protecting against interruption

   Scenario #1: Job Flow without Spot (duration: 14 hours)
     Cost: 4 instances * 14 hrs * $0.50 = $28

   Scenario #2: Job Flow with Spot (duration: 7 hours)
     Cost: 4 instances * 7 hrs * $0.50 = $14
         + 5 instances * 7 hrs * $0.25 = $8.75
     Total = $22.75

   Time Savings: 50%
   Cost Savings: ~19%
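The scenario arithmetic is easy to get wrong by hand, so here is a small sketch that recomputes both scenarios from the instance counts, hours, and hourly prices on the slide. Those prices are illustrative 2011 figures, not current rates, and real Spot prices fluctuate over the life of a job.

```python
def job_cost(groups):
    """Sum instances * hours * hourly price over (instances, hours, price_per_hour) tuples."""
    return sum(n * hours * price for n, hours, price in groups)


# Scenario #1: 4 On-Demand instances for 14 hours.
without_spot = job_cost([(4, 14, 0.50)])
# Scenario #2: the same 4 On-Demand instances plus 5 Spot instances, finishing in 7 hours.
with_spot = job_cost([(4, 7, 0.50), (5, 7, 0.25)])

time_savings = 1 - 7 / 14
cost_savings = 1 - with_spot / without_spot
print(f"${without_spot:.2f} vs ${with_spot:.2f}: "
      f"{time_savings:.0%} faster, {cost_savings:.1%} cheaper")
```

Note that the speedup is an input taken from the slide, not something the model derives: adding Spot capacity only pays off if the job actually finishes sooner in proportion to the extra nodes.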


Other EMR + Spot Use Cases
Run entire cluster on Spot for biggest cost savings
Reduce the cost of application testing
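One way to express both options (Spot for extra capacity only, or the entire cluster on Spot) is through the `Market` and `BidPrice` fields of the instance groups passed to boto3's EMR `run_job_flow` call. The sketch below assumes that API; the instance type, price, and node counts are hypothetical, with the mixed layout chosen so that 4 On-Demand and 5 Spot nodes mirror scenario #2. Keeping master and core nodes On-Demand is the "protect against interruption" choice, because core nodes hold HDFS data, while task nodes can disappear without losing stored data.

```python
def _group(name, role, instance_type, count, spot_price):
    """One entry of run_job_flow's Instances["InstanceGroups"] list."""
    spec = {
        "Name": name,
        "InstanceRole": role,  # MASTER, CORE, or TASK
        "InstanceType": instance_type,
        "InstanceCount": count,
        "Market": "ON_DEMAND" if spot_price is None else "SPOT",
    }
    if spot_price is not None:
        spec["BidPrice"] = spot_price  # maximum hourly price in USD, passed as a string
    return spec


def instance_groups(instance_type, core_count, task_count, spot_price, all_spot=False):
    """Task nodes always use Spot; master and core nodes use Spot only when all_spot=True."""
    base_price = spot_price if all_spot else None
    groups = [
        _group("Master", "MASTER", instance_type, 1, base_price),
        _group("Core", "CORE", instance_type, core_count, base_price),
    ]
    if task_count > 0:
        groups.append(_group("Task", "TASK", instance_type, task_count, spot_price))
    return groups


# Hypothetical sizing: 1 master + 3 core On-Demand nodes, plus 5 Spot task nodes.
mixed = instance_groups("m5.xlarge", core_count=3, task_count=5, spot_price="0.25")
# Cheapest, most interruptible option: every node on Spot (e.g., for test runs).
spot_only = instance_groups("m5.xlarge", core_count=3, task_count=0,
                            spot_price="0.25", all_spot=True)

# With credentials configured, one layout could be launched roughly as:
#   boto3.client("emr").run_job_flow(
#       Name="<cluster-name>", ReleaseLabel="<release-label>",
#       JobFlowRole="<instance-profile>", ServiceRole="<service-role>",
#       Instances={"InstanceGroups": mixed, "KeepJobFlowAliveWhenNoSteps": False})
```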
Monitoring Clusters with CloudWatch
 Free CloudWatch Metrics and Alarms
   Track Hadoop job progress
   Alarm on degradations in cluster health
   Monitor aggregate Elastic MapReduce usage
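As an illustration of alarming on cluster health, the sketch below builds keyword arguments for boto3's CloudWatch `put_metric_alarm` against the `AWS/ElasticMapReduce` namespace, dimensioned by `JobFlowId`. The metric names used (`MRUnhealthyNodes`, `IsIdle`) are assumptions to check against the current EMR metrics documentation, since the available metrics depend on the cluster's Hadoop version; the cluster ID and SNS topic ARN are hypothetical placeholders.

```python
def emr_alarm_request(cluster_id, metric_name, threshold, sns_topic_arn,
                      statistic="Maximum", evaluation_periods=1):
    """Keyword arguments for boto3's CloudWatch put_metric_alarm on one EMR cluster metric."""
    return {
        "AlarmName": f"{cluster_id}-{metric_name}",
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": metric_name,
        # EMR cluster metrics are dimensioned by the cluster (job flow) ID.
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "Statistic": statistic,
        "Period": 300,  # seconds; EMR reports its metrics in 5-minute intervals
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical cluster ID and SNS topic ARN.
TOPIC = "arn:aws:sns:us-east-1:123456789012:emr-alerts"
# Degraded health: any unhealthy node within a 5-minute period.
unhealthy_nodes = emr_alarm_request("j-EXAMPLE", "MRUnhealthyNodes", 0, TOPIC)
# Wasted spend: Minimum > 0 means the cluster reported idle for a whole period, 3 periods running.
idle_cluster = emr_alarm_request("j-EXAMPLE", "IsIdle", 0, TOPIC,
                                 statistic="Minimum", evaluation_periods=3)

# With credentials configured:
#   boto3.client("cloudwatch").put_metric_alarm(**unhealthy_nodes)
```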
Big Data Ecosystem And Tools
We have a rapidly growing ecosystem and will continue to integrate with a wide range of partners. Some examples:

  Business Intelligence
    MicroStrategy, Pentaho
  Analytics
    Datameer, Karmasphere, Quest
  Open source
    Ganglia, SQuirreL SQL
Resources
Amazon Elastic MapReduce
  aws.amazon.com/elasticmapreduce
  aws.amazon.com/articles/Elastic-MapReduce
  forums.aws.amazon.com/forum.jspa?forumID=52

Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

AWS Summit 2011: Big Data Analytics in the AWS cloud

  • 6. RDBMS vs. MapReduce/Hadoop
      RDBMS
        Predefined schema
        Strategic data placement for query tuning
        Exploits indexes for fast retrieval
        SQL only
        Doesn't scale linearly
      MapReduce/Hadoop
        No schema is required
        Random data placement
        Fast scan of the entire dataset
        Uniform query performance
        Scales linearly for reads and writes
        Supports many languages, including SQL
      Complementary technologies
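As a rough illustration of the MapReduce side of this comparison, the sketch below simulates the map, shuffle, and reduce phases in plain Python, counting page views per URL in clickstream-style log lines. It is not Hadoop's API: the function names are hypothetical, and the log format (URL as the second whitespace-separated field) is an assumption. On a real cluster, Hadoop runs many mapper and reducer tasks in parallel and performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (url, 1) per log line.
    Assumes a hypothetical space-separated format with the URL in field 2."""
    fields = line.split()
    if len(fields) >= 2:  # skip malformed lines instead of failing the job
        yield fields[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (Hadoop does this between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts emitted for one URL."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce over every input line."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    log = [
        "2011-06-10T12:00:01 /home user1",
        "2011-06-10T12:00:02 /cart user2",
        "2011-06-10T12:00:03 /home user3",
        "malformed",
    ]
    print(run_job(log))  # {'/home': 2, '/cart': 1}
```

Every line is scanned once with no index, which is the "fast scan of the entire dataset" property above; the same mapper and reducer logic would work unchanged on ten lines or ten billion.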
  • 8. Why Amazon Elastic MapReduce?
      Managed Apache Hadoop web service
        Monitors thousands of clusters per day
        Use cases range from university students to Fortune 50 companies
      Reduces the complexity of Hadoop management
        Handles node provisioning, customization, and shutdown
        Tunes Hadoop to your hardware and network
        Provides tools to debug and monitor your Hadoop clusters
      Provides tight integration with AWS services
        Improved performance when working with S3
        Automatic re-provisioning on node failure
        Dynamic expansion and shrinking of cluster size
        Spot Instance integration
  • 9. Elastic MapReduce Key Features
      Simplified cluster configuration and management
        Resize running job flows
        Support for EIP/IAM/tagging
        Workload-specific configurations
        Bootstrap actions
      Enhanced monitoring and debugging
        Free CloudWatch metrics and alarms
        Hadoop metrics in the console
        Ganglia support
      Improved performance
        S3 multipart upload
        Cluster Compute Instances
  • 10. Analytics Use Cases
      Targeted advertising / clickstream analysis
      Data warehousing applications
      Bioinformatics (genome analysis)
      Financial simulation (Monte Carlo simulation)
      File processing (resizing JPEGs)
      Web indexing
      Data mining and BI
  • 11. Apache Hive: Data Warehouse for Hadoop
      Open source project started at Facebook
      Turns data on Hadoop into a virtually limitless data warehouse
      Provides data summarization, ad hoc querying, and analysis
      Enables SQL-like queries on structured and unstructured data
        e.g., arbitrary field separators are possible, such as "," in CSV files
      Inherits the linear scalability of Hadoop
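Hive's approach of declaring a schema and a field separator separately from the data, then applying them when the data is read, can be sketched in a few lines of plain Python. This is not Hive: the column names are hypothetical, and count_by stands in for a HiveQL query of the form SELECT page, COUNT(*) ... GROUP BY page. In HiveQL itself the separator is declared on the table, for example with ROW FORMAT DELIMITED FIELDS TERMINATED BY ','.

```python
import csv
import io
from collections import Counter
from typing import Dict, List

# Hypothetical table definition: column names are applied at read time,
# the way a Hive table declares a schema over files it does not rewrite.
SCHEMA = ["ts", "page", "user_id"]


def read_table(raw: str, schema: List[str], sep: str = ",") -> List[Dict[str, str]]:
    """Parse delimited text into rows, applying the schema on read."""
    reader = csv.reader(io.StringIO(raw), delimiter=sep)
    return [dict(zip(schema, fields)) for fields in reader if fields]


def count_by(rows: List[Dict[str, str]], column: str) -> Dict[str, int]:
    """Rough equivalent of: SELECT column, COUNT(*) ... GROUP BY column."""
    return dict(Counter(row[column] for row in rows))


if __name__ == "__main__":
    csv_data = "t1,/home,u1\nt2,/cart,u2\nt3,/home,u3\n"
    pipe_data = "t1|/home|u1\nt2|/cart|u2\n"
    print(count_by(read_table(csv_data, SCHEMA), "page"))            # {'/home': 2, '/cart': 1}
    print(count_by(read_table(pipe_data, SCHEMA, sep="|"), "page"))  # {'/home': 1, '/cart': 1}
```

The same files can be read with a different schema or separator without being rewritten, which is what lets Hive query data that already sits in HDFS or S3.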
  • 12. AWS Data Warehousing Architecture
  • 13. Elastic Data Warehouse
      Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)
      Reduce costs by increasing server utilization
      Improve performance during high-usage periods
      [Figure: Data Warehouse (Steady State) and Data Warehouse (Batch Processing); expand to 25 instances, shrink to 9 instances]
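As a rough sketch of this sizing pattern, the Python below picks a target cluster size for each hour of the day and compares a day's instance-hours against a cluster sized for peak load around the clock. The 25- and 9-instance sizes come from the slide's figure. Two details are assumptions: that 25 instances cover the overnight batch window and 9 cover daytime queries, and that the batch window runs from 00:00 to 05:59. The sketch only computes a schedule. Actually resizing a running cluster would go through the Elastic MapReduce API, for example its ModifyInstanceGroups action.

```python
from typing import Callable

# Sizes from the slide's figure; which size maps to which phase is assumed.
BATCH_INSTANCES = 25
STEADY_INSTANCES = 9
# Hypothetical overnight batch window: hours 0 through 5.
BATCH_HOURS = range(0, 6)


def target_size(hour: int) -> int:
    """Desired cluster size for a given hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    return BATCH_INSTANCES if hour in BATCH_HOURS else STEADY_INSTANCES


def daily_instance_hours(size_for_hour: Callable[[int], int]) -> int:
    """Instance-hours consumed over one day under a sizing policy."""
    return sum(size_for_hour(h) for h in range(24))


if __name__ == "__main__":
    elastic = daily_instance_hours(target_size)
    fixed = daily_instance_hours(lambda h: BATCH_INSTANCES)  # peak size all day
    print(elastic, fixed)  # 312 600
```

Under these assumptions the elastic schedule uses 312 instance-hours per day (6 × 25 + 18 × 9), compared with 600 for a fixed 25-instance cluster. That gap is the utilization gain the slide describes.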
  • 14. Reducing Costs with Spot Instances
      Mix Spot and On-Demand instances to reduce cost and accelerate computation while protecting against interruption
      Scenario #1: Cost without Spot
        Job flow: 4 instances * 14 hrs * $0.50 = $28
        Duration: 14 hours
      Scenario #2: Cost with Spot
        Job flow: 4 instances * 7 hrs * $0.50 = $14
                + 5 instances * 7 hrs * $0.25 = $8.75
        Total = $22.75
        Duration: 7 hours
      Time savings: 50%
      Cost savings: ~19%
      Other EMR + Spot use cases
        Run the entire cluster on Spot for the biggest cost savings
        Reduce the cost of application testing
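The scenario arithmetic can be checked with a few lines of Python. This sketch reuses the slide's illustrative rates ($0.50 per On-Demand instance-hour and $0.25 per Spot instance-hour), which are example figures rather than current AWS pricing. It also assumes the Spot instances are not interrupted during the 7-hour run.

```python
from typing import Iterable, Tuple

# Each group is (instance_count, hours, hourly_rate_usd).
Group = Tuple[int, float, float]


def job_flow_cost(groups: Iterable[Group]) -> float:
    """Total cost of a job flow: sum of instances * hours * rate per group."""
    return sum(n * hours * rate for n, hours, rate in groups)


# Scenario #1: 4 On-Demand instances run for 14 hours.
without_spot = job_flow_cost([(4, 14, 0.50)])
# Scenario #2: the same 4 On-Demand instances plus 5 Spot instances
# finish the job in 7 hours.
with_spot = job_flow_cost([(4, 7, 0.50), (5, 7, 0.25)])

savings = (without_spot - with_spot) / without_spot
print(f"without Spot: ${without_spot:.2f}")  # without Spot: $28.00
print(f"with Spot:    ${with_spot:.2f}")     # with Spot:    $22.75
print(f"cost savings: {savings:.2%}")        # cost savings: 18.75%
```

Adding Spot capacity halves the duration. The cost saving is smaller than the time saving because the On-Demand instances still run for the full 7 hours.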
  • 15. Monitoring Clusters with CloudWatch
      Free CloudWatch metrics and alarms
        Track Hadoop job progress
        Alarm on degradations in cluster health
        Monitor aggregate Elastic MapReduce usage
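CloudWatch alarms compare a metric against a threshold over a number of evaluation periods and report OK, ALARM, or INSUFFICIENT_DATA. The Python below is a simplified, hypothetical model of that rule, in which every one of the last N datapoints must exceed the threshold. Real CloudWatch alarms also support other comparison operators and "M out of N" datapoint rules. The utilization numbers in the example are illustrative values, not real measurements.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int) -> str:
    """Simplified alarm rule: "ALARM" if the last `evaluation_periods`
    datapoints all exceed `threshold`, "INSUFFICIENT_DATA" if there are
    too few datapoints, otherwise "OK"."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be >= 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(v > threshold for v in recent) else "OK"


if __name__ == "__main__":
    # Illustrative HDFS utilization percentages, one per period.
    hdfs_utilization = [62.0, 81.5, 86.0, 90.2]
    print(alarm_state(hdfs_utilization, threshold=80.0, evaluation_periods=3))  # ALARM
    print(alarm_state(hdfs_utilization, threshold=80.0, evaluation_periods=4))  # OK
    print(alarm_state([85.0], threshold=80.0, evaluation_periods=3))            # INSUFFICIENT_DATA
```

Requiring several consecutive breaches keeps a single noisy datapoint from paging anyone, at the cost of a slower reaction to genuine degradation.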
  • 16. Big Data Ecosystem and Tools
      We have a rapidly growing ecosystem and will continue to integrate with a wide range of partners. Some examples:
        Business intelligence: MicroStrategy, Pentaho
        Analytics: Datameer, Karmasphere, Quest
        Open source: Ganglia, SQuirreL SQL
  • 17. Resources
      Amazon Elastic MapReduce
        aws.amazon.com/elasticmapreduce
        aws.amazon.com/articles/Elastic-MapReduce
        forums.aws.amazon.com/forum.jspa?forumID=52