Hadoop in the Cloud: Unlocking the Potential of
Big Data on AWS
Introducing
Maya Cabassi
Partner Marketing Manager
Amazon Web Services
Webinar Overview
• Submit your questions using the Q&A tool.
• A copy of today’s presentation will be made available on:
– AWS SlideShare Channel: http://www.slideshare.net/AmazonWebServices/
– AWS Webinar Channel on YouTube: http://www.youtube.com/channel/UCT-nPlVzJI-ccQXlxjSvJmw
Introducing
Jonathan Fritz
Sr. Product Manager
Amazon Web Services
Steve Wooledge
VP, Product Marketing
MapR Technologies
Bruce Penn
Principal Sales Engineer
MapR Technologies
What We’ll Cover
• Elastic MapReduce (EMR): Hadoop in the cloud
• Elastic clusters tailored for your workflows
• Best container to run Hadoop in the AWS Ecosystem
• Introduction to MapR’s Hadoop Platform
• Defining features
• Increased performance
• Case Studies: MapR + Elastic MapReduce
• Q&A
Hadoop in the Cloud
Using MapR and Amazon Elastic MapReduce to unlock Big Data
Jonathan Fritz, Sr. Product Manager, Amazon Web Services
Steve Wooledge, VP, Product Marketing, MapR Technologies
Agenda
• Elastic MapReduce (EMR): Hadoop in the cloud
– Elastic clusters tailored for your workflows
– Best container to run Hadoop in the AWS Ecosystem
• Introduction to MapR’s Hadoop Platform
– Defining features
– Increased performance
• Case Studies: MapR + Elastic MapReduce
• Q&A
The Three V’s: the drivers behind Big Data
Variety, Velocity, Volume
• YouTube users upload 48 hours of new video every minute of the day
• Twitter sees roughly 175 million tweets every day
• Facebook analyzes 30+ petabytes of user-generated data
• More than 5 billion people are calling, texting, tweeting, and browsing on mobile phones worldwide
• 2.7 zettabytes of data exist in the digital universe today
• Data production will be 44 times greater in 2020 than in 2009
Hadoop is the right system for Big Data
• Scalable and fault tolerant
• Flexibility for multiple languages and data formats
• Open source
• Ecosystem of tools
• Batch and real-time analytics
Challenges with managing Hadoop
On-Premises
• Manage HDFS, upgrades, and system administration
• Pay for expensive support contracts
• Select hardware in advance and stick with predictions
Cloud
• Hard to tightly integrate with AWS storage services
• Independently manage and monitor clusters
Amazon Elastic MapReduce (EMR) is the easiest way to run Hadoop in the cloud.
• Managed services
• Easy to tune clusters and trim costs
• Support for multiple AWS datastores
• Unique features and ecosystem support
Why Amazon Elastic MapReduce?
[Diagram, built up over several slides. Labels: Input data (S3, DynamoDB, Redshift); Code; Elastic MapReduce; Name node; Elastic cluster; S3/HDFS; Queries + BI via JDBC, Pig, Hive; Output (S3, DynamoDB, Redshift).]
Elastic clusters.
Customize size and type to reduce costs.
Choose your instance types
Try out different configurations to find your optimal architecture.
• CPU: c1.xlarge, cc1.4xlarge, cc2.8xlarge
• Memory: m1.large, m2.2xlarge, m2.4xlarge
• Disk: hs1.8xlarge
Long running or transient clusters
Easy to run Hadoop clusters short-term or 24/7, and only pay for what you need.
Resizable clusters
Easy to add and remove compute capacity on your cluster.
[Diagram labels across the build slides: 10 hours; 6 hours; Peak capacity; Matched compute demands with cluster sizing.]
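The trade-off behind resizing can be sketched with simple arithmetic. This is a minimal illustration, not an EMR API: it assumes a job that parallelizes perfectly across nodes and billing per node-hour, and the 60 node-hour workload, the node counts, and the price are hypothetical values chosen to reproduce the 10-hour and 6-hour figures on the slides.

```python
def hours_to_finish(node_hours: float, nodes: int) -> float:
    """Wall-clock hours for a job needing `node_hours` of work on `nodes` nodes.

    Assumes the job parallelizes perfectly across nodes (an idealization).
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return node_hours / nodes


def compute_cost(node_hours: float, price_per_node_hour: float) -> float:
    """Under per-node-hour billing, cost depends only on total node-hours."""
    return node_hours * price_per_node_hour


# Hypothetical workload: 60 node-hours of work.
WORK = 60.0
small_cluster_hours = hours_to_finish(WORK, nodes=6)    # 10.0 hours
large_cluster_hours = hours_to_finish(WORK, nodes=10)   # 6.0 hours

# Hypothetical price of 0.50 per node-hour; the same for both cluster sizes.
job_cost = compute_cost(WORK, price_per_node_hour=0.50)
```

Under these assumptions the larger cluster finishes sooner for the same compute cost, which is the case for adding capacity at peak and removing it afterwards. Real jobs rarely scale perfectly, so measured times will differ.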
Use Spot and Reserved Instances.
Minimize costs by supplementing on-demand pricing.
Easy to use Spot Instances
Name-your-price supercomputing to minimize costs.
• Spot for task nodes: up to 90% off EC2 on-demand pricing
• On-demand for core nodes: standard EC2 pricing for on-demand capacity
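The effect of mixing purchase options can be estimated with a small calculation. The sketch below is illustrative only: the node counts and the on-demand price are hypothetical, and the 90% discount is the best case quoted on the slide, not a guaranteed Spot price.

```python
def cluster_hourly_cost(core_nodes: int, task_nodes: int,
                        on_demand_price: float, spot_discount: float) -> float:
    """Hourly cost with core nodes on-demand and task nodes on Spot.

    `spot_discount` is the fraction taken off the on-demand price (0.0 to 1.0).
    """
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    core_cost = core_nodes * on_demand_price
    task_cost = task_nodes * on_demand_price * (1.0 - spot_discount)
    return core_cost + task_cost


# Hypothetical cluster: 4 core nodes, 16 task nodes, 1.00 per node-hour on-demand.
all_on_demand = cluster_hourly_cost(4, 16, on_demand_price=1.00, spot_discount=0.0)
with_spot_tasks = cluster_hourly_cost(4, 16, on_demand_price=1.00, spot_discount=0.9)
savings_fraction = 1.0 - with_spot_tasks / all_on_demand
```

Keeping core nodes on-demand is a deliberate choice: core nodes hold HDFS data, while task nodes only contribute compute, so losing a Spot task node costs some rework but no stored data.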
24/7 clusters on Reserved Instances
Minimize cost for consistent capacity.
• Reserved Instances for long running clusters: up to 65% off on-demand pricing
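Whether reserved capacity pays off depends on how many hours the cluster actually runs. The following is a rough break-even sketch under a simplifying assumption that is not from the slides: the discounted rate is paid for every hour of the term, used or not. The 65% figure is the slide's best case.

```python
def break_even_utilization(reserved_discount: float) -> float:
    """Fraction of hours a cluster must run before reserved beats on-demand.

    Assumes the discounted rate is paid for every hour of the term,
    whether or not the cluster is running (a simplification).
    """
    if not 0.0 <= reserved_discount <= 1.0:
        raise ValueError("reserved_discount must be between 0 and 1")
    return 1.0 - reserved_discount


def reserved_is_cheaper(utilization: float, reserved_discount: float) -> bool:
    """Compare costs relative to an on-demand hourly price of 1."""
    on_demand_cost = utilization              # pay only for the hours used
    reserved_cost = 1.0 - reserved_discount   # pay the reduced rate for all hours
    return reserved_cost < on_demand_cost


threshold = break_even_utilization(0.65)
always_on = reserved_is_cheaper(utilization=1.0, reserved_discount=0.65)
six_hours_a_day = reserved_is_cheaper(utilization=0.25, reserved_discount=0.65)
```

With these assumptions a 24/7 cluster favors Reserved Instances, while a cluster that runs six hours a day is better left on-demand or run as a transient cluster.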
Your data, your choice.
Easy to integrate Elastic MapReduce with your datastores.
Using Amazon S3 and On-Cluster Storage
[Diagram labels:]
• Data Sources
• Transient EMR cluster for batch map/reduce jobs for daily reports
• Long running EMR cluster holding data on the cluster in a NoSQL database
• Weekly Report
• Ad-hoc Query
• Data aggregated and stored in Amazon S3
Use Amazon EMR with Redshift and S3
[Diagram labels:]
• Data Sources
• Daily data aggregated in Amazon S3
• Amazon EMR cluster used to process data
• Processed data loaded into Amazon Redshift data warehouse
© 2014 MapR Technologies
Introduction to MapR
MAPR: WORLDWIDE HADOOP TECHNOLOGY LEADER
UNIQUELY ADDRESSES BOTH ANALYTIC AND OPERATIONAL USE CASES
500+ PAYING CUSTOMERS
Hadoop Distributions
[Diagram comparing distribution stacks. Labels: Distribution A, Distribution B; layers: Open Source, Management, Architectural Innovations.]
MapR Distribution for Hadoop
Management
MapR Data Platform
APACHE HADOOP & OSS ECOSYSTEM
Impala, Shark, Hive, Pig, Hue, Oozie, ZooKeeper, Mahout, MLLib, Juju, Solr, Cascading, HttpFS, Flume, Storm, Spark Streaming, YARN, MapReduce, HBase, Whirr, Sqoop, Drill, Tez, Knox, Sentry, Spark, Falcon
• Enterprise-grade
– High availability
– Data protection
– Disaster recovery
• Interoperability
– Standard file access
– Standard database access
– Pluggable services
– Broad developer support
• Security
– Enterprise security authorization
– Wire-level authentication
– Data governance
• Operational
– Ability to support predictive analytics, real-time database operations, and support high arrival rate data
• Multi-tenancy
– Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
• Performance
– 2X to 7X higher performance
– Consistent, low latency
A winning combination:
MapR with Amazon Elastic MapReduce
Launching a Cluster
MapR Option Integrated within EMR
MapR: Designed for Both Transient and Long-Term Clusters
• High Availability
• Easy Development
• Multi-Tenancy
• World-Record Performance
• Breadth of Applications
Fastest On-Ramp to Develop Hadoop Applications
Best Platform for Long-Term Hadoop Production Success
High Availability (HA) For Hadoop
• Resource Manager HA, Application Master HA
– YARN jobs are not impacted by failures
– Continue to meet SLAs with MapReduce v2
• JobTracker HA for MRv1
– MapReduce v1 jobs are not impacted by failures
– Meet your data processing SLAs
• NFS HA
– High throughput and resilience for NFS-based data ingestion, import/export and multi-client access
• Instant recovery
– Files and tables are accessible within seconds of a node failure or cluster restart
• No-NameNode architecture
– Distributed metadata can self-heal
– No practical limit on # of files
Direct Integration with Existing Applications
• 100% POSIX compliant
• Industry standard APIs
– NFS, ODBC, LDAP, REST
• More 3rd-party solutions
• No proprietary connectors required
• Language neutral
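In practice, POSIX compliance plus NFS access means an existing program can work on cluster data with ordinary file calls. The sketch below is a minimal illustration in Python, assuming the cluster is already mounted on the client. The mount path and file name are hypothetical, and a temporary directory stands in for the mount so the sketch runs anywhere.

```python
import tempfile
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines using plain file I/O; no Hadoop client library is involved."""
    with path.open("r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


# On a real client this directory would be an NFS mount of the cluster,
# for example a hypothetical /mapr/my.cluster.com/logs.
# A temporary directory stands in for the mount point here.
with tempfile.TemporaryDirectory() as mount_point:
    log_file = Path(mount_point) / "clicks.log"   # hypothetical file name
    log_file.write_text("a\nb\nc\n", encoding="utf-8")
    line_count = count_lines(log_file)
```

The same function would work unchanged against a file on the mounted cluster, which is the point of the "no proprietary connectors required" claim.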
Multi-Tenancy Support for Parallelized App Development
Isolation
• Tasks sandboxed so they don’t impact other tasks or system daemons
• System resources protected from runaway jobs
• Volume-based data segregation based on users and groups
• Volume-based data placement to control
• Label-based job scheduling to control
Quotas
• Storage quotas by volume/user/group
• CPU and memory quotas by queue/user/group
Security and delegation
• Fine-grained administration permissions including volume-level delegation
• Authenticate users to AD, LDAP and Kerberos via Linux PAM
Reporting
• Detailed reporting on resource usage (75+ different metrics)
• All reports are available via UI, CLI and REST API
MapR M7: The Best In-Hadoop NoSQL Database
Benefit: Features
• High Performance: Over 1 million ops/sec with 10 node cluster
• Continuous Low Latency: No I/O storms, no compactions
• 24x7 Applications: Instant recovery, online schema modification, snapshots, mirroring
• Zero Administration: No processes to manage, automated splits, self-tuning
• High Scalability: 1 trillion tables, billions of rows, millions of columns
• Low TCO: Files and tables on one platform, more work with fewer nodes
Benefit groups: Performance, Reliability, Easy Administration
Flux7: Comparative Study of Hadoop Distributions
Web Search and Data Analytics Benchmarks
[Bar charts: Page Rank and Hive JOIN Query; y-axis: time in seconds; lower is better. Bar values as extracted: 425, 925, 333, 563, 367, 532, 163, 331. Legend entries recovered: IDH 2.4.1, CDH 4.3.]
Source: Flux7 Labs Study, October 2013
Hardware Specs: EC2 on AWS
1 Master: m1.xlarge; 64-bit; 4 vCPU, 8 ECU; 15 GiB RAM; 4x420 GB Storage; 4x Intel® Xeon® CPU E5-2650 0 @ 2.00 GHz
4 Slaves: m1.large; 64-bit; 2 vCPU, 4 ECU; 7.5 GiB RAM; 2x420 GB Storage; 2x Intel® Xeon® CPU E5430 @ 2.66 GHz
Comparative Study of Hadoop Distributions
Read and Write Throughput Benchmarks
[Bar charts: DFSIO Read Throughput and DFSIO Write Throughput; y-axis: MB per second; higher is better. Bar values as extracted: 212, 59, 262, 69, 276, 64, 475, 465. Legend: IDH, CDH, HDP, MapR.]
Source: Flux7 Labs Study, October 2013
http://flux7.com/blogs/case-studies/hadoop-distributions-a-detailed-comparative-study-whitepaper/
Hardware specs: same EC2 configuration as the previous benchmark slide.
MapR M7: The Best In-Hadoop Database
• NoSQL Columnar Store
• Apache HBase API
• Integrated with Hadoop
[Diagram: Other Distros stack: HBase, JVM, HDFS, JVM, ext3/ext4, Disks. MapR M7 stack: Tables/Files, Disks.]
The most scalable, enterprise-grade NoSQL database that supports online applications and analytics
Customer Case Studies
MapR with Amazon Elastic MapReduce in Action
Use cases for MapR with Amazon EMR
• Targeted advertising / clickstream analysis
• Security: anti-virus, fraud detection, image recognition
• Pattern matching / recommendations
• Reporting / BI
• Bio-informatics (genome analysis)
• Financial simulation (Monte Carlo simulation)
• File processing (resize jpegs, video encoding)
• Web indexing
Case Study
Outcomes from MapR Deployment w/ EMR
• Increased flexibility to scale at lower costs
• Faster turnaround for customer requests
• Ease of experimentation
Challenges
• RDBMS on AWS too slow
• Solution must be compatible with AWS & Java 7
• High performance
Case Study
Outcomes from MapR Deployment w/ EMR
• Faster machine learning performance enables more/faster simulations
• MapR M7 provides geospatial database backed by Amazon S3
Challenges
• Large volumes of sensor data
• Project weather for 2.5 years at every 20x20 plot across the US
• Climatology simulations need to quickly experiment at small scale and then scale reliably
Demo
MapR/EMR Demonstration
• Create MapR cluster using EMR
• Review MapR Control System (MCS)
• Show S3 and MapR integration
• Demonstrate MapR’s real-time capability
• Connect Mac to MapR via NFS
• Run queries with HiveServer2 and Impala
• Visualize data with Tableau
Questions and Contact
MapR:
http://aws.amazon.com/elasticmapreduce/mapr/
swooledge@mapr.com
AWS Contact:
aws.amazon.com/contact-us
jonfritz@amazon.com
@mapr
@awscloud
Maprtech
Amazon Web Services
We’d like your feedback.
Please complete a short survey
https://aws.asia.qualtrics.com/SE/?SID=SV_brzWlylHrqM29tr

More Related Content

What's hot

The Microsoft BigData Story
The Microsoft BigData StoryThe Microsoft BigData Story
The Microsoft BigData StoryLynn Langit
 
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
A Tour of Azure SQL Databases  (NOVA SQL UG 2020)A Tour of Azure SQL Databases  (NOVA SQL UG 2020)
A Tour of Azure SQL Databases (NOVA SQL UG 2020)Timothy McAliley
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 
HDInsight Informative articles
HDInsight Informative articlesHDInsight Informative articles
HDInsight Informative articlesKaran Gulati
 
AI for Intelligent Cloud and Intelligent Edge: Discover, Deploy, and Manage w...
AI for Intelligent Cloud and Intelligent Edge:Discover, Deploy, and Manage w...AI for Intelligent Cloud and Intelligent Edge:Discover, Deploy, and Manage w...
AI for Intelligent Cloud and Intelligent Edge: Discover, Deploy, and Manage w...John Chang
 
Bursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using AlluxioBursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using AlluxioAlluxio, Inc.
 
Implement SQL Server on an Azure VM
Implement SQL Server on an Azure VMImplement SQL Server on an Azure VM
Implement SQL Server on an Azure VMJames Serra
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...Databricks
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014Amazon Web Services
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
NoSQL Migration Technical Pitch Deck
NoSQL Migration Technical Pitch DeckNoSQL Migration Technical Pitch Deck
NoSQL Migration Technical Pitch DeckNicholas Vossburg
 
Azure Data services
Azure Data servicesAzure Data services
Azure Data servicesRajesh Kolla
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoopRommel Garcia
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftAmazon Web Services
 
HA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridHA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridJames Serra
 
Is there a way that we can build our Azure Data Factory all with parameters b...
Is there a way that we can build our Azure Data Factory all with parameters b...Is there a way that we can build our Azure Data Factory all with parameters b...
Is there a way that we can build our Azure Data Factory all with parameters b...Erwin de Kreuk
 
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSManage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSMesosphere Inc.
 

What's hot (20)

The Microsoft BigData Story
The Microsoft BigData StoryThe Microsoft BigData Story
The Microsoft BigData Story
 
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
A Tour of Azure SQL Databases  (NOVA SQL UG 2020)A Tour of Azure SQL Databases  (NOVA SQL UG 2020)
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 
HDInsight Informative articles
HDInsight Informative articlesHDInsight Informative articles
HDInsight Informative articles
 
AI for Intelligent Cloud and Intelligent Edge: Discover, Deploy, and Manage w...
AI for Intelligent Cloud and Intelligent Edge:Discover, Deploy, and Manage w...AI for Intelligent Cloud and Intelligent Edge:Discover, Deploy, and Manage w...
AI for Intelligent Cloud and Intelligent Edge: Discover, Deploy, and Manage w...
 
Bursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using AlluxioBursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using Alluxio
 
Implement SQL Server on an Azure VM
Implement SQL Server on an Azure VMImplement SQL Server on an Azure VM
Implement SQL Server on an Azure VM
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Kudu Deep-Dive
Kudu Deep-DiveKudu Deep-Dive
Kudu Deep-Dive
 
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
NoSQL Migration Technical Pitch Deck
NoSQL Migration Technical Pitch DeckNoSQL Migration Technical Pitch Deck
NoSQL Migration Technical Pitch Deck
 
Azure Data services
Azure Data servicesAzure Data services
Azure Data services
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoop
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
 
HA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridHA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybrid
 
Is there a way that we can build our Azure Data Factory all with parameters b...
Is there a way that we can build our Azure Data Factory all with parameters b...Is there a way that we can build our Azure Data Factory all with parameters b...
Is there a way that we can build our Azure Data Factory all with parameters b...
 
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSManage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
 

Viewers also liked

Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystemJakub Stransky
 
AWS Webcast - Datacenter Migration to AWS
AWS Webcast - Datacenter Migration to AWSAWS Webcast - Datacenter Migration to AWS
AWS Webcast - Datacenter Migration to AWSAmazon Web Services
 
AWS Large Scale Migrations - Jan 2016
AWS Large Scale Migrations - Jan 2016AWS Large Scale Migrations - Jan 2016
AWS Large Scale Migrations - Jan 2016Amazon Web Services
 
Large-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSCLarge-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSCAmazon Web Services
 
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)Amazon Web Services
 
AWS Partner Webcast - Data Center Migration to the AWS Cloud
AWS Partner Webcast - Data Center Migration to the AWS CloudAWS Partner Webcast - Data Center Migration to the AWS Cloud
AWS Partner Webcast - Data Center Migration to the AWS CloudAmazon Web Services
 

Viewers also liked (7)

Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
AWS Webcast - Datacenter Migration to AWS
AWS Webcast - Datacenter Migration to AWSAWS Webcast - Datacenter Migration to AWS
AWS Webcast - Datacenter Migration to AWS
 
AWS Large Scale Migrations - Jan 2016
AWS Large Scale Migrations - Jan 2016AWS Large Scale Migrations - Jan 2016
AWS Large Scale Migrations - Jan 2016
 
IT Transformation with AWS
IT Transformation with AWSIT Transformation with AWS
IT Transformation with AWS
 
Large-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSCLarge-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSC
 
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)
 
AWS Partner Webcast - Data Center Migration to the AWS Cloud
AWS Partner Webcast - Data Center Migration to the AWS CloudAWS Partner Webcast - Data Center Migration to the AWS Cloud
AWS Partner Webcast - Data Center Migration to the AWS Cloud
 

Similar to Hadoop in the Cloud: Unlocking Big Data Potential with MapR and AWS

Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Amazon Web Services
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRData Con LA
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSatish Mohan
 
Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Joan Novino
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2Raul Chong
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesDataWorks Summit
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data AnalyticsAmazon Web Services
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1Milind gunjan
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoopChiou-Nan Chen
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014cdmaxime
 

Similar to Hadoop in the Cloud: Unlocking Big Data Potential with MapR and AWS (20)

Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond Kubernetes
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 
Empower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and HadoopEmpower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and Hadoop
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data Analytics
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Big Data in the Cloud
Big Data in the CloudBig Data in the Cloud
Big Data in the Cloud
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 



Hadoop in the Cloud: Unlocking Big Data Potential with MapR and AWS

  • 1. Hadoop in the Cloud: Unlocking the Potential of Big Data on AWS
  • 2. Introducing Maya Cabassi Partner Marketing Manager Amazon Web Services
  • 3. Webinar Overview • Submit your questions using the Q&A tool. • A copy of today’s presentation will be made available on: – AWS SlideShare Channel: http://www.slideshare.net/AmazonWebServices/ – AWS Webinar Channel on YouTube: http://www.youtube.com/channel/UCT-nPlVzJI-ccQXlxjSvJmw
  • 4. Introducing Jonathan Fritz Sr. Product Manager Amazon Web Services Steve Wooledge VP, Product Marketing MapR Technologies Bruce Penn Principal Sales Engineer MapR Technologies
  • 5. What We’ll Cover • Elastic MapReduce (EMR): Hadoop in the cloud – Elastic clusters tailored for your workflows – Best container to run Hadoop in the AWS Ecosystem • Introduction to MapR’s Hadoop Platform – Defining features – Increased performance • Case Studies: MapR + Elastic MapReduce • Q&A
  • 6. Hadoop in the Cloud Using MapR and Amazon Elastic MapReduce to unlock Big Data Jonathan Fritz, Sr. Product Manager, Amazon Web Services Steve Wooledge, VP, Product Marketing, MapR Technologies
  • 7. Agenda • Elastic MapReduce (EMR): Hadoop in the cloud – Elastic clusters tailored for your workflows – Best container to run Hadoop in the AWS Ecosystem • Introduction to MapR’s Hadoop Platform – Defining features – Increased performance • Case Studies: MapR + Elastic MapReduce • Q+A
  • 8. The Three V’s: the drivers behind Big Data (Variety, Velocity, Volume) • YouTube users upload 48 hours of new video per minute, every day • Twitter sees roughly 175 million tweets every day • Facebook analyzes 30+ petabytes of user-generated data • More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide • 2.7 zettabytes of data exist in the digital universe today • Data production will be 44 times greater in 2020 than in 2009
  • 9. Hadoop is the right system for Big Data • Scalable and fault tolerant • Flexibility for multiple languages and data formats • Open source • Ecosystem of tools • Batch and real-time analytics
  • 10. Challenges with managing Hadoop. On-Premises: • Manage HDFS, upgrades, and system administration • Pay for expensive support contracts • Select hardware in advance and stick with predictions. Cloud: • Hard to tightly integrate with AWS storage services • Independently manage and monitor clusters
  • 11. Amazon Elastic MapReduce (EMR) is the easiest way to run Hadoop in the cloud.
  • 12. Why Amazon Elastic MapReduce? • Managed services • Easy to tune clusters and trim costs • Support for multiple AWS datastores • Unique features and ecosystem support
  • 17. Elastic MapReduce Code Name node Input data S3/HDFS Queries + BI Via JDBC, Pig, Hive S3, DynamoDB, Redshift Elastic cluster
  • 18. Elastic MapReduce Code Name node Output Input data Queries + BI Via JDBC, Pig, Hive S3, DynamoDB, Redshift Elastic cluster S3/HDFS
  • 20. Elastic clusters. Customize size and type to reduce costs.
  • 21. Choose your instance types. Try out different configurations to find your optimal architecture. CPU: c1.xlarge, cc1.4xlarge, cc2.8xlarge. Memory: m1.large, m2.2xlarge, m2.4xlarge. Disk: hs1.8xlarge
  • 22. Long running or transient clusters. Easy to run Hadoop clusters short-term or 24/7, and only pay for what you need.
  • 23. Resizable clusters: Easy to add and remove compute capacity on your cluster. (10 hours)
  • 24. Resizable clusters: Easy to add and remove compute capacity on your cluster. (6 hours)
  • 25. Resizable clusters: Easy to add and remove compute capacity on your cluster. (Peak capacity)
  • 26. Resizable clusters: Easy to add and remove compute capacity on your cluster. Matched compute demands with cluster sizing. (10 hours)
  • 27. Use Spot and Reserved Instances. Minimize costs by supplementing on-demand pricing.
  • 28. Easy to use Spot Instances. Name-your-price supercomputing to minimize costs. Spot for task nodes: up to 90% off EC2 on-demand pricing. On-demand for core nodes: standard EC2 pricing for on-demand capacity.
  • 29. 24/7 clusters on Reserved Instances Minimize cost for consistent capacity. Reserved Instances for long running clusters. Up to 65% off on-demand pricing.
  • 30. Your data, your choice. Easy to integrate Elastic MapReduce with your datastores.
  • 31.
  • 32. Using Amazon S3 and On-Cluster Storage Data Sources Transient EMR cluster for batch map/reduce jobs for daily reports Long running EMR cluster holding data on the cluster in a NoSQL database Weekly Report Ad-hoc Query Data aggregated and stored in Amazon S3
  • 33. Use Amazon EMR with Redshift and S3 Data Sources Daily data aggregated in Amazon S3 Amazon EMR cluster used to process data Processed data loaded into Amazon Redshift data warehouse
  • 34. © 2014 MapR Technologies. Introduction to MapR
  • 35. MapR: Worldwide Hadoop Technology Leader. Uniquely addresses both analytic and operational use cases. 500+ paying customers. HQ
  • 36. Hadoop Distributions Open Source Open Source Distribution A Distribution B MANAGEMENT Open Source MANAGEMENT ARCHITECTURAL INNOVATIONS
  • 37. MapR Distribution for Hadoop. Management. MapR Data Platform. APACHE HADOOP & OSS ECOSYSTEM: Impala, Shark, Hive, Pig, Hue, Oozie, ZooKeeper, Mahout, MLLib, Juju, Solr, Cascading, HttpFS, Flume, Storm, Spark Streaming, YARN, MapReduce, HBase, Whirr, Sqoop, Drill, Tez, Knox, Sentry, Spark, Falcon • High availability • Data protection • Disaster recovery • Standard file access • Standard database access • Pluggable services • Broad developer support • Enterprise security authorization • Wire-level authentication • Data governance • Ability to support predictive analytics, real-time database operations, and support high arrival rate data • Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators • 2X to 7X higher performance • Consistent, low latency. Categories: Enterprise-grade, Security, Operational, Performance, Multi-tenancy, Interoperability
  • 38. A winning combination: MapR with Amazon Elastic MapReduce
  • 39. Launching a Cluster MapR Option Integrated within EMR
  • 40. MapR: Designed for Both Transient and Long-Term Clusters • High Availability • Easy Development • Multi-Tenancy • World-Record Performance • Breadth of Applications Fastest On-Ramp to Develop Hadoop Applications Best Platform for Long-Term Hadoop Production Success
  • 41. Resource Manager HA, Application Master HA JobTracker HA for MRv1 NFS HA Instant recovery • YARN jobs are not impacted by failures • Continue to meet SLAs with MapReduce v2 • MapReduce v1 jobs are not impacted by failures • Meet your data processing SLAs • High throughput and resilience for NFS-based data ingestion, import/export and multi-client access • Files and tables are accessible within seconds of a node failure or cluster restart High Availability (HA) For Hadoop No-NameNode architecture • Distributed metadata can self-heal • No practical limit on # of files
  • 42. Direct Integration with Existing Applications • 100% POSIX compliant • Industry standard APIs - NFS, ODBC, LDAP, REST • More 3rd-party solutions • No proprietary connectors required • Language neutral
  • 43. Multi-Tenancy Support for Parallelized App Development. Isolation • Tasks sandboxed so they don’t impact other tasks or system daemons • System resources protected from runaway jobs • Volume-based data segregation based on users and groups • Volume-based data placement to control • Label-based job scheduling to control Quotas • Storage quotas by volume/user/group • CPU and memory quotas by queue/user/group Security and delegation • Fine-grained administration permissions including volume-level delegation • Authenticate users to AD, LDAP and Kerberos via Linux PAM Reporting • Detailed reporting on resource usage (75+ different metrics) • All reports are available via UI, CLI and REST API
  • 44. MapR M7: The Best In-Hadoop NoSQL Database. Benefits and features: • High Performance: over 1 million ops/sec with 10 node cluster • Continuous Low Latency: no I/O storms, no compactions • 24x7 Applications: instant recovery, online schema modification, snapshots, mirroring • Zero Administration: no processes to manage, automated splits, self-tuning • High Scalability: 1 trillion tables, billions of rows, millions of columns • Low TCO: files and tables on one platform, more work with fewer nodes. Categories: Performance, Reliability, Easy Administration
  • 45. Flux7: Comparative Study of Hadoop Distributions. Web Search and Data Analytics Benchmarks: Page Rank and Hive JOIN Query, time in seconds (lower is better). Chart values: 425 925 333 563 367 532 163 331. Chart labels: IDH 2.4.1, CDH 4.3. Source: Flux7 Labs Study, October 2013. Hardware Specs: EC2 on AWS. 1 Master: m1.xlarge; 64-bit; 4 vCPU, 8 ECU; 15 GiB RAM; 4x420 GB Storage; 4x Intel® Xeon® CPU E5-2650 0 @ 2.00 GHz. 4 Slaves: m1.large; 64-bit; 2 vCPU, 4 ECU; 7.5 GiB RAM; 2x420 GB Storage; 2x Intel® Xeon® CPU E5430 @ 2.66 GHz
  • 46. Comparative Study of Hadoop Distributions. Read and Write Throughput Benchmarks: DFSIO Read Throughput and DFSIO Write Throughput, MB per second (higher is better). Chart values: 212 59 262 69 276 64 475 465. Chart labels: IDH, CDH, HDP, MapR. Source: Flux7 Labs Study, October 2013 http://flux7.com/blogs/case-studies/hadoop-distributions-a-detailed-comparative-study-whitepaper/ Hardware Specs: EC2 on AWS. 1 Master: m1.xlarge; 64-bit; 4 vCPU, 8 ECU; 15 GiB RAM; 4x420 GB Storage; 4x Intel® Xeon® CPU E5-2650 0 @ 2.00 GHz. 4 Slaves: m1.large; 64-bit; 2 vCPU, 4 ECU; 7.5 GiB RAM; 2x420 GB Storage; 2x Intel® Xeon® CPU E5430 @ 2.66 GHz
  • 47. MapR M7: The Best In-Hadoop Database • NoSQL Columnar Store • Apache HBase API • Integrated with Hadoop. Other Distros: HBase, JVM, HDFS, JVM, ext3/ext4, Disks. MapR M7: Tables/Files, Disks. The most scalable, enterprise-grade NoSQL database that supports online applications and analytics
  • 48. Customer Case Studies MapR with Amazon Elastic MapReduce in Action
  • 49. Use cases for MapR with Amazon EMR • Targeted advertising / clickstream analysis • Security: anti-virus, fraud detection, image recognition • Pattern matching / recommendations • Reporting / BI • Bio-informatics (genome analysis) • Financial simulation (Monte Carlo simulation) • File processing (resize jpegs, video encoding) • Web indexing
  • 50. Case Study Outcomes from MapR Deployment w/ EMR • Increased flexibility to scale at lower costs • Faster turnaround for customer requests • Ease of experimentation Challenges • RDBMS on AWS too slow • Solution must be compatible with AWS & Java 7 • High performance
  • 51. Case Study Outcomes from MapR Deployment w/ EMR • Faster machine learning performance enables more/faster simulations • MapR M7 provides geospatial database backed by Amazon S3 Challenges • Large volumes of sensor data • Project weather for 2.5 years at every 20x20 plot across the US • Climatology simulations need to quickly experiment at small scale and then scale reliably
  • 52. Demo
  • 53. MapR/EMR Demonstration • Create MapR cluster using EMR • Review MapR Control System (MCS) • Show S3 and MapR integration • Demonstrate MapR’s real-time capability • Connect Mac to MapR via NFS • Run queries with HiveServer2 and Impala • Visualize data with Tableau
  • 54. Questions and Contact MapR: http://aws.amazon.com/elasticmapreduce/mapr/ swooledge@mapr.com AWS Contact: aws.amazon.com/contact-us jonfritz@amazon.com @mapr @awscloud Maprtech Amazon Web Services
  • 55. We’d like your feedback. Please complete a short survey: https://aws.asia.qualtrics.com/SE/?SID=SV_brzWlylHrqM29tr
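
Slide 28 pairs Spot capacity for task nodes with on-demand capacity for core nodes. A minimal sketch of what that split might look like as an instance-group configuration follows. It only builds the data structure, using the field names of the EMR RunJobFlow `InstanceGroups` parameter, and does not call AWS. The helper name `build_instance_groups`, the node counts, and the bid price are hypothetical; `m1.large` is taken from the instance types listed on slide 21.

```python
import json


def build_instance_groups(core_count, task_count,
                          instance_type="m1.large", spot_bid_usd="0.05"):
    """Build an InstanceGroups list: On-Demand master and core, Spot task nodes.

    The bid price and counts are illustrative assumptions, not recommendations.
    """
    return [
        {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core nodes store HDFS data, so they stay on On-Demand capacity.
        {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": core_count},
        # Task nodes only run tasks, so they are the candidates for Spot.
        {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
         "BidPrice": spot_bid_usd,
         "InstanceType": instance_type, "InstanceCount": task_count},
    ]


if __name__ == "__main__":
    groups = build_instance_groups(core_count=4, task_count=8)
    print(json.dumps(groups, indent=2))
```

A list of this shape could be passed as the instance-group section of a cluster launch request; only the task group carries a `BidPrice`, because only that group is requested from the Spot market.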
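
The cost argument behind slides 23 to 29 (resize the cluster to match demand, then discount steady capacity) can be made concrete with a little arithmetic. The sketch below is a hedged illustration only: the hourly rate and the resize schedule are made-up numbers, and the 65% figure is the "up to" upper bound quoted on slide 29, not a typical discount.

```python
def node_hours(schedule):
    """Sum node-hours over a list of (node_count, hours) phases."""
    return sum(nodes * hours for nodes, hours in schedule)


def cost(total_node_hours, hourly_rate, discount=0.0):
    """Cost in USD for a number of node-hours at a rate, less a fractional discount."""
    return total_node_hours * hourly_rate * (1.0 - discount)


HOURLY_RATE = 0.20  # hypothetical on-demand price per node-hour, USD

# Hypothetical 10-hour workload: sized for peak the whole time...
fixed_at_peak = [(10, 10)]
# ...versus 4 nodes most of the time and 10 nodes only for a 2-hour peak.
resized = [(4, 6), (10, 2), (4, 2)]

if __name__ == "__main__":
    for label, schedule in (("fixed at peak", fixed_at_peak), ("resized", resized)):
        hours = node_hours(schedule)
        print(f"{label}: {hours} node-hours, ${cost(hours, HOURLY_RATE):.2f}")
    # Upper-bound Reserved Instance discount from slide 29 applied to the fixed cluster.
    reserved = cost(node_hours(fixed_at_peak), HOURLY_RATE, discount=0.65)
    print(f"fixed at peak, reserved (upper bound): ${reserved:.2f}")
```

Under these assumptions the resized schedule uses roughly half the node-hours of the peak-sized cluster; real savings depend on the actual workload shape and current pricing.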