© Cloudera, Inc. All rights reserved.
Five Tips for Running Cloudera on AWS
Joy Chatterjee | Senior Product Manager | Cloudera
Rahul Bhartia | Ecosystem Solutions Architect | Amazon Web Services
What We’re Going to Cover
• Hadoop in the cloud
• Architectural and access patterns
• Deployment and management
• Security and governance
• How to get started
Big data transforming business
CUSTOMER & CHANNEL
DATA-DRIVEN PRODUCTS
SECURITY, RISK & COMPLIANCE
Airbnb uses Cloudera on AWS as a platform for machine learning and search that more effectively matches customers with the right rental property
CUSTOMER 360
A camera company uses Cloudera to run big data and analytic workloads on AWS to literally soar ahead of the competition
DATA-DRIVEN PRODUCTS
FINRA uses Cloudera on AWS to look at 30B market events per day to build a holistic picture of US market activity, while saving $10-20M annually.
RISK
Hadoop in the AWS Cloud
Why AWS for Hadoop?
Immediate Availability
Deploy the infrastructure you need almost instantly without long provisioning cycles.
Broad & Deep Capabilities
Find everything you need to collect, store, process, analyze and visualize Big Data.
Scalable
Scale from a few gigabytes to several petabytes, and from a few machines to thousands of nodes, with just a few clicks.
Global Footprint
Over 1 million active customers across 190 countries
1,700 government agencies
4,500 educational institutions
11 regions
30 availability zones
53 edge locations
Every day, AWS adds enough new server capacity to support Amazon.com when it was a $7 billion global enterprise.
Region
Edge Location
Administration & Security: Access Control, Identity Management, Key Management & Storage, Monitoring & Logs, Resource & Usage Auditing
Platform Services
• Analytics: Data Pipelines, Data Warehouse, Hadoop, Real-time Streaming Data
• App Services: App Streaming, Email, Queuing & Notifications, Search, Transcoding, Workflow
• Developer Tools & Operations: Application Lifecycle Management, Containers, Deployment, DevOps, Event-driven Computing, Resource Templates
• Mobile Services: Identity, Mobile Analytics, Push Notifications, Sync
Core Services
• CDN
• Compute (VMs, Auto-scaling & Load Balancing)
• Databases (Relational, NoSQL, Caching)
• Networking (VPC, DX, DNS)
• Storage (Object, Block and Archival)
Infrastructure: Availability Zones, Points of Presence, Regions
Enterprise Applications: Business Email, Sharing & Collaboration, Virtual Desktop
Technical & Business Support: Account Management, Partner Ecosystem, Professional Services, Security & Pricing Reports, Solutions Architects, Support, Training & Certification
Wide Choice of Compute & Storage
Compute (Amazon EC2 instance families)
• GPU enabled: G2
• General purpose: M4, T2
• Memory optimized: R3
• Storage optimized: I2, D2
• Compute optimized: C4
Storage
• File: Amazon EFS
• Block: Amazon EBS, Amazon EC2 instance storage
• Object: Amazon S3
• Archival: Amazon Glacier
Hybrid
Deployment
Flexibility
Cloudera Enterprise on AWS
A new kind of data platform.
• One place for unlimited data
• Unified, multi-framework data access
Cloudera makes it:
• Fast for business
• Easy to manage
• Secure without compromise
Hadoop is Different in the Cloud
             On-Premises                  Cloud
Storage      Direct Attached              Direct Attached or Object Store
Data         Not shared across clusters   Shared across multiple clusters
Sizing       Fixed-size                   Dynamic based on load
Usage Model  All users share cluster      Clusters created as needed for apps/users
On-Premises: Industry Standard Servers (CPU, Memory, & Direct Attached Storage)
Cloud: Industry Standard Servers (CPU & Memory) and Object Storage
On-Premises vs. Cloud Deployment
Impala
Compute bound
Memory bound
Desires balance of compute and memory
Impala
S3 Object Store
Common Architectural Patterns in AWS
S3 Object Store: Source Data, Seed Data, Backup/DR
ETL/MODELING (Spark, MapReduce)
• Short-running clusters
• Elastic workload
• Little local storage
BI/ANALYTICS (Impala, Solr)
• Long-running clusters
• Sized to demand
• Some local storage
APP DELIVERY (HBase, Kudu)
• Fixed clusters
• Periodic sync
• All local storage
Different Data Types and Access Patterns
Fast (< 2 sec) access to large volumes of data using Impala
Bulk processing of huge volumes of data using Spark or Hive
Real-time (< 10 ms) access to structured customer data using HBase
Integrated full-text search using Apache Solr
Real-time stream processing of data using Spark Streaming
Operational Data Search Stream Processing
Deployment and Management
Cloudera Director
Deploy and manage enterprise-grade Hadoop in the cloud
Trusted for Production
• Supports large scale customer
deployments
• Integrated part of Cloudera Enterprise
Flexible Deployment
• Out-of-the-box integrations with AWS
• Supports multiple cloud platforms
Simple Administration
• Dynamic cluster lifecycle management
• Multi-cluster, multi-environment
• Custom use case support through API
www.cloudera.com/downloads
Example of Cloud Cluster
Master Nodes
Worker Group 1: Disk optimized instances
Worker Group 2: Memory optimized instances
• Optimize worker groups by using various instance types within the same cluster
• Grow and shrink compute independently across worker groups based on load
• Use on-demand instances for master nodes and spot instances for worker nodes
• Use Director configuration files as a blueprint for repeatable deployments
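The cluster layout on this slide can be sketched as a small data model. This is only an illustration in plain Python, not Cloudera Director's configuration format or API: the class names, group names, instance types, and node counts are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceGroup:
    """One group of identically configured nodes (hypothetical model)."""
    name: str
    instance_type: str          # an EC2 instance type, e.g. "d2.2xlarge"
    count: int
    pricing: str = "on-demand"  # "on-demand" or "spot"

    def resize(self, new_count: int, min_count: int = 0) -> None:
        """Grow or shrink this group without touching any other group."""
        if new_count < min_count:
            raise ValueError(f"{self.name}: count cannot drop below {min_count}")
        self.count = new_count


@dataclass
class ClusterBlueprint:
    """A repeatable description of a cluster, in the spirit of a config file."""
    name: str
    groups: dict = field(default_factory=dict)

    def add(self, group: InstanceGroup) -> None:
        self.groups[group.name] = group

    def total_nodes(self) -> int:
        return sum(g.count for g in self.groups.values())


# Masters on on-demand instances, workers on spot instances (assumed sizes).
blueprint = ClusterBlueprint("analytics")
blueprint.add(InstanceGroup("masters", "m4.2xlarge", 3, pricing="on-demand"))
blueprint.add(InstanceGroup("workers-disk", "d2.2xlarge", 10, pricing="spot"))
blueprint.add(InstanceGroup("workers-memory", "r3.2xlarge", 4, pricing="spot"))

# Scale only the memory-optimized workers in response to load.
blueprint.groups["workers-memory"].resize(8)
print(blueprint.total_nodes())
```

Keeping each worker group as its own object is what lets one group change size while the masters and the other group stay fixed.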
Job dispatch to the cluster
Job
Cluster 1
Cluster 3
Cluster 2
Cluster 4
Object
storage
Dispatcher
Policy
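One way to picture the dispatcher and policy boxes in this diagram is the sketch below. The slide does not say which policy is used, so the rule here (send the job to the least loaded cluster tuned for its workload) is an assumption, as are the cluster names, workload labels, and job counts.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    workload: str      # kind of job this cluster is tuned for (assumed label)
    running_jobs: int  # current load


def dispatch(job_workload: str, clusters: list) -> Cluster:
    """Assumed policy: among matching clusters, pick the least loaded one."""
    candidates = [c for c in clusters if c.workload == job_workload]
    if not candidates:
        raise LookupError(f"no cluster accepts workload {job_workload!r}")
    chosen = min(candidates, key=lambda c: c.running_jobs)
    chosen.running_jobs += 1  # record the newly dispatched job
    return chosen


clusters = [
    Cluster("cluster-1", "etl", 5),
    Cluster("cluster-2", "etl", 2),
    Cluster("cluster-3", "bi", 1),
    Cluster("cluster-4", "bi", 0),
]

first = dispatch("etl", clusters)
second = dispatch("bi", clusters)
print(first.name, second.name)
```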
Security and Governance
Familiar Security
Model
Validated and driven by
customers’ security experts
Benefits all customers
PEOPLE & PROCESS
SYSTEM
NETWORK
PHYSICAL
Security is Job Zero
AWS Foundation Services
Compute Storage Database Networking
AWS Global Infrastructure
Regions
Availability Zones
Edge Locations
Network
Security
Server
Security
Customer applications & content
You get to define your controls IN the Cloud
AWS takes care of the security OF the Cloud
You
AWS And You Share Responsibility for Security
Data
Security
Access
Control
Key AWS Certifications and Assurance Programs
Understand Configuration Changes
Automate IT asset inventory
Discover and provision cloud services
Audit and troubleshoot configuration changes in
the cloud
Full visibility of your AWS environment
• CloudTrail records API calls and saves the logs in your S3 buckets, no matter how those API calls were made
Who did what, when, and from where (IP address)
• CloudTrail supports a growing list of AWS services, including EC2, EBS, VPC, RDS, IAM, and Redshift
• Easily aggregate all log information
Monitoring: Get consistent visibility of logs
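As a rough illustration of "who did what, when, and from where", the sketch below reads a CloudTrail-style log and prints one line per API call. The field names (`Records`, `eventTime`, `eventName`, `sourceIPAddress`, `userIdentity`) follow the CloudTrail record format; the two sample records, user name, account ID, and IP addresses are made up for the example, and `summarize` is a hypothetical helper.

```python
import json

# Made-up sample in the shape of a CloudTrail log file (illustrative data only).
SAMPLE_LOG = """
{"Records": [
  {"eventTime": "2015-02-16T09:33:11Z",
   "eventSource": "ec2.amazonaws.com",
   "eventName": "RunInstances",
   "sourceIPAddress": "203.0.113.10",
   "userIdentity": {"type": "IAMUser", "userName": "alice"}},
  {"eventTime": "2015-02-16T11:33:01Z",
   "eventSource": "s3.amazonaws.com",
   "eventName": "DeleteBucket",
   "sourceIPAddress": "198.51.100.7",
   "userIdentity": {"type": "Root", "arn": "arn:aws:iam::123456789012:root"}}
]}
"""


def summarize(log_text):
    """Return (who, what, when, where) tuples for every record in the log."""
    summary = []
    for record in json.loads(log_text)["Records"]:
        identity = record.get("userIdentity", {})
        # Prefer a user name, fall back to the ARN, then to the identity type.
        who = identity.get("userName") or identity.get("arn") or identity.get("type", "unknown")
        summary.append((who, record["eventName"], record["eventTime"], record["sourceIPAddress"]))
    return summary


events = summarize(SAMPLE_LOG)
for who, what, when, where in events:
    print(f"{when} {who} called {what} from {where}")
```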
Comprehensive, Compliance-Ready Security
Authentication, Authorization, Audit, and Compliance
Access: Defining what users and applications can do with data. Technical Concepts: Permissions, Authorization
Data: Protecting data in the cluster from unauthorized visibility. Technical Concepts: Encryption, Tokenization, Data masking
Visibility: Reporting on where data came from and how it’s being used. Technical Concepts: Auditing, Lineage
Perimeter: Guarding access to the cluster itself. Technical Concepts: Authentication, Network isolation
RecordService
Unified Access Control Enforcement
• New high performance security
layer that centrally enforces access
control policies across Hadoop
• Complements Apache Sentry’s unified
policy definition
• Row- and column-based security
• Dynamic data masking
• Apache-licensed open source
• Beta now available
DATA ENGINEERING, DATA DISCOVERY & ANALYTICS, DATA APPS
• BATCH: SPARK, HIVE, PIG
• STREAM: SPARK
• SQL: IMPALA
• SEARCH: SOLR
• MODEL: SPARK
• ONLINE: HBASE
UNIFIED DATA SERVICES
• RESOURCE MANAGEMENT – YARN
• SECURITY – SENTRY, RECORDSERVICE
DATA INTEGRATION & STORAGE
• INGEST – SQOOP, FLUME, KAFKA
• FILESYSTEM: HDFS, S3
• NoSQL: HBASE
Fine-Grained HDFS Access without RecordService
Original file:
Date/time             Accnt #     SSN          Asset  Trade  Country
09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    EU
14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   UK
09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    UK
13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   EU
09:03:44 16-Feb-2015  4857389329  123-44-5678  TMV    Buy    US
15:55:55 16-Feb-2015  4756983234  234-76-9274  MA     Buy    UK

UK file:
Date/time             Accnt #     SSN          Asset  Trade  Country
14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   UK
10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    UK
15:55:55 16-Feb-2015  4756983234  234-76-9274  MA     Buy    UK

EU file:
Date/time             Accnt #     SSN          Asset  Trade  Country
11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    EU
13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   EU

US file:
Date/time             Accnt #     SSN          Asset  Trade  Country
09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
09:03:44 16-Feb-2015  4857389329  123-44-5678  TMV    Buy    US
Split the original file
Use HDFS permissions to limit access
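The file-splitting workaround on this slide can be sketched as follows. This is a minimal illustration in plain Python rather than anything HDFS-specific: the CSV column names and the `split_by_country` helper are assumptions, and only four of the slide's rows are reproduced. Each resulting group would then be written to its own file and protected with its own HDFS permissions.

```python
import csv
import io
from collections import defaultdict

# A few rows from the slide, written as CSV (column names are assumed).
MASTER = """date_time,account,ssn,asset,trade,country
09:33:11 16-Feb-2015,0234837823,238-23-9876,AAPL,Sell,US
11:33:01 16-Feb-2015,3947848494,329-44-9847,TBT,Buy,EU
14:12:34 16-Feb-2015,4848367383,123-56-2345,IBM,Sell,UK
09:22:03 16-Feb-2015,3485739384,585-11-2345,INTC,Buy,US
"""


def split_by_country(master_text):
    """Group the master file's rows into one list per country."""
    parts = defaultdict(list)
    for row in csv.DictReader(io.StringIO(master_text)):
        parts[row["country"]].append(row)
    return dict(parts)


parts = split_by_country(MASTER)
for country, rows in sorted(parts.items()):
    print(country, [r["asset"] for r in rows])
```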
Fine-Grained HDFS Access Control with RecordService
• Apply controls to the master data file
• Row, column, and sub-column (masking) controls
• Enforce these across all access paths
Date/time             Accnt #     SSN          Asset  Trade  Country
09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    EU
14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   EU
09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    EU
13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   EU
Column-Level Controls
Row-Level Controls
Date/time             Accnt #     SSN          Asset  Trade  Country
09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    group2
14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   group3
09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    group3
13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   group2
Column-Level Controls
Row-Level Controls
XXX-XX
XXX-XX
XXX-XX
What U.S. Brokers See
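The three kinds of control named on this slide (row, column, and sub-column masking) can be illustrated with a small sketch applied to a single master data set. This is plain Python for explanation only and does not use or represent the RecordService API; the function names, column names, and the choice to keep the last four SSN digits visible are assumptions.

```python
# A few rows from the slide's master data (column names are assumed).
RECORDS = [
    {"account": "0234837823", "ssn": "238-23-9876", "asset": "AAPL", "trade": "Sell", "country": "US"},
    {"account": "3947848494", "ssn": "329-44-9847", "asset": "TBT", "trade": "Buy", "country": "EU"},
    {"account": "3485739384", "ssn": "585-11-2345", "asset": "INTC", "trade": "Buy", "country": "US"},
]


def mask_ssn(ssn):
    """Sub-column control: hide everything except the last four digits."""
    return "XXX-XX-" + ssn[-4:]


def apply_policy(records, allowed_country, visible_columns):
    """Build the view one group of users is allowed to see."""
    view = []
    for rec in records:
        if rec["country"] != allowed_country:  # row-level control
            continue
        row = {}
        for col in visible_columns:            # column-level control
            value = rec[col]
            if col == "ssn":                   # masking within a column
                value = mask_ssn(value)
            row[col] = value
        view.append(row)
    return view


us_view = apply_policy(RECORDS, "US", ["ssn", "asset", "trade"])
for row in us_view:
    print(row)
```

Because the policy is applied when the data is read, the master records stay in one place and no per-country copies are needed.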
Data Management
Data Management Challenges
Business Users
• How do I find what’s relevant?
• Can I trust what I find?
• How can I explore data on my own?
Compliance Officers
• Who’s accessing what data?
• What are they doing with the data?
• Is sensitive data governed and protected?
• Can I meet compliance needs?
Database Admins
• How is data being used today?
• How can I optimize for future workloads?
• How can I take advantage of Hadoop risk-free and fast?
Data Stewards/Curators
• How can I manage data from ingest to purge?
• How do I classify data efficiently?
• How can data be made available to end-users?
Cloudera Navigator in the Cloud
Data Management & Governance
Audit
Only distribution to pass PCI audit
Lineage
How did I get to this truth?
Search
Metadata – What are all the files
associated with this product?
Policy
Move data from HDFS to Amazon S3
How to Get Started
Cloudera on AWS Checklist
1. Use the right AWS instance types for the right Hadoop workload
2. Take a DevOps approach to managing the Hadoop lifecycle on AWS
3. Amazon S3 provides good, low-cost cloud storage for Hadoop jobs
4. Security is everyone’s responsibility. Enforce multi-layer security that covers cloud, cluster, and data access, authentication, and encryption
5. Metadata is key to data management and governance. Without it, your users can’t answer the questions that matter to your business
Cloudera Live runs on AWS and includes step-by-step tutorials and integrations with familiar BI tools
www.cloudera.com/live
AWS Quick Start Reference Deployments
http://aws.amazon.com/quickstart
Thank You!

 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadDataWorks Summit
 
PaaS or Fail: Rule the Cloud with Altus
PaaS or Fail: Rule the Cloud with AltusPaaS or Fail: Rule the Cloud with Altus
PaaS or Fail: Rule the Cloud with AltusCloudera, Inc.
 

Five Tips for Running Cloudera on AWS

  • 1. 1© Cloudera, Inc. All rights reserved. Five Tips for Running Cloudera on AWS Joy Chatterjee | Senior Product Manager | Cloudera Rahul Bhartia | Ecosystem Solutions Architect | Amazon Web Services
  • 2. What We’re Going to Cover • Hadoop in the cloud • Architectural and access patterns • Deployment and management • Security and governance • How to get started
  • 3. Big data transforming business: CUSTOMER & CHANNEL; DATA-DRIVEN PRODUCTS; SECURITY, RISK & COMPLIANCE
  • 4. CUSTOMER 360: Airbnb uses Cloudera on AWS as a platform for machine learning and search that more effectively matches customers with the right rental property.
  • 5. DATA-DRIVEN PRODUCTS: A camera company uses Cloudera to run big data and analytic workloads on AWS to literally soar ahead of the competition.
  • 6. RISK: FINRA uses Cloudera on AWS to look at 30B market events per day to build a holistic picture of US market activity, while saving $10-20M annually.
  • 7. Hadoop in the AWS Cloud
  • 8. Why AWS for Hadoop? Immediate Availability: Deploy the infrastructure you need almost instantly without long provisioning cycles. Broad & Deep Capabilities: Find everything you need to collect, store, process, analyze and visualize Big Data. Scalable: Scale from a few gigabytes to several petabytes, and from a few machines to thousands of nodes, with just a few clicks.
  • 9. Global Footprint: Over 1 million active customers across 190 countries; 1,700 government agencies; 4,500 educational institutions; 11 regions; 30 availability zones; 53 edge locations. Every day, AWS adds enough new server capacity to support Amazon.com when it was a $7 billion global enterprise. (Map legend: Region, Edge Location)
  • 10. Administration & Security Access Control Identity Management Key Management & Storage Monitoring & Logs Resource & Usage Auditing Platform Services Analytics App Services Developer Tools & Operations Mobile Services Data Pipelines Data Warehouse Hadoop Real-time Streaming Data Application Lifecycle Management Containers Deployment DevOps Event-driven Computing Resource Templates Identity Mobile Analytics Push Notifications Sync App Streaming Email Queuing & Notifications Search Transcoding Workflow Core Services CDN Compute (VMs, Auto-scaling & Load Balancing) Databases (Relational, NoSQL, Caching) Networking (VPC, DX, DNS) Storage (Object, Block and Archival) Infrastructure Availability Zones Points of Presence Regions Enterprise Applications Business Email Sharing & Collaboration Virtual Desktop Technical & Business Support Account Management Partner Ecosystem Professional Services Security & Pricing Reports Solutions Architects Support Training & Certification
  • 12. Hybrid Deployment Flexibility. Cloudera Enterprise on AWS: A new kind of data platform. • One place for unlimited data • Unified, multi-framework data access Cloudera makes it: • Fast for business • Easy to manage • Secure without compromise
  • 13. Hadoop is Different in the Cloud (On-Premises vs. Cloud). Storage: Direct Attached vs. Direct Attached or Object Store. Data: Not shared across clusters vs. Shared across multiple clusters. Sizing: Fixed-size vs. Dynamic based on load. Usage Model: All users share cluster vs. Clusters created as needed for apps/users. Diagram: Industry Standard Servers (CPU, Memory, & Direct Attached Storage) on-premises; Industry Standard Servers (CPU & Memory) plus Object Storage in the cloud.
  • 14. On-Premises vs. Cloud Deployment. Diagram labels: Impala; Compute bound; Memory bound; Desires balance of compute and memory; Impala; S3 Object Store
  • 15. Common Architectural Patterns in AWS. S3 Object Store: Source Data, Seed Data, Backup/DR. ETL/MODELING (Spark, MapReduce): • Short-running clusters • Elastic workload • Little local storage BI/ANALYTICS (Impala, Solr): • Long-running clusters • Sized to demand • Some local storage APP DELIVERY (HBase, Kudu): • Fixed clusters • Periodic sync • All local storage
  • 16. Different Data Types and Access Patterns: Fast (< 2 sec) access to large volumes of data using Impala; Bulk processing of huge volumes of data using Spark or Hive; Operational Data: Real-time (< 10 ms) access to structured customer data using HBase; Search: Integrated full-text search using Apache Solr; Stream Processing: Real-time stream processing of data using Spark Streaming
  • 18. Cloudera Director: Deploy and manage enterprise-grade Hadoop in the cloud. Trusted for Production: • Supports large scale customer deployments • Integrated part of Cloudera Enterprise Flexible Deployment: • Out-of-the-box integrations with AWS • Supports multiple cloud platforms Simple Administration: • Dynamic cluster lifecycle management • Multi-cluster, multi-environment • Custom use case support through API. www.cloudera.com/downloads
  • 19. Example of Cloud Cluster. Diagram labels: Master Nodes; Worker Group 1 (Disk optimized instances); Worker Group 2 (Memory optimized instances) • Optimize worker groups by using various instances within the same cluster • Grow and shrink compute independently across worker groups based on load • Use on-demand instances on master nodes and spot instances for worker nodes • Use Director to manage configuration files for a blueprint for repeatable deployments
  • 20. Job dispatch to the cluster. Diagram labels: Job; Dispatcher; Policy; Cluster 1; Cluster 2; Cluster 3; Cluster 4; Object storage
  • 22. Security is Job Zero. Familiar Security Model: Validated and driven by customers’ security experts; Benefits all customers. Layers: PEOPLE & PROCESS, SYSTEM, NETWORK, PHYSICAL
  • 23. AWS And You Share Responsibility for Security. AWS takes care of the security OF the Cloud: AWS Foundation Services (Compute, Storage, Database, Networking); AWS Global Infrastructure (Regions, Availability Zones, Edge Locations). You get to define your controls IN the Cloud: Network Security, Server Security, Data Security, Access Control, Customer applications & content
  • 24. Key AWS Certifications and Assurance Programs
  • 25. Understand Configuration Changes: Automate IT asset inventory; Discover and provision cloud services; Audit and troubleshoot configuration changes in the cloud
  • 26. Monitoring: Get consistent visibility of logs. Full visibility of your AWS environment • CloudTrail records API calls and saves the logs in your S3 buckets, no matter how those API calls were made: who did what, when, and from where (IP address) • CloudTrail supports many AWS services and the list is growing; it includes EC2, EBS, VPC, RDS, IAM and Redshift • Easily aggregate all log information
  • 27. Comprehensive, Compliance-Ready Security: Authentication, Authorization, Audit, and Compliance. Access: Defining what users and applications can do with data (Technical Concepts: Permissions, Authorization). Data: Protecting data in the cluster from unauthorized visibility (Technical Concepts: Encryption, Tokenization, Data masking). Visibility: Reporting on where data came from and how it’s being used (Technical Concepts: Auditing, Lineage). Perimeter: Guarding access to the cluster itself (Technical Concepts: Authentication, Network isolation).
  • 28. RecordService: Unified Access Control Enforcement • New high performance security layer that centrally enforces access control policies across Hadoop • Complements Apache Sentry’s unified policy definition • Row- and column-based security • Dynamic data masking • Apache-licensed open source • Beta now available. Platform diagram labels: FILESYSTEM HDFS; NoSQL HBASE; INGEST (SQOOP, FLUME, KAFKA); DATA INTEGRATION & STORAGE; SECURITY (SENTRY, RECORDSERVICE); RESOURCE MANAGEMENT (YARN); UNIFIED DATA SERVICES; BATCH, STREAM, SQL, SEARCH, MODEL, ONLINE; DATA ENGINEERING; DATA DISCOVERY & ANALYTICS; DATA APPS; SPARK, HIVE, PIG; SPARK; IMPALA; SOLR; SPARK; HBASE; FILESYSTEM S3
  • 29. Fine-Grained HDFS Access without RecordService: split the original file, then use HDFS permissions to limit access.
      Original file:
      Date/time             Accnt #     SSN          Asset  Trade  Country
      09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
      11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    EU
      14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   UK
      09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
      11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
      10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    UK
      13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   EU
      09:03:44 16-Feb-2015  4857389329  123-44-5678  TMV    Buy    US
      15:55:55 16-Feb-2015  4756983234  234-76-9274  MA     Buy    UK
      Split file (UK):
      Date/time             Accnt #     SSN          Asset  Trade  Country
      14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   UK
      10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    UK
      15:55:55 16-Feb-2015  4756983234  234-76-9274  MA     Buy    UK
      Split file (EU):
      Date/time             Accnt #     SSN          Asset  Trade  Country
      11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    EU
      13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   EU
      Split file (US):
      Date/time             Accnt #     SSN          Asset  Trade  Country
      09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
      09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
      11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
      09:03:44 16-Feb-2015  4857389329  123-44-5678  TMV    Buy    US
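The split-and-restrict workaround on this slide can be sketched as a partitioning step followed by a per-file access plan. This is only an illustration of the idea: the rows are a subset of the slide's sample data, while the base path `/data/trades` and the group names such as `brokers_uk` are hypothetical, and applying the permissions in HDFS is left out.

```python
from collections import defaultdict

COLUMNS = ("datetime", "account", "ssn", "asset", "trade", "country")

# Three rows taken from the slide's sample table.
ROWS = [
    ("09:33:11 16-Feb-2015", "0234837823", "238-23-9876", "AAPL", "Sell", "US"),
    ("11:33:01 16-Feb-2015", "3947848494", "329-44-9847", "TBT", "Buy", "EU"),
    ("14:12:34 16-Feb-2015", "4848367383", "123-56-2345", "IBM", "Sell", "UK"),
]


def split_by_country(rows):
    """Group rows into one list per country, mirroring the per-country files."""
    country_idx = COLUMNS.index("country")
    parts = defaultdict(list)
    for row in rows:
        parts[row[country_idx]].append(row)
    return dict(parts)


def access_plan(parts, base="/data/trades"):
    """Map each split file to the single group allowed to read it.

    The path layout and group names are hypothetical placeholders.
    """
    return {f"{base}/{country}.csv": f"brokers_{country.lower()}" for country in parts}


parts = split_by_country(ROWS)
for path, group in sorted(access_plan(parts).items()):
    print(f"{path} -> readable by {group}")
```

Every new access rule means another copy of the data and another file to keep in sync, which is the maintenance cost this approach carries.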
  • 30. Fine-Grained HDFS Access Control with RecordService • Apply controls to the master data file • Row, column, and sub-column (masking) controls • Enforce these across all access paths
      Master data file (column-level and row-level controls):
      Date/time             Accnt #     SSN          Asset  Trade  Country
      09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
      11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    EU
      14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   EU
      09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
      11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
      10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    EU
      13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   EU
      What U.S. Brokers See (column-level and row-level controls; the slide overlays the mask XXX-XX three times):
      Date/time             Accnt #     SSN          Asset  Trade  Country
      09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
      11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    group2
      14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   group3
      09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
      11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
      10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    group3
      13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   group2
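The combination of row-level filtering and sub-column masking on this slide can be illustrated with plain Python. This is a conceptual sketch only and does not use the RecordService API: the function names are invented for illustration, the rows are taken from the slide's master file, and the choice to keep the last four SSN digits behind an `XXX-XX-` prefix is an assumption based on the `XXX-XX` mask shown on the slide.

```python
# Four rows from the slide's master data file.
ROWS = [
    {"asset": "AAPL", "trade": "Sell", "country": "US", "ssn": "238-23-9876"},
    {"asset": "TBT", "trade": "Buy", "country": "EU", "ssn": "329-44-9847"},
    {"asset": "IBM", "trade": "Sell", "country": "EU", "ssn": "123-56-2345"},
    {"asset": "INTC", "trade": "Buy", "country": "US", "ssn": "585-11-2345"},
]


def mask_ssn(ssn):
    """Sub-column control: hide the first five digits, keep the last four (assumed format)."""
    return "XXX-XX-" + ssn[-4:]


def view_for(rows, country):
    """Row-level control plus masking: return only the rows for one country,
    with the SSN masked, leaving the master rows unmodified."""
    visible = []
    for row in rows:
        if row["country"] != country:
            continue  # row-level control: other countries are filtered out
        masked = dict(row)  # copy so the master data stays intact
        masked["ssn"] = mask_ssn(row["ssn"])
        visible.append(masked)
    return visible


for row in view_for(ROWS, "US"):
    print(row["asset"], row["trade"], row["country"], row["ssn"])
```

Because the policy is applied when the data is read, there is one master file to maintain instead of one copy per audience.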
  • 32. Data Management Challenges. Business Users: • How do I find what’s relevant? • Can I trust what I find? • How can I explore data on my own? Compliance Officers: • Who’s accessing what data? • What are they doing with the data? • Is sensitive data governed and protected? • Can I meet compliance needs? Database Admins: • How is data being used today? • How can I optimize for future workloads? • How can I take advantage of Hadoop risk-free and fast? Data Stewards/Curators: • How can I manage data from ingest to purge? • How do I classify data efficiently? • How can data be made available to end users?
  • 33. Cloudera Navigator in the Cloud: Data Management & Governance. Audit: only distribution to pass PCI audit. Lineage: how did I get to this truth? Search: metadata (what are all the files associated with this product?). Policy: move data from HDFS to Amazon S3.
  • 34. How to Get Started
  • 35. Cloudera on AWS Checklist: 1. Use the right AWS instance types for the right Hadoop workload. 2. Take a DevOps approach to managing the Hadoop lifecycle on AWS. 3. Amazon S3 provides good, low-cost cloud storage for Hadoop jobs. 4. Security is everyone’s responsibility: enforce multi-layer security that covers cloud, cluster, and data access, authentication, and encryption. 5. Metadata is key to data management and governance; without it, your users can’t answer the questions that matter to your business.
  • 36. Cloudera Live runs on AWS and includes step-by-step tutorials and integrations with familiar BI tools: www.cloudera.com/live
  • 37. AWS Quick Start Reference Deployments: http://aws.amazon.com/quickstart
  • 38. Thank You!