www.reancloud.com
USA: Herndon | Philadelphia | Los Angeles
India: Hyderabad | Pune | Udaipur | Bangalore
Big Data on AWS
How to implement and automate secure Big Data architectures & solutions on AWS
© 2017 Copyright REAN Cloud. All rights reserved. info@reancloud.com | www.reancloud.com
Workshop Agenda
Introduction to REAN Cloud
What is Big Data?
Why AWS is the most complete Big Data platform
Overview of selected AWS Big Data products
Why, What, How { Innovation, Best Practices, and Security }
Equivalent Open Source Software (where applicable)
REAN Data Lake Reference Architecture
Demo & Discussion
Live demo / hands-on use of EMR and Athena
Organizational Profile
Established: 2013
Presence: USA, India
Number of Employees: 300+
AWS Certifications: 150+ (including 15+ Professional certifications)
Industry Focus:
Education, Government, Healthcare / Life Sciences, Financial Services, and ISV
Management team of executives formerly with Fortune 500 enterprises (AWS, Amdocs, BMC, Merck and Cognizant), with deep AWS cloud computing experience.
Recognized by TechTarget as the top AWS Partner providing innovative DevSecOps services.
24x7 follow-the-sun model, with offices around the world and continuous operations in multiple time zones: EST, PST and IST.
Recognized in Gartner’s March 2017 Magic Quadrant for Public Cloud Infrastructure Managed Service Providers, leading the Niche Players in Completeness of Vision and Ability to Execute.
AWS Competencies
Customers by Vertical
Government / Public Sector
ISV
Education
Financial Services
Healthcare / Life Sciences
What is Big Data?
Big Data Characteristics: Ever Expanding Horizon
VELOCITY
VARIETY
VOLUME
Source: www.datasciencecentral.com
AWS Big Data technologies usually fall into one of two categories: Lightswitch or Cockpit
AWS - The Most Complete Platform for Big Data - 1/2
Analytic Frameworks | Real Time Analytics | Storage & Databases | Data Warehousing | Business Intelligence
Amazon EMR
Amazon Athena
Kinesis Firehose
Kinesis Streams
Kinesis Analytics
Amazon Athena
DynamoDB
Redshift
QuickSight
Elasticsearch Service
S3
HBase on EMR, RDS & Aurora
AWS - The Most Complete Platform for Big Data - 2/2
Artificial Intelligence
Lex, Polly, Rekognition, Amazon ML, MXNet
Internet of Things
IoT, Greengrass
Serverless Compute
Lambda
Amazon EC2 Instances
Optimised { Compute, Memory, Storage, GPU }
Data Movement
Direct Connect, Snowball, Database Migration Service, Storage Gateway
Big Data Analytic Frameworks
Amazon EMR, Amazon Athena
Why EMR? - 1 / 4
How do you deal with Big Data?
Massively Parallel Processing (MPP) frameworks!
Hadoop = MapReduce + HDFS
MapReduce is a popular distributed processing framework
HDFS provides scalable and reliable distributed data storage
Running a Hadoop cluster is challenging:
Scale data nodes as data size increases
Scale compute nodes out/in to handle spiky traffic
Managing resources (nodes, disks) is time-consuming
Enterprise Hadoop distributions fill the need…
…but are very expensive ($$$)
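To make "Hadoop = MapReduce + HDFS" concrete, here is a minimal single-process sketch of the MapReduce programming model (map, shuffle/group by key, reduce) applied to a word count. It illustrates the model only; the function names are hypothetical and this is not Hadoop's or EMR's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster each input split is mapped on a different node and each key
    # group is reduced on a different node; here everything runs locally.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data on AWS", "Big Data on EMR"]))
    # → {'big': 2, 'data': 2, 'on': 2, 'aws': 1, 'emr': 1}
```

On a real cluster the map calls run in parallel over input splits stored in HDFS (or in S3 via EMRFS), and the framework performs the shuffle across the network.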
What is EMR? - 2 / 4
An enterprise-grade Hadoop distribution from Amazon with no licensing fee
Compute, Storage & Database, Data Import/Export, Orchestration, Configuration, Web UI, Machine Learning, Monitoring
EMR Innovations - 3 / 4
Easy cluster resizing - add/remove task/core nodes
Auto scaling - scale out/in based on workload
Use Spot Instances to handle spiky traffic
Spin up transient / long-running clusters for pre-defined workflows and ad hoc tasks
High Availability
Monitors nodes and replaces unhealthy nodes
Cost Savings
EMR = $304.63
Hadoop = $338.40
(5x m4.10xlarge instances / day)
Decouple storage and compute
No contention for system resources (CPU and RAM)
EMR File System - EMRFS
Use S3 as an HDFS-compatible file system
S3 offers virtually unlimited and inexpensive storage
No need to add Hadoop data nodes to EMR clusters
Data transfer from S3 to EMR within the same region is free
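The cost comparison on this slide can be checked with simple arithmetic. The sketch below takes the two daily figures as given (it does not model actual AWS or Hadoop-vendor pricing) and derives the daily saving, the percentage, and a per-instance-hour figure for the 5 x m4.10xlarge cluster.

```python
from typing import Tuple

# Figures taken as given from the slide (5x m4.10xlarge for one day).
EMR_DAILY_USD = 304.63
HADOOP_DAILY_USD = 338.40
INSTANCES = 5
HOURS_PER_DAY = 24


def daily_savings(emr_usd: float, hadoop_usd: float) -> Tuple[float, float]:
    """Return (absolute daily saving in USD, saving as a fraction of the Hadoop cost)."""
    saving = hadoop_usd - emr_usd
    return saving, saving / hadoop_usd


def per_instance_hour(daily_usd: float) -> float:
    """Spread a daily cluster cost evenly over every instance-hour."""
    return daily_usd / (INSTANCES * HOURS_PER_DAY)


if __name__ == "__main__":
    saving, fraction = daily_savings(EMR_DAILY_USD, HADOOP_DAILY_USD)
    print(f"saving: ${saving:.2f}/day ({fraction:.1%})")
    print(f"EMR:    ${per_instance_hour(EMR_DAILY_USD):.2f} per instance-hour")
    print(f"Hadoop: ${per_instance_hour(HADOOP_DAILY_USD):.2f} per instance-hour")
```

With the slide's figures this works out to about $33.77 per day (roughly 10%), or about $2.54 versus $2.82 per instance-hour.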
EMR Security - 4 / 4
Isolation
Logical isolation using VPC
Security Groups (firewall rules)
Separate master/slave security groups
Encryption at rest
S3 server-side/client-side encryption with AWS KMS or custom keys
Authentication, Authorization, Accounting
Use IAM roles
Audit API calls using CloudTrail
Use a custom EMR AMI to encrypt EBS boot volumes (KMS)
Why Athena? - 1 / 4
Analyze data in all shapes and sizes with minimal effort
Prototype a solution before a full-blown implementation
Building a data pipeline is complex…
Requires a variety of specialized skills
ETL is time-consuming…
Especially if you just want to quickly explore data
Data Scientists just want to get going ASAP!
Impeded by the need to perform ETL and build data pipelines
Data Analysts want a self-service solution…
without relying on Data Engineers for everything
What is Athena? - 2 / 4
Athena is an interactive query service
Analyze data in Amazon S3 using standard ANSI SQL
Athena is serverless
• No infrastructure to maintain; zero spin-up time
• Uses warm compute pools across multiple AZs
Built using well-known open source technologies
• Presto SQL query engine
• Hive for DDL functionality
Athena Innovations - 3 / 4
No loading of data
• Stream data directly from S3
• No ETL required
Query data in its raw format
• Text, CSV, JSON, Web logs, AWS service logs
• Convert to ORC/Parquet for the best performance and lowest cost
Pay per query, based on the amount of data scanned
• Automatically parallelizes queries
Query results are also stored in S3
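Because Athena bills per query on the data scanned, the advice to convert to ORC/Parquet translates directly into cost. The toy model below assumes a price of USD 5 per TB scanned, billing rounded up to the nearest MB with a 10 MB minimum per query (check current Athena pricing), and a hypothetical table of equally sized columns with a hypothetical 3x compression ratio. It compares a row-oriented text format, which reads every column, with a columnar format, which reads only the referenced columns.

```python
import math

# Assumptions for illustration; check current Athena pricing before relying on them.
PRICE_PER_TB_USD = 5.0
MB = 1024 ** 2
TB = 1024 ** 4                 # TB taken as 2**40 bytes here
MIN_BILLED_BYTES = 10 * MB     # assumed per-query minimum


def billed_bytes(scanned_bytes: int) -> int:
    """Round scanned bytes up to the next MB and apply the per-query minimum."""
    rounded = math.ceil(scanned_bytes / MB) * MB
    return max(rounded, MIN_BILLED_BYTES)


def query_cost_usd(scanned_bytes: int) -> float:
    return billed_bytes(scanned_bytes) / TB * PRICE_PER_TB_USD


def bytes_scanned(raw_table_bytes: int, total_columns: int, used_columns: int,
                  columnar: bool, compression_ratio: float = 1.0) -> int:
    """Toy model: row formats (CSV/JSON) read every column, columnar formats
    (ORC/Parquet) read only the referenced columns. Assumes equally sized columns."""
    stored = raw_table_bytes / compression_ratio
    fraction = used_columns / total_columns if columnar else 1.0
    return int(stored * fraction)


if __name__ == "__main__":
    # Hypothetical 1 TB table with 10 columns; the query references 2 of them.
    csv = bytes_scanned(TB, 10, 2, columnar=False)
    parquet = bytes_scanned(TB, 10, 2, columnar=True, compression_ratio=3.0)
    print(f"CSV:     ${query_cost_usd(csv):.2f}")
    print(f"Parquet: ${query_cost_usd(parquet):.2f}")
```

Real savings depend on the data and the query; the point of the sketch is that cost scales with bytes scanned, so column pruning and compression both reduce it.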
Athena Security - 4 / 4
Access
Use IAM policies to manage service access
Use S3 permissions to manage read/write data access
Encryption at rest
Data in S3 can be encrypted (SSE-S3, SSE-KMS, CSE-KMS)
Query results can be encrypted when stored back in S3
Encryption in transit
Uses TLS encryption for accessing data stored in S3
Uses SSL encryption for JDBC clients
Real Time Big Data Analytics
Kinesis Firehose, Kinesis Streams, Kinesis Analytics
Why Kinesis? - 1 / 5
A scalable service bus to connect data producers and consumers
IoT, mobile / web clients
Building an enterprise-grade queuing system is challenging…
Challenges
Scale - as the number or size of input records increases
Reliability - store data until consumers download/process it
Push / pull support
...
Introduction to the Kinesis Family - 2 / 5
Amazon Kinesis
Streams - Analyze streaming data
Firehose - Prepare and load streaming data into AWS
Analytics - Analyze streaming data with standard SQL
Fully managed service
Easily collect, process, and analyze real-time, streaming data.
Equivalent OSS - Apache Kafka
Kinesis Streams - 3 / 5
Connects real-time data producers and consumers
Put data using the Kinesis Producer Library (KPL)
Get data using the Kinesis Client Library (KCL)
Data-consumer apps pull data from streams
Streams are made of shards
Add/remove shards to adjust capacity
Encrypt sensitive data using SSE and AWS KMS
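Kinesis Streams assigns each record to a shard by taking the MD5 hash of its partition key as a 128-bit integer and finding the shard whose hash-key range contains it. The sketch below reproduces that mapping for a hypothetical stream whose shards split the hash space evenly (a real stream reports each shard's actual range, for example via DescribeStream). It also sizes the shard count from per-shard write limits assumed here to be 1 MB/s and 1,000 records/s (check current limits).

```python
import hashlib
import math
from typing import List, Tuple

HASH_SPACE = 2 ** 128            # partition keys map into 128-bit hash keys
WRITE_MB_PER_SHARD = 1.0         # assumed per-shard write limit (MB/s)
WRITE_RECORDS_PER_SHARD = 1000   # assumed per-shard write limit (records/s)


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as an unsigned big-endian 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Hypothetical even split of the hash space into inclusive [start, end] ranges."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Index of the shard whose hash-key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key is not covered by any shard range")


def shards_needed(mb_per_sec: float, records_per_sec: float) -> int:
    """Smallest shard count that satisfies both assumed write limits."""
    return max(1,
               math.ceil(mb_per_sec / WRITE_MB_PER_SHARD),
               math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD))


if __name__ == "__main__":
    ranges = even_shard_ranges(4)
    for key in ("sensor-1", "sensor-2", "user-42"):
        print(key, "-> shard", shard_for(key, ranges))
    print("shards for 2.5 MB/s, 500 records/s:", shards_needed(2.5, 500))
```

Because the mapping is deterministic, all records with the same partition key land on the same shard, which preserves per-key ordering; adding shards splits the ranges and spreads keys across more capacity.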
Kinesis Firehose - 4 / 5
Capture, transform, and push streaming real-time data into...
Kinesis Analytics, S3, Redshift, Elasticsearch Service
Loads new data within 60 seconds of receiving it
Automatic scaling (stream management and sharding are handled for you) and monitoring
Batch, compress, and encrypt the data before loading it
Add on-the-fly data conversion without a separate data processing pipeline
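Firehose batches incoming records and delivers a batch when either a size threshold or a time interval is reached (its buffering hints: a size in MB and an interval in seconds), which is why delivery latency is bounded by the interval. The class below is a hypothetical in-memory sketch of that size-or-time policy, compressing each batch with gzip before handing it to a delivery callback; it is not the Firehose API, and the thresholds are illustrative.

```python
import gzip
import time
from typing import Callable, List, Optional


class SizeOrTimeBuffer:
    """Hypothetical buffer: flush when max_bytes is reached or max_seconds have elapsed."""

    def __init__(self, max_bytes: int, max_seconds: float,
                 deliver: Callable[[bytes], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.deliver = deliver        # e.g. a function that writes the batch to S3
        self.clock = clock
        self.records: List[bytes] = []
        self.size = 0
        self.opened_at: Optional[float] = None

    def put(self, record: bytes) -> None:
        if self.opened_at is None:
            self.opened_at = self.clock()   # the batch's timer starts at its first record
        self.records.append(record)
        self.size += len(record)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        # Checked on every put; a real service would also check on a timer.
        if not self.records:
            return
        expired = self.clock() - self.opened_at >= self.max_seconds
        if self.size >= self.max_bytes or expired:
            self.flush()

    def flush(self) -> None:
        if self.records:
            batch = b"\n".join(self.records)
            self.deliver(gzip.compress(batch))  # compress the batch before loading it
        self.records, self.size, self.opened_at = [], 0, None


if __name__ == "__main__":
    batches: List[bytes] = []
    buffer = SizeOrTimeBuffer(max_bytes=64, max_seconds=60.0, deliver=batches.append)
    for i in range(20):
        buffer.put(f"event-{i}".encode())
    buffer.flush()  # flush whatever is left, e.g. on shutdown
    print(len(batches), "compressed batches delivered")
```

Larger thresholds mean fewer, bigger objects at the destination; smaller ones mean lower latency. That is the same trade-off the buffering hints expose.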
Kinesis Analytics - 5 / 5
Process streaming data in real time with standard SQL
Sub-second processing latencies
Input: Kinesis Streams or Firehose
Automatically recognizes standard data formats and suggests a schema
Manually update the schema, or provide a new one for unstructured data
Output: Kinesis Streams and Firehose
S3, Redshift, Elasticsearch Service, or a custom destination
Write your queries using standard SQL
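A typical Kinesis Analytics query aggregates a stream over fixed, non-overlapping time windows, for example counting events per key per minute. The plain-Python sketch below computes that kind of tumbling-window result over a list of (timestamp, key) events so the semantics are visible. It illustrates the windowing idea only; it is not Kinesis Analytics SQL, and the event shape is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_counts(events: Iterable[Tuple[float, str]],
                    window_seconds: int) -> Dict[int, Counter]:
    """Group (epoch_seconds, key) events into non-overlapping windows and count
    keys per window. Each window is identified by its start time."""
    windows: Dict[int, Counter] = defaultdict(Counter)
    for ts, key in events:
        window_start = int(ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)


if __name__ == "__main__":
    events = [(0, "page_view"), (30, "click"), (59.9, "page_view"),
              (60, "page_view"), (125, "click")]
    for start, counts in sorted(tumbling_counts(events, 60).items()):
        print(start, dict(counts))
```

A streaming engine emits each window's result as soon as the window closes rather than after reading a finite list, but the grouping rule is the same.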
Big Data Storage & Databases
DynamoDB, Aurora
Why DynamoDB? - 1 / 2
More entities are producing unstructured data
Need a scalable & reliable solution to capture and process it
Building scalable, reliable, resilient NoSQL database infrastructure is an engineering challenge!
Some challenges...
Infrastructure layer - replicate, scale, restore
Database layer - synchronization, read-after-write consistency
Network layer - load balancing
Security - restricted, protected access
…
What is DynamoDB? - 2 / 2
Fast and flexible NoSQL database service
Single-digit millisecond latency at any scale
For microsecond response times, use DynamoDB Accelerator (DAX)
Supports both document and key-value store models
Similar to MongoDB in this respect
Integrates with IAM, and with AWS Lambda for triggers
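DynamoDB's provisioned throughput is expressed in capacity units. One read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write capacity unit covers one write per second of an item up to 1 KB. The helper below applies those rules to size a table for a hypothetical workload; other capacity modes and request types are not modeled.

```python
import math


def read_capacity_units(item_kb: float, reads_per_sec: float,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: a strongly consistent read of an item up to 4 KB costs one unit;
    eventually consistent reads cost half as much."""
    units_per_read = math.ceil(item_kb / 4)
    rcu = units_per_read * reads_per_sec
    if not strongly_consistent:
        rcu /= 2
    return math.ceil(rcu)


def write_capacity_units(item_kb: float, writes_per_sec: float) -> int:
    """WCUs needed: a write of an item up to 1 KB costs one unit."""
    return math.ceil(math.ceil(item_kb) * writes_per_sec)


if __name__ == "__main__":
    # Hypothetical workload: 6 KB items, 100 reads/s and 10 writes/s.
    print("RCU (strong):  ", read_capacity_units(6, 100))
    print("RCU (eventual):", read_capacity_units(6, 100, strongly_consistent=False))
    print("WCU:           ", write_capacity_units(6, 10))
```

For the hypothetical 6 KB items this gives 200 RCUs (strongly consistent), 100 RCUs (eventually consistent) and 60 WCUs. Item size is rounded up per request, so small reductions in item size can matter.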
Why Aurora? - 1 / 4
Open-source RDBMSs are free to use, but…
they do not match commercial RDBMS performance
Commercial RDBMSs provide performance, but…
they are too expensive for many organizations
Is there a best of both worlds?
What is Aurora? - 2 / 4
A managed database service
Offered through Amazon RDS
A new relational database engine
Compatible with MySQL 5.6 and PostgreSQL 9.6
Existing ecosystem works with little or no change
Throughput
Up to 5x MySQL, 2x PostgreSQL
On par with commercial databases
Example: 500K reads/sec, 100K writes/sec
1/10th the cost of commercial databases
Aurora Innovations - 3 / 4
High Availability and Durability
Designed for > 99.99% availability
Data is replicated six ways across 3 AZs
Continuously backed up to S3
Failure recovery is transparent and automated
Database Migration
Use standard MySQL import and export tools
Use Database Migration Service (DMS) to migrate to Aurora
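The six-copies-across-three-AZs design relies on quorums. AWS has described Aurora's storage as needing 4 of 6 copies for a write and 3 of 6 for a read; those quorum sizes are taken from that description. The sketch below models the layout as two copies per AZ and checks which operations remain possible after a given set of failures. It shows that losing a whole AZ still allows writes, and losing an AZ plus one more copy still allows reads.

```python
from typing import AbstractSet, Dict, Tuple

WRITE_QUORUM = 4  # copies that must acknowledge a write (per AWS's description)
READ_QUORUM = 3   # copies that must respond to a read


def six_way_layout(azs: Tuple[str, ...] = ("az-a", "az-b", "az-c"),
                   copies_per_az: int = 2) -> Dict[str, str]:
    """Map copy id -> AZ: two copies in each of three AZs, six in total."""
    return {f"{az}-{i}": az for az in azs for i in range(copies_per_az)}


def quorum_status(layout: Dict[str, str],
                  failed_azs: AbstractSet[str] = frozenset(),
                  failed_copies: AbstractSet[str] = frozenset()) -> Tuple[bool, bool]:
    """Return (can_write, can_read) after the given AZ and copy failures."""
    healthy = [copy for copy, az in layout.items()
               if az not in failed_azs and copy not in failed_copies]
    return len(healthy) >= WRITE_QUORUM, len(healthy) >= READ_QUORUM


if __name__ == "__main__":
    layout = six_way_layout()
    print("no failures:     ", quorum_status(layout))
    print("lose one AZ:     ", quorum_status(layout, {"az-a"}))
    print("lose AZ + 1 copy:", quorum_status(layout, {"az-a"}, {"az-b-0"}))
```

Any 4-copy write set and any 3-copy read set must overlap in at least one copy (4 + 3 > 6), which is what lets a read see the latest write.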
Aurora Security - 4 / 4
Isolation
Network isolation using VPC
Encryption at rest
Storage is encrypted
Includes backups, snapshots and replicas
AWS KMS is used transparently
Encryption in transit
Using SSL
Data Warehousing
Redshift
Why Redshift? - 1 / 5
Why do you need a data warehouse?
OLTP systems are good for transactions
• Not a good fit for analytics-type workloads
A data warehouse is a centralized location to capture data from OLTP systems
• Used for analytics and trend analysis
Challenges
Performance
Cost
Scalability
…
How did AWS build Redshift? - 2 / 5
What is Redshift? - 3 / 5
Distributed computing architecture
Each node is independent and self-sufficient
No single point of failure
Nodes don’t share memory or disk storage (shared-nothing architecture)
Leader node
Endpoint for SQL, JDBC/ODBC and BI tools
Stores metadata; coordinates parallel SQL processing
Compute nodes
Local, columnar storage
Ingestion, backup and restore to/from…
S3 / EMR / DynamoDB / SSH
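The leader/compute split follows the classic shared-nothing, scatter-gather pattern. Rows are distributed across compute nodes (for instance by hashing a distribution key), each node aggregates only its local rows, and the leader merges the partial results. The sketch below simulates that flow in a single process for a SUM ... GROUP BY style query; the node count, hashing scheme (CRC32 here), and data are hypothetical and are not Redshift internals.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (group key, amount)


def distribute(rows: List[Row], node_count: int) -> List[List[Row]]:
    """Assign each row to a node by hashing its key (a stand-in for a distribution key)."""
    nodes: List[List[Row]] = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[zlib.crc32(key.encode("utf-8")) % node_count].append((key, amount))
    return nodes


def local_aggregate(node_rows: List[Row]) -> Dict[str, float]:
    """Work done independently on each compute node: a partial SUM per key."""
    partial: Dict[str, float] = defaultdict(float)
    for key, amount in node_rows:
        partial[key] += amount
    return dict(partial)


def leader_merge(partials: List[Dict[str, float]]) -> Dict[str, float]:
    """Leader node combines the partial aggregates into the final result."""
    result: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            result[key] += value
    return dict(result)


def sum_by_key(rows: List[Row], node_count: int = 3) -> Dict[str, float]:
    return leader_merge([local_aggregate(n) for n in distribute(rows, node_count)])


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 1.0)]
    print(sum_by_key(sales))
```

Because each node touches only its own slice of rows, adding nodes adds both storage and compute, which is the property the shared-nothing bullet refers to.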
Redshift Innovations - 4 / 5
1-click deployment in multiple regions around the world
Pay-as-you-go pricing
As low as $1,000 / TB / year
Large collection of pre-installed software and libraries
pandas, NumPy and SciPy
Write new user-defined functions (UDFs)
using Python 2.7
Redshift Security - 5 / 5
IAM roles
Role-based Access Control
REAN Cloud Data Lake
Quick start / reference architecture
REAN Cloud gains AWS Big Data Competency
Data Lake Reference architecture
Demo, Workshop
Demo
• Live walkthrough of launching EMR, using Athena
Workshop
• Using Athena to analyze an open dataset
Thank You!
REAN Cloud LLC
2201 Cooperative Way, Suite 302
Herndon, VA 20171
+1 (844) 377-7326
Do you have any questions?
info@reancloud.com
www.reancloud.com
Larry Bradley
S. P. T. Krishnan
Steve Toback
Steve Vaughan

RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 

Recently uploaded (20)

代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

Implement Big Data on AWS

  • 1. www.reancloud.com. USA: Herndon | Philadelphia | Los Angeles. India: Hyderabad | Pune | Udaipur | Bangalore. Big Data on AWS: How to implement and automate secure Big Data architectures & solutions on AWS.
  • 2. @2017 Copyright REAN Cloud. All rights reserved. info@reancloud.com | www.reancloud.com. Workshop Agenda: Introduction to REAN Cloud; What is Big Data; Why AWS is the most complete Big Data platform; Overview of selected AWS Big Data products (Why, What, How: innovation, best practices, and security; equivalent open-source software where applicable); REAN Data Lake reference architecture; Demo & discussion (live demo / hands-on using EMR and Athena).
  • 3. Organizational Profile. Established: 2013. Presence: USA, India. Number of employees: 300+. AWS certifications: 150+ (including 15+ Professional certifications). Industry focus: Education, Government, Healthcare / Life Sciences, Financial Services, and ISV. Management team of executives formerly from Fortune 500 enterprises (AWS, Amdocs, BMC, Merck and Cognizant) with deep AWS cloud computing experience. Recognized by TechTarget as the top AWS Partner providing innovative DevSecOps services. 24x7 follow-the-sun model with offices around the world and continuous operations in multiple time zones (EST, PST and IST). Recognized in Gartner's March 2017 Magic Quadrant for Public Cloud Infrastructure Managed Service Providers, leading the Niche Players in completeness of vision and ability to execute.
  • 4. AWS Competencies
  • 5. Customers by Vertical: Government / Public Sector, ISV, Education, Financial Services, Healthcare / Life Sciences
  • 6. What is Big Data?
  • 7. Big Data Characteristics: Ever-Expanding Horizon. Velocity, Variety, Volume. (Source: www.datasciencecentral.com)
  • 8. AWS Big Data technologies usually fall into one of two categories: Lightswitch or Cockpit.
  • 9. AWS: The Most Complete Platform for Big Data (1/2). Categories: Analytic Frameworks, Real Time Analytics, Storage & Databases, Data Warehousing, Business Intelligence. Services: Amazon EMR, Amazon Athena, Kinesis Firehose, Kinesis Streams, Kinesis Analytics, DynamoDB, Redshift, QuickSight, Elasticsearch Service, S3, HBase on EMR, RDS & Aurora.
  • 10. AWS: The Most Complete Platform for Big Data (2/2). Artificial Intelligence: Lex, Polly, Rekognition, ML, MXNet. Internet of Things: IoT, Greengrass. Serverless Compute: Lambda. Amazon EC2: instances optimised for compute, memory, storage, and GPU. Data Movement: Direct Connect, Snowball, Database Migration Service, Storage Gateway.
  • 11. Big Data Analytic Frameworks: Amazon EMR, Amazon Athena
  • 12. Why EMR? (1/4) How do you deal with Big Data? Massively parallel processing (MPP) frameworks! Hadoop = MapReduce + HDFS: MapReduce is a popular distributed processing framework, and HDFS provides scalable and reliable distributed data storage. Running a Hadoop cluster is challenging: you must scale data nodes as data size increases and scale compute nodes out/in to handle spiky traffic, and managing resources (nodes, disks) is time-consuming. Enterprise Hadoop distributions fill the need, but they are very expensive ($$$).
  • 13. What is EMR? (2/4) An enterprise-grade Hadoop distribution from Amazon with no licensing fee. Capabilities: compute, storage & database, data import/export, orchestration, configuration, web UI, machine learning, monitoring.
  • 14. EMR Innovations (3/4). Easy cluster resizing: add/remove task and core nodes. Auto scaling: scale out/in based on workload. Use Spot Instances to handle spiky traffic. Spin up transient or long-running clusters for predefined workflows and ad hoc tasks. High availability: EMR monitors nodes and replaces unhealthy ones. Cost savings: EMR = $304.63 vs. Hadoop = $338.40 (5x m4.10xlarge instances per day). Decouple storage and compute: no contention for system resources (CPU and RAM). EMR File System (EMRFS): use S3 as an HDFS-compatible file system. S3 offers virtually unlimited, inexpensive storage, so there is no need to add Hadoop data nodes to EMR clusters. Data transfer from S3 to EMR within the same region is free.
  • 15. EMR Security (4/4). Isolation: logical isolation using VPC; security groups (firewall rules); master/slave security groups. Encryption at rest: S3 server-side/client-side encryption with AWS KMS or custom keys. Authentication, authorization, accounting: use IAM roles; audit API calls using CloudTrail. Use a custom EMR AMI to encrypt boot volumes (KMS).
  • 16. Why Athena? (1/4) Analyze data in all shapes and sizes with minimal effort, and prototype a solution before a full-blown implementation. Building a data pipeline is complex and requires a variety of specialized skill sets. ETL is time-consuming, especially if you just want to explore data quickly. Data scientists just want to get going ASAP but are impeded by the need to perform ETL and build data pipelines. Data analysts want a self-service solution, without relying on data engineers for everything.
  • 17. What is Athena? (2/4) Athena is an interactive query service: analyze data in Amazon S3 using standard ANSI SQL. Athena is serverless: there is no infrastructure to maintain and zero spin-up time, and it uses warm compute pools across multiple AZs. It is built on well-known open-source technologies: the Presto SQL query engine, and Hive for DDL functionality.
  • 18. Athena Innovations (3/4). No loading of data: data is streamed directly from S3, with no ETL required. Query data in its raw format (text, CSV, JSON, web logs, AWS service logs); convert to ORC/Parquet for the best performance and lowest cost. Pay per query; queries are automatically parallelized. Results are also stored in S3.
  • 19. Athena Security (4/4). Access: use IAM policies to manage service access, and S3 permissions to manage read/write data access. Encryption at rest: data in S3 can be encrypted (SSE-S3, SSE-KMS, CSE-KMS), and query results can be encrypted when stored back in S3. Encryption in transit: TLS encryption for accessing data stored in S3, and SSL encryption for JDBC clients.
  • 20. Real Time Big Data Analytics: Kinesis Firehose, Kinesis Streams, Kinesis Analytics
  • 21. Why Kinesis? (1/5) A scalable service bus to connect data producers and consumers (IoT, mobile / web clients). Building an enterprise-grade queuing system is challenging. Challenges: scale, as the number or size of input data increases; reliability, storing data until consumers download/process it; push / pull support; ...
  • 22. Introduction to the Kinesis Family (2/5). Amazon Kinesis Streams: analyze streaming data. Kinesis Firehose: prepare and load streaming data into AWS. Kinesis Analytics: analyze streaming data with standard SQL. A fully managed service to easily collect, process, and analyze real-time streaming data. Equivalent OSS: Apache Kafka.
  • 23. Kinesis Streams (3/5). Connects real-time data producers and consumers. Put data using the Kinesis Producer Library (KPL) and get data using the Kinesis Client Library (KCL); data-consumer apps pull data from streams. Streams are made of shards: add/remove shards to adjust capacity. Encrypt sensitive data using SSE and AWS KMS.
  • 24. Kinesis Firehose (4/5). Capture, transform, and push real-time streaming data into Kinesis Analytics, S3, Redshift, or Elasticsearch Service. Loads new data within 60 seconds of receiving it. Auto scaling (stream management, sharding) and monitoring. Batches, compresses, and encrypts the data before loading it. Adds data conversion on the fly without a separate data processing pipeline.
  • 25. Kinesis Analytics (5/5). Process streaming data in real time with standard SQL, with sub-second processing latencies. In: Kinesis Streams or Firehose; automatically recognizes standard data formats and suggests a schema; manually update the schema or provide a new one for unstructured data. Out: Kinesis Streams and Firehose, S3, Redshift, Elasticsearch Service, or a custom destination. Write your queries using standard SQL.
  • 26. Big Data Storage & Databases: DynamoDB, Aurora
  • 27. Why DynamoDB? (1/2) More entities are producing unstructured data, and you need a scalable, reliable solution to capture and process it. Building a scalable, reliable, resilient NoSQL database infrastructure is an engineering challenge! Some challenges: infrastructure layer (replicate, scale, restore); database layer (synchronization, read-after-write consistency); network layer (load balancer); security (restricted, protected access); ...
  • 28. What is DynamoDB? (2/2) A fast and flexible NoSQL database service with single-digit millisecond latency at any scale; for microsecond response times, use DynamoDB Accelerator (DAX). Supports both document and key-value store models (similar to MongoDB from this perspective). Integrates with IAM, and with AWS Lambda for triggers.
  • 29. Why Aurora? (1/4) Open-source RDBMSs are free to use, but they do not match commercial RDBMS performance. Commercial RDBMSs provide performance, but they are too expensive for many organizations. Is there a best of both worlds?
  • 30. What is Aurora? (2/4) A managed database service, similar to RDS. A new relational database engine compatible with MySQL 5.6 and PostgreSQL 9.6, so the existing ecosystem works with little or no change. Throughput of 5x MySQL and 2x PostgreSQL, on par with commercial databases (example: 500K reads and 100K writes per second), at 1/10th the cost of commercial databases.
  • 31. Aurora Innovations (3/4). High availability and durability: > 99.99% availability; data is replicated 6 times across 3 AZs and continuously backed up to S3; failure recovery is transparent and automated. Database migration: use standard MySQL import and export tools, or use Database Migration Service (DMS) to migrate to Aurora.
  • 32. Aurora Security (4/4). Isolation: network isolation using VPC. Encryption at rest: storage is encrypted, including backups, snapshots and replicas, and AWS KMS is used transparently. Encryption in motion: using SSL.
  • 33. Data Warehousing: Redshift
  • 34. Why Redshift? (1/5) Why do you need a data warehouse? OLTP systems are good for transactions but are not a good fit for analytics-type workloads. A data warehouse is a centralized location to capture data from OLTP systems, used for analytics and trend analysis. Challenges: performance, cost, scalability, ...
  • 35. How did AWS build Redshift? (2/5)
  • 36. What is Redshift? (3/5) A distributed computing architecture: each node is independent and self-sufficient, with no single point of failure, and nodes don't share memory or disk storage (shared-nothing architecture). Leader node: the endpoint for SQL, JDBC/ODBC, and BI tools; stores metadata and coordinates parallel SQL processing. Compute nodes: local, columnar storage. Ingestion, backup, and restore with S3 / EMR / DynamoDB / SSH.
  • 37. Redshift Innovations (4/5). 1-click deployment to launch in multiple regions around the world. Pay-as-you-go pricing: $1,000 per TB per year. A large collection of preinstalled software and libraries: Pandas, NumPy and SciPy. Write new user-defined functions (UDFs) using Python 2.7.
  • 38. Redshift Security (5/5). IAM roles; role-based access control.
  • 39. REAN Cloud Data Lake: quick start / reference architecture
  • 40. REAN Cloud gains AWS Big Data Competency
  • 41. Data Lake Reference Architecture
  • 42. Demo, Workshop. Demo: live walkthrough of launching EMR and using Athena. Workshop: using Athena to analyze an open dataset.
  • 43. Thank You! REAN Cloud LLC, 2201 Cooperative Way, Suite 302, Herndon, VA 20171, +1 (844) 377-7326. Do you have any questions? info@reancloud.com | www.reancloud.com. Larry Bradley, S. P. T. Krishnan, Steve Toback, Steve Vaughan
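Slide 12 summarizes Hadoop as MapReduce + HDFS. The code below is a rough, single-process illustration of the MapReduce programming model, using the classic word count. It is not Hadoop or EMR code. It has a map step, a shuffle that groups values by key, and a reduce step. On a real cluster each phase runs in parallel across nodes and reads its input from HDFS (or, on EMR, from S3 via EMRFS). The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine (here, sum) the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["big data on AWS", "big data with EMR"]))
    # {'big': 2, 'data': 2, 'on': 1, 'aws': 1, 'with': 1, 'emr': 1}
```

Keeping map and reduce as pure functions is what lets a framework run many copies of them on different nodes without coordination beyond the shuffle.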
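Slide 14 quotes a daily cost of $304.63 for EMR versus $338.40 for Hadoop on five m4.10xlarge instances. The figures come from the slide, and the helper below only does the arithmetic: a saving of about $33.77 per day, roughly 10%. It does not model storage, data transfer, or the operational effort of running the cluster yourself.

```python
def daily_savings(baseline_usd, alternative_usd):
    """Return (absolute saving, saving as a percentage of the baseline)."""
    saving = baseline_usd - alternative_usd
    return saving, saving / baseline_usd * 100


# Daily cluster costs quoted on slide 14 (5x m4.10xlarge instances).
HADOOP_PER_DAY = 338.40
EMR_PER_DAY = 304.63

saving, pct = daily_savings(HADOOP_PER_DAY, EMR_PER_DAY)
print(f"EMR saves ${saving:.2f}/day ({pct:.1f}%)")  # EMR saves $33.77/day (10.0%)
```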
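Slide 18 says Athena is pay-per-query and that converting to ORC/Parquet lowers cost. The estimator below shows why: Athena charges by bytes scanned, so reading fewer bytes costs less. This is a minimal sketch, and several parts of it are assumptions not taken from the slides. It assumes a list price of $5 per TB scanned, billing rounded up to the nearest MB with a 10 MB minimum per query, and binary (1024-based) units. The 200 GB and 20 GB sizes are hypothetical. Check current Athena pricing for your region before relying on it.

```python
import math

# Assumptions (not from the slides): $5 per TB scanned, rounded up to the
# nearest MB, 10 MB minimum per query, 1024-based units.
PRICE_PER_TB_USD = 5.00
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4
MIN_BILLED_MB = 10


def athena_query_cost(bytes_scanned):
    """Estimate the charge for one query from the number of bytes it scans."""
    billed_mb = max(MIN_BILLED_MB, math.ceil(bytes_scanned / MB))
    return billed_mb * MB / TB * PRICE_PER_TB_USD


# Hypothetical sizes: scanning 200 GB of raw CSV versus reading only the
# needed columns (20 GB) from a columnar Parquet copy of the same data.
print(f"CSV:     ${athena_query_cost(200 * GB):.4f}")
print(f"Parquet: ${athena_query_cost(20 * GB):.4f}")
```

Columnar formats help because a query that touches a few columns skips the bytes of all the others. Compression shrinks the scanned size further.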
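Slide 23 notes that streams are made of shards and that capacity is adjusted by adding or removing them. The sketch below estimates a shard count and shows how a partition key lands on a shard. It assumes the standard per-shard limits: 1 MB/s or 1,000 records/s for writes, and 2 MB/s for reads shared by all consumers (unless enhanced fan-out is used). Kinesis hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. Treating the hash key space as evenly split is an assumption, since a resharded stream can have uneven ranges. The helper names are hypothetical.

```python
import hashlib
import math

# Assumed per-shard limits for a provisioned stream; verify against current docs.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0
HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shards_needed(in_mb_per_sec, records_per_sec, out_mb_per_sec):
    """Smallest shard count that satisfies every limit (at least one shard)."""
    return max(
        1,
        math.ceil(in_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(out_mb_per_sec / READ_MB_PER_SEC),
    )


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_SPACE


if __name__ == "__main__":
    n = shards_needed(in_mb_per_sec=3.5, records_per_sec=2000, out_mb_per_sec=4)
    print(n, "shards;", "device-42 ->", shard_for_key("device-42", n))
```

Because the same key always hashes to the same shard, records for one key stay ordered. A single hot key, however, cannot use more than one shard's capacity.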
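Slide 24 says Firehose batches and compresses data before loading it. Firehose does this internally: it delivers a batch when either a buffer size or a buffer interval is reached, whichever comes first. The class below is a hypothetical, client-side illustration of that size-or-time rule, with gzip compression. It is not the Firehose API and it omits encryption. The thresholds are plain parameters rather than Firehose's actual limits, and an injectable clock keeps it testable.

```python
import gzip
import time


class SizeOrTimeBuffer:
    """Hypothetical sketch: flush when max_bytes OR max_seconds is reached first."""

    def __init__(self, max_bytes, max_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.clock = clock
        self.records = []
        self.size = 0
        self.opened_at = None  # time the oldest buffered record arrived
        self.delivered = []    # each entry is one gzip-compressed batch

    def put(self, record: bytes):
        if self.opened_at is None:
            self.opened_at = self.clock()
        self.records.append(record)
        self.size += len(record)
        self._maybe_flush()

    def tick(self):
        """Call periodically so the time threshold fires even without new records."""
        self._maybe_flush()

    def _maybe_flush(self):
        if not self.records:
            return
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.opened_at >= self.max_seconds
        if too_big or too_old:
            self.delivered.append(gzip.compress(b"\n".join(self.records)))
            self.records, self.size, self.opened_at = [], 0, None


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    buf = SizeOrTimeBuffer(max_bytes=10, max_seconds=60, clock=lambda: now[0])
    buf.put(b"abc")
    buf.put(b"defghijk")  # 11 bytes >= 10: size threshold flushes
    buf.put(b"x")
    now[0] = 61.0
    buf.tick()            # 61 s since "x" arrived: time threshold flushes
    print(len(buf.delivered), "batches delivered")
```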
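Slide 25 describes running standard SQL over streaming data. Streaming queries usually aggregate over time windows. The sketch below shows the idea behind a tumbling (fixed-size, non-overlapping) window count in plain Python, over a finite list of hypothetical (timestamp, key) events. It is an illustration of the concept, not Kinesis Analytics code. A real stream processor keeps windows open incrementally and emits each result as its window closes.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """events: iterable of (epoch_seconds, key).

    Returns {window_start_seconds: Counter(key -> count)}; each event falls in
    exactly one window because tumbling windows do not overlap.
    """
    windows = defaultdict(Counter)
    for ts, key in events:
        window_start = int(ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)


if __name__ == "__main__":
    clicks = [(0, "home"), (59.9, "home"), (60, "cart"), (61, "home"), (125, "cart")]
    for start, counts in sorted(tumbling_window_counts(clicks).items()):
        print(start, dict(counts))
```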
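Slide 28 notes that DynamoDB supports both key-value and document models. In DynamoDB's low-level API, every attribute value is wrapped in a type descriptor: S (string), N (number, sent as a string), BOOL, NULL, M (map) or L (list). Nested M and L values are what let an item act as a document, while the primary key attribute supports key-value lookups. The converter below is a minimal sketch of that encoding. In practice boto3's `TypeSerializer` (in `boto3.dynamodb.types`) does this, including the set and binary types omitted here. Floats are rejected because numbers are normally passed as `Decimal` to avoid precision loss. The item contents are hypothetical.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a Python value in a DynamoDB type descriptor (minimal sketch)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(document):
    """A top-level item maps attribute names to typed values (no outer "M")."""
    return {name: to_attribute_value(v) for name, v in document.items()}


if __name__ == "__main__":
    # Hypothetical item: "user_id" is the lookup key; "profile" is a nested document.
    print(to_item({
        "user_id": "u-123",
        "visits": 7,
        "active": True,
        "profile": {"city": "Herndon", "tags": ["aws", "emr"]},
    }))
```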
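Slide 31 states that Aurora replicates data 6 times across 3 AZs. Aurora's published storage design pairs those six copies with quorums: a write needs 4 of the 6 copies and a read needs 3. The sketch below is a simplified model, not Aurora code. It checks that those quorums overlap, so every read sees the latest write and two conflicting writes cannot both succeed. It also shows what survives failures: losing a whole AZ (2 copies) keeps reads and writes available, and losing an AZ plus one more copy still keeps reads available.

```python
COPIES_PER_AZ = 2
AZ_COUNT = 3
TOTAL_COPIES = COPIES_PER_AZ * AZ_COUNT  # 6 copies across 3 AZs
WRITE_QUORUM = 4
READ_QUORUM = 3


def quorums_overlap(n, w, r):
    """Reads overlap the latest write (r + w > n) and write quorums intersect (2w > n)."""
    return r + w > n and 2 * w > n


def availability(failed_copies):
    """Return (writes_available, reads_available) after losing some copies."""
    alive = TOTAL_COPIES - failed_copies
    return alive >= WRITE_QUORUM, alive >= READ_QUORUM


if __name__ == "__main__":
    print("quorums overlap:", quorums_overlap(TOTAL_COPIES, WRITE_QUORUM, READ_QUORUM))
    scenarios = [("no failures", 0), ("one AZ down", COPIES_PER_AZ),
                 ("AZ + 1 copy down", COPIES_PER_AZ + 1)]
    for label, failed in scenarios:
        writes, reads = availability(failed)
        print(f"{label}: writes={writes}, reads={reads}")
```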
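Slide 36 lists local, columnar storage on Redshift compute nodes. The toy comparison below uses three hypothetical order rows to show why this layout suits analytics. Summing one column from a row layout touches every value of every row, while a columnar layout reads only that column's values. Real columnar engines add compression and other optimizations on top; this sketch only counts the values each layout touches.

```python
ORDERS = [  # hypothetical sample rows
    {"order_id": 1, "region": "us-east", "amount": 120},
    {"order_id": 2, "region": "us-west", "amount": 80},
    {"order_id": 3, "region": "us-east", "amount": 45},
]


def to_columnar(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def sum_row_store(rows, column):
    """Row layout: whole rows are read, so every column of every row is touched."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)
        total += row[column]
    return total, values_read


def sum_column_store(columns, column):
    """Column layout: only the requested column's values are read."""
    values = columns[column]
    return sum(values), len(values)


if __name__ == "__main__":
    print("row store   (sum, values read):", sum_row_store(ORDERS, "amount"))
    print("column store(sum, values read):", sum_column_store(to_columnar(ORDERS), "amount"))
```

With wide fact tables of dozens of columns, the gap between "all values" and "one column's values" is what makes columnar scans cheaper for analytics-type workloads.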