© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
John Yeung, Solutions Architect
31 October 2017
Deep Dive on AWS with Demo
AWS Big Data and Machine Learning Day | Hong Kong
What to expect from the session
Big Data Challenges
Architectural Principles
Design Patterns
Demo (around 15 mins)
Ever-Increasing Big Data
Volume
Velocity
Variety
Big Data Evolution
Batch Processing → Stream Processing → Machine Learning
Plenty of Tools
Amazon Glacier
Amazon S3
Amazon DynamoDB
Amazon RDS
Amazon EMR
Amazon Redshift
AWS Data Pipeline
Amazon Kinesis
Amazon Kinesis Streams app
AWS Lambda
Amazon ML
Amazon SQS
Amazon ElastiCache
DynamoDB Streams
Amazon Elasticsearch Service
Amazon Kinesis Analytics
Big Data Challenges
Why?
How?
What tools should I use?
Is there a reference architecture?
Architectural Principles
Architecture Principles
#1: Build Decoupled Systems
• Data → Store → Process → Store → Analyze → Answers
#2: Use the Right Tool for the Job
• Data structure, latency, throughput, access patterns
#3: Leverage AWS Managed Services
• Scalable/elastic, available, reliable, secure, no/low admin
#4: Use Lambda Architecture Ideas
• Immutable (append-only) log, batch/speed/serving layer
#5: Be Cost-conscious
• Big data ≠ big cost
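Principle #4 can be illustrated with a minimal Python sketch, assuming an in-memory list as the append-only log and a simple per-key count as the computed view. The class name `LambdaSketch` and its methods are hypothetical and are not part of any AWS service.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple

Event = Tuple[int, str]  # (sequence number, key); a hypothetical event shape


@dataclass
class LambdaSketch:
    log: List[Event] = field(default_factory=list)        # immutable, append-only log
    batch_view: Counter = field(default_factory=Counter)  # output of the last batch run
    batch_upto: int = 0                                   # log entries covered by batch_view

    def append(self, key: str) -> None:
        """Events are only ever appended, never updated in place."""
        self.log.append((len(self.log), key))

    def run_batch(self) -> None:
        """Batch layer: recompute the view from the whole log."""
        self.batch_view = Counter(key for _, key in self.log)
        self.batch_upto = len(self.log)

    def speed_view(self) -> Counter:
        """Speed layer: cover only events the last batch run has not seen."""
        return Counter(key for _, key in self.log[self.batch_upto:])

    def query(self, key: str) -> int:
        """Serving layer: merge the batch view and the speed view."""
        return self.batch_view[key] + self.speed_view()[key]
```

Because the log is never mutated, the batch view can always be rebuilt from scratch, while the speed view only has to cover the short gap since the last batch run.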
Simplify Big Data Processing
COLLECT → STORE → PROCESS / ANALYZE → CONSUME
1. Time to answer (Latency)
2. Throughput
3. Cost
COLLECT
Types of Data
• Applications (mobile apps, web apps, data centers, AWS Direct Connect) → RECORDS: in-memory data, database records (transaction-based)
• Logging and transport (Amazon CloudWatch, AWS CloudTrail, AWS Import/Export Snowball) → DOCUMENTS, FILES: search documents, log files (file-based)
• Messaging → MESSAGES: messages (event-based)
• IoT (devices, sensors & IoT platforms, AWS IoT) → STREAMS: data streams (event-based)
STORE
Types of Data Stores
• In-memory: caches
• Database: SQL & NoSQL databases
• Search: search engines
• File store: file systems
• Queue: message queues
• Stream storage: pub/sub message queues
In-memory, Database, Search
[Diagram labels, COLLECT → STORE. Stream (hot): Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams. Message: Amazon SQS. File: Amazon S3. Also shown: in-memory, database, and search stores.]
Cache: Amazon ElastiCache
• Managed Memcached or Redis service
NoSQL: Amazon DynamoDB
• Managed NoSQL database service
SQL: Amazon RDS
• Managed relational database service
Search: Amazon Elasticsearch Service
• Managed Elasticsearch service
Use the Right Tool for the Job
Data Tier
• Search: Amazon Elasticsearch Service
• In-memory: Amazon ElastiCache (Redis, Memcached)
• SQL: Amazon Aurora, Amazon RDS (MySQL, PostgreSQL, Oracle, SQL Server)
• NoSQL: Amazon DynamoDB, Cassandra, HBase, MongoDB
File Storage
• Amazon S3
Why Is Amazon S3 Good for Big Data?
Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
Multiple & heterogeneous analysis clusters can use the same data
Unlimited number of objects and volume of data
Very high bandwidth – no aggregate throughput limit
Designed for 99.99% availability – can tolerate zone failure
Designed for 99.999999999% durability
No need to pay for data replication
Native support for versioning
Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies
Secure – SSL, client/server-side encryption at rest
Low cost
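The tiered-storage point can be made concrete with a hedged Python sketch that builds a lifecycle configuration as a plain dictionary. The helper name `tiering_rule`, the `logs/` prefix, the bucket name, and the 30-day and 90-day thresholds are illustrative assumptions, not values from the talk.

```python
from typing import Any, Dict


def tiering_rule(prefix: str, ia_after_days: int = 30,
                 glacier_after_days: int = 90) -> Dict[str, Any]:
    """Build a lifecycle configuration that moves objects to colder tiers over time."""
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Standard -> Infrequent Access -> Amazon Glacier
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = tiering_rule("logs/")
# With boto3 installed and credentials configured, the dictionary could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config)
```

Keeping the rule as data makes it easy to review the thresholds before anything is applied to a real bucket.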
Message & Stream Storage
Amazon SQS
• Managed message queue service
Apache Kafka
• High throughput distributed streaming platform
Amazon Kinesis Streams
• Managed stream storage + processing
Amazon Kinesis Firehose
• Managed data delivery
Amazon DynamoDB
• Managed NoSQL database
• Tables can be stream-enabled
Why Stream Storage
Decouple producers & consumers
Persistent buffer
Collect multiple streams
Preserve client ordering
Parallel consumption
[Diagram: records numbered 1 to 4 in four colors are written to a DynamoDB stream, Amazon Kinesis stream, or Kafka topic split into shard 1 / partition 1 and shard 2 / partition 2. Consumer 1 counts red = 4 and violet = 4; Consumer 2 counts blue = 4 and green = 4.]
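The ordering and parallel-consumption points on this slide can be sketched in Python. This is a simplified model under stated assumptions: keys are routed with an MD5 hash taken modulo the shard count, whereas real services assign hash-key ranges or use their own partitioners, and `shard_for`, `produce`, and `consume` are hypothetical names.

```python
import hashlib
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

NUM_SHARDS = 2  # two shards / partitions, as in the diagram


def shard_for(partition_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key to a shard; the same key always lands on the same shard."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards


def produce(records: List[Tuple[str, int]]) -> Dict[int, List[Tuple[str, int]]]:
    """Append each (key, sequence) record to its shard, preserving arrival order."""
    shards: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, seq in records:
        shards[shard_for(key)].append((key, seq))
    return shards


def consume(shard: List[Tuple[str, int]]) -> Counter:
    """One consumer per shard counts records per key, independently of other shards."""
    return Counter(key for key, _ in shard)


colors = ["red", "violet", "blue", "green"]
records = [(color, seq) for seq in range(1, 5) for color in colors]
shards = produce(records)
totals = sum((consume(s) for s in shards.values()), Counter())
```

Every color ends up with a count of 4, and within a shard each color's records keep their 1, 2, 3, 4 order, which is what lets consumers work in parallel without coordinating.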
What Stream Storage should I use?
 | Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose | Apache Kafka | Amazon SQS
AWS managed service | Yes | Yes | Yes | No | Yes
Guaranteed ordering | Yes | Yes | Yes | Yes | No
Delivery | exactly-once | at-least-once | exactly-once | at-least-once | at-least-once
Data retention period | 24 hours | 7 days | N/A | Configurable | 14 days
Availability | 3 AZ | 3 AZ | 3 AZ | Configurable | 3 AZ
Scale / throughput | No limit / ~ table IOPS | No limit / ~ shards | No limit / automatic | No limit / ~ nodes | No limits / automatic
Parallel clients | Yes | Yes | No | Yes | No
Stream MapReduce | Yes | Yes | N/A | Yes | N/A
Record/object size | 400 KB | 1 MB | Redshift row size | Configurable | 256 KB
Cost | Higher (table cost) | Low | Low | Low (+admin) | Low-medium
Data temperature scale shown under the table: Hot → Warm
Which Data Store Should I Use
Data Structure → Fixed schema, JSON, key-value
Access Patterns → Store data in the format you will access it
Data Characteristics → Hot, Warm, Cold
Cost → Right cost
Data Structure and Access Patterns
Access Patterns | What to use?
Put/Get (key, value) | In-memory, NoSQL
Simple relationships → 1:N, M:N | NoSQL
Multi-table joins, transaction, SQL | SQL
Faceting, search | Search

Data Structure | What to use?
Fixed schema | SQL, NoSQL
Schema-free (JSON) | NoSQL, Search
(Key, value) | In-memory, NoSQL
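The two lookups on this slide can be combined in a small Python sketch. The dictionary keys are informal labels for the slide's rows, and `candidate_stores` is a hypothetical helper that keeps only the store categories suggested by both tables.

```python
from typing import Dict, List

# Rows taken from the slide; the keys are informal labels for each row.
STORE_BY_ACCESS_PATTERN: Dict[str, List[str]] = {
    "put/get (key, value)": ["In-memory", "NoSQL"],
    "simple relationships (1:N, M:N)": ["NoSQL"],
    "multi-table joins, transactions, SQL": ["SQL"],
    "faceting, search": ["Search"],
}

STORE_BY_DATA_STRUCTURE: Dict[str, List[str]] = {
    "fixed schema": ["SQL", "NoSQL"],
    "schema-free (JSON)": ["NoSQL", "Search"],
    "(key, value)": ["In-memory", "NoSQL"],
}


def candidate_stores(access_pattern: str, data_structure: str) -> List[str]:
    """Return store categories that suit both the access pattern and the data structure."""
    by_access = STORE_BY_ACCESS_PATTERN[access_pattern]
    by_structure = STORE_BY_DATA_STRUCTURE[data_structure]
    return [store for store in by_access if store in by_structure]
```

An empty result, for example multi-table joins over schema-free JSON, is a hint that the data may need reshaping before it is stored.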
What is the temperature of your data?
Data characteristics: Hot, Warm or Cold
 | Hot | Warm | Cold
Volume | MB–GB | GB–TB | PB–EB
Item size | B–KB | KB–MB | KB–TB
Latency | ms | ms, sec | min, hrs
Durability | Low–high | High | Very high
Request rate | Very high | High | Low
Cost/GB | $$-$ | $-¢¢ | ¢
[Figure: data stores on a spectrum from hot data through warm data to cold data. Along that spectrum, request rate and cost/GB run from high to low, while latency and data volume run from low to high. A second axis shows structure, from low to high. Labels recovered: In-memory, SQL, NoSQL, Amazon Glacier.]
Which Data Store Should I Use
 | Amazon ElastiCache | Amazon DynamoDB | Amazon RDS/Aurora | Amazon ES | Amazon S3 | Amazon Glacier
Average latency | ms | ms | ms, sec | ms, sec | ms, sec, min (~ size) | hrs
Typical data stored | GB | GB–TBs (no limit) | GB–TB (64 TB max) | GB–TB | MB–PB (no limit) | GB–PB (no limit)
Typical item size | B-KB | KB (400 KB max) | KB (64 KB max) | B-KB (2 GB max) | KB-TB (5 TB max) | GB (40 TB max)
Request rate | High – very high | Very high (no limit) | High | High | Low – high (no limit) | Very low
Storage cost GB/month | $$ | ¢¢ | ¢¢ | ¢¢ | ¢ | ¢4/10
Durability | Low - moderate | Very high | Very high | High | Very high | Very high
Availability | High, 2 AZ | Very high, 3 AZ | Very high, 3 AZ | High, 2 AZ | Very high, 3 AZ | Very high, 3 AZ
Data temperature scale shown under the table: Hot data → Warm data → Cold data
PROCESS / ANALYZE
Analytics & Frameworks
Interactive
• Takes seconds
• Example: Self-service dashboards
• Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)
Batch
• Takes minutes to hours
• Example: Daily/weekly/monthly reports
• Amazon EMR (MapReduce, Hive, Pig, Spark)
Message
• Takes milliseconds to seconds
• Example: Message processing
• Amazon SQS applications on Amazon EC2
Stream
• Takes milliseconds to seconds
• Example: Fraud alerts, 1 minute metrics
• Amazon EMR (Spark Streaming), Amazon Kinesis Analytics, KCL, Storm, AWS Lambda
[Diagram: PROCESS / ANALYZE options grouped as Batch, Message, Interactive, Stream, and ML (Amazon Machine Learning), on a scale from slow to fast.]
What about ETL?
[Diagram: ETL sits between STORE and PROCESS / ANALYZE.]
Data Integration Partners
• Reduce the effort to move, cleanse, synchronize, manage, and automate data-related processes.
• https://aws.amazon.com/big-data/partner-solutions/
AWS Glue (new)
• AWS Glue is a fully managed ETL service that makes it easy to understand your data sources, prepare the data, and move it reliably between data stores.
CONSUME
[Diagram: CONSUME follows COLLECT, STORE, ETL, and PROCESS / ANALYZE.]
• Applications & API: apps & services
• Analysis and visualization: Amazon QuickSight
• Notebooks
• IDE
Consumers: business users; data scientists and developers
Put them together
[Diagram: the complete pipeline. COLLECT (applications, logging, transport, messaging, IoT) → STORE (stream, message, file, cache, NoSQL, SQL, search) → ETL → PROCESS / ANALYZE (batch, message, interactive, stream, ML) → CONSUME (apps & services, analysis & visualization, notebooks, IDE, API).]
Design Patterns
Concept #1: Decoupled Data Bus
• Storage decoupled from processing
• Multiple stages
[Diagram: Store → Process → Store → Process]
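A decoupled data bus can be sketched in Python as stages that only ever talk to stores, never to each other. The `Store` class here is a minimal in-memory stand-in for a stream, queue, or bucket, and `run_stage` is a hypothetical helper.

```python
from collections import deque
from typing import Callable, Deque, List


class Store:
    """A minimal stand-in for a durable store (a stream, queue, or bucket)."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()

    def put(self, item: str) -> None:
        self._items.append(item)

    def drain(self) -> List[str]:
        """Return everything currently stored and empty the store."""
        items = list(self._items)
        self._items.clear()
        return items


def run_stage(source: Store, transform: Callable[[str], str], sink: Store) -> int:
    """Read everything currently in `source`, transform it, and write it to `sink`."""
    items = source.drain()
    for item in items:
        sink.put(transform(item))
    return len(items)


raw, cleaned, enriched = Store(), Store(), Store()
for line in ["  GET /index.html ", " POST /login "]:
    raw.put(line)

run_stage(raw, str.strip, cleaned)        # first process step
run_stage(cleaned, str.upper, enriched)   # second process step, unaware of the first
result = enriched.drain()
```

Because each stage depends only on its source and sink, a stage can be replaced, scaled, or rerun without touching the others.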
Concept #2: Multiple Stream Processing
• Parallel processing
[Diagram labels. Store: Amazon Kinesis, Amazon DynamoDB, Amazon S3. Process: AWS Lambda, Amazon Kinesis Connector Library, KCL.]
Concept #3: Multiple Data Stores
• Analysis framework reads from or writes to multiple data stores
[Diagram labels. Store: Amazon Kinesis, Amazon S3, Amazon DynamoDB. Process: AWS Lambda, Amazon Kinesis Connector Library, KCL, Amazon EMR (Spark Streaming, Spark SQL).]
Real-time Analytics Design Pattern
[Diagram labels. Store (input): Apache Kafka, Amazon Kinesis, DynamoDB Streams. Process: KCL, AWS Lambda, Amazon EMR, Spark Streaming, Apache Storm, Amazon ML. Store (output): Amazon SNS, Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES. Outputs: notifications, alert, app state, real-time prediction, KPI.]
Message / Event Processing Design Pattern
[Diagram labels. Store (input): Amazon SQS and an Amazon SQS priority queue receiving messages / events. Process: Amazon SQS apps in an Auto Scaling group. Store (output): Amazon SNS (publish to subscribers), Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES. Outputs: app state, KPI.]
Interactive & Batch Analytics Design Pattern
[Diagram labels. Store: Amazon S3 (files), Amazon Kinesis Firehose. Process, batch mode: Amazon EMR (Hive, Pig, Spark). Process, interactive mode: Amazon Redshift, Amazon EMR (Presto, Spark), Amazon Athena. Also shown: Amazon Kinesis Analytics, Amazon Machine Learning (batch prediction, real-time prediction). Consume.]
Demonstration
Apply what we’ve just learnt
Real-time Analytics Design Pattern
[Diagram: Apache Web Server (Availability Zone #1) → Amazon Kinesis Firehose → Amazon S3 bucket and Amazon Kinesis Analytics → Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana]
Amazon Elastic Compute Cloud (Amazon EC2)
Amazon EC2 provides virtual machines (VMs), known as
instances, to run your web application on the platform you
choose. It allows you to configure and scale your compute
capacity easily to meet changing requirements and demand.
In this demo, the instance runs Apache Web Server, which
continuously generates web access log records, and the
Amazon Kinesis Agent, which streams these records to
Amazon Kinesis Firehose.
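To make the shape of a streamed record concrete, here is a hedged Python sketch that turns one Apache combined-format access log line into a JSON record. It is an illustration, not the agent's implementation: the regular expression and the sample line are assumptions, and of the field names only `response` is taken from the demo, where the SQL query reads a column of that name.

```python
import json
import re
from typing import Any, Dict, Optional

# Apache combined log format: host ident authuser [datetime] "request" status bytes "referrer" "agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def log_line_to_json(line: str) -> Optional[str]:
    """Convert one access log line to a JSON record, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    record: Dict[str, Any] = match.groupdict()
    record["response"] = int(record["response"])
    # Apache logs "-" when no bytes were sent
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return json.dumps(record)


sample = ('192.0.2.10 - - [31/Oct/2017:10:15:32 +0800] "GET /index.html HTTP/1.1" '
          '200 5123 "-" "Mozilla/5.0"')
record_json = log_line_to_json(sample)
```

Sending structured records rather than raw lines is what allows a downstream query to group by the HTTP status code.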
Amazon Kinesis Firehose
Amazon Kinesis Firehose is a fully managed service for
delivering real-time streaming data to destinations such as
Amazon Simple Storage Service (Amazon S3), Amazon
Redshift, or Amazon Elasticsearch Service (Amazon ES).
In this step, we will create an Amazon Kinesis Firehose
delivery stream to save each log entry in Amazon S3 and to
provide the log data to the Amazon Kinesis Analytics
application.
Example: Real-time Analytics (1)
[Diagram, COLLECT: Apache Web Server (Availability Zone #1) → streaming data → Amazon Kinesis Firehose]
1. A Linux instance with the Amazon Kinesis Agent installed continuously sends log records to Amazon Kinesis Firehose.
Amazon Simple Storage Service (Amazon S3)
Amazon S3 has a simple web services interface that you can
use to store and retrieve any amount of data, at any time,
from anywhere on the web. It gives any developer access to
the same highly scalable, reliable, fast, inexpensive data
storage infrastructure.
Examples: web access logs, static websites, and data lakes.
Amazon Kinesis Analytics
Amazon Kinesis Analytics enables you to query streaming
data or build entire streaming applications using SQL, so that
you can gain actionable insights promptly.
It takes care of everything required to run your queries
continuously and scales automatically to match the volume
and throughput rate of your incoming data.
Example: Real-time Analytics (2)
[Diagram, COLLECT → STORE: Apache Web Server (Availability Zone #1) → Amazon Kinesis Firehose → Amazon S3 bucket]
2a. Amazon Kinesis Firehose writes each log record to Amazon Simple Storage Service (Amazon S3) for durable storage.
Example: Real-time Analytics (2)
[Diagram, COLLECT → STORE → PROCESS / ANALYZE: Apache Web Server (Availability Zone #1) → Amazon Kinesis Firehose → Amazon S3 bucket and Amazon Kinesis Analytics]
2b. Amazon Kinesis Analytics runs a SQL statement against the streaming input data.
SQL Operations Inside Kinesis Analytics
[Diagram: Source Stream → Insert & Select (Pump) → Destination Stream, inside Amazon Kinesis Analytics]
-- Destination stream that receives one row per status code per window
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    datetime VARCHAR(30), status INTEGER, statusCount INTEGER);

-- Pump that continuously inserts the aggregated rows into the destination stream
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM TIMESTAMP_TO_CHAR('yyyy-MM-dd''T''HH:mm:ss.SSS', LOCALTIMESTAMP) AS datetime,
    "response" AS status, COUNT(*) AS statusCount
FROM "SOURCE_SQL_STREAM_001"
-- Group by status code within one-minute tumbling windows
GROUP BY "response",
    FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') minute / 1 TO MINUTE);
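The grouping in the query above amounts to a one-minute tumbling window keyed by response status. The Python sketch below reproduces that aggregation over a small in-memory list. It is a simplification: it reports the window start rather than the emit-time LOCALTIMESTAMP the query writes, and the sample rows and helper names are assumptions.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, Tuple

Row = Tuple[datetime, int]  # (ROWTIME, response status)


def minute_bucket(rowtime: datetime) -> datetime:
    """Truncate a timestamp to the start of its one-minute window."""
    return rowtime.replace(second=0, microsecond=0)


def count_by_status_per_minute(rows: Iterable[Row]) -> List[Tuple[datetime, int, int]]:
    """Return (window start, status, statusCount) for every window and status seen."""
    counts: Counter = Counter((minute_bucket(ts), status) for ts, status in rows)
    return sorted((window, status, n) for (window, status), n in counts.items())


def at(minute: int, second: int) -> datetime:
    """Build a sample timestamp on the day of the talk (illustrative only)."""
    return datetime(2017, 10, 31, 10, minute, second, tzinfo=timezone.utc)


rows = [(at(0, 5), 200), (at(0, 40), 200), (at(0, 59), 404), (at(1, 2), 200)]
result = count_by_status_per_minute(rows)
```

Rows falling in the same minute with the same status collapse into one output row, so the volume written downstream depends on the number of distinct status codes per minute rather than on request volume.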
Example: Real-time Analytics (3)
[Diagram, COLLECT → STORE → PROCESS / ANALYZE → STORE: Apache Web Server (Availability Zone #1) → Amazon Kinesis Firehose → Amazon S3 bucket and Amazon Kinesis Analytics → second Amazon Kinesis Firehose]
3. Amazon Kinesis Analytics creates an aggregated data set every minute and outputs that data to a second Firehose delivery stream.
Amazon Elasticsearch Service (Amazon ES)
Amazon Elasticsearch Service makes it easy to deploy,
secure, operate, and scale Elasticsearch for log analytics, full
text search, application monitoring, and more. Amazon
Elasticsearch Service is a fully managed service that delivers
real-time analytics capabilities alongside the availability,
scalability, and security that production workloads require.
The service offers built-in integrations with Kibana, Logstash
and other AWS services. It enables you to go from raw data to
actionable insights quickly and securely.
Example: Real-time Analytics (4)
[Diagram, COLLECT → STORE → PROCESS / ANALYZE → STORE: the pipeline above, with the second Amazon Kinesis Firehose now delivering to Amazon Elasticsearch Service]
4. This Firehose delivery stream writes the aggregated data to an Amazon ES domain.
Kibana
Kibana lets you visualize your Elasticsearch data. It provides
interactive visualizations of various types, including
histograms, line graphs, pie charts, and more. It leverages the
full aggregation capabilities of Elasticsearch.
Example: Real-time Analytics (5)
[Diagram, COLLECT → STORE → PROCESS / ANALYZE → STORE → CONSUME: the pipeline above, with Kibana reading from Amazon Elasticsearch Service]
5. Finally, use Kibana to visualize the result of your system.
Implementation Steps
[Diagram: the same pipeline (Apache Web Server → Amazon Kinesis Firehose → Amazon S3 bucket and Amazon Kinesis Analytics → Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana), annotated with implementation steps 1, 2a, 2b, 3, 4, 5, and 6.]
Let’s build your own in 60 minutes!
https://aws.amazon.com/getting-started/projects/build-log-analytics-solution/
Thank you!
John Yeung | jyeung@amazon.com

More Related Content

What's hot

DVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational TransformationDVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational TransformationAmazon Web Services
 
AMF303-Deep Dive into the Connected Vehicle Reference Architecture.pdf
AMF303-Deep Dive into the Connected Vehicle Reference Architecture.pdfAMF303-Deep Dive into the Connected Vehicle Reference Architecture.pdf
AMF303-Deep Dive into the Connected Vehicle Reference Architecture.pdfAmazon Web Services
 
Supercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMakerSupercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMakerAmazon Web Services
 
GPSTEC326-GPS Industry 4.0 AI and the Future of Manufacturing
GPSTEC326-GPS Industry 4.0 AI and the Future of ManufacturingGPSTEC326-GPS Industry 4.0 AI and the Future of Manufacturing
GPSTEC326-GPS Industry 4.0 AI and the Future of ManufacturingAmazon Web Services
 
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDBSRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDBAmazon Web Services
 
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Amazon Web Services
 
GPSBUS202_Driving Customer Value with Big Data Analytics
GPSBUS202_Driving Customer Value with Big Data AnalyticsGPSBUS202_Driving Customer Value with Big Data Analytics
GPSBUS202_Driving Customer Value with Big Data AnalyticsAmazon Web Services
 
SRV423 [new launch] Introducing Amazon Macie — Visibility and Security for yo...
SRV423 [new launch] Introducing Amazon Macie — Visibility and Security for yo...SRV423 [new launch] Introducing Amazon Macie — Visibility and Security for yo...
SRV423 [new launch] Introducing Amazon Macie — Visibility and Security for yo...Amazon Web Services
 
AWS 機器學習 I ─ 人工智慧 AI
AWS 機器學習 I ─ 人工智慧 AIAWS 機器學習 I ─ 人工智慧 AI
AWS 機器學習 I ─ 人工智慧 AIAmazon Web Services
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSAmazon Web Services
 
BDA304 Build Deep Learning Applications with TensorFlow and Amazon SageMaker
BDA304 Build Deep Learning Applications with TensorFlow and Amazon SageMakerBDA304 Build Deep Learning Applications with TensorFlow and Amazon SageMaker
BDA304 Build Deep Learning Applications with TensorFlow and Amazon SageMakerAmazon Web Services
 
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...Amazon Web Services
 
MCL202_Ally Bank & Cognizant Transforming Customer Experience Using Amazon Alexa
MCL202_Ally Bank & Cognizant Transforming Customer Experience Using Amazon AlexaMCL202_Ally Bank & Cognizant Transforming Customer Experience Using Amazon Alexa
MCL202_Ally Bank & Cognizant Transforming Customer Experience Using Amazon AlexaAmazon Web Services
 
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...Amazon Web Services
 
GAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
GAM310_Build a Telemetry and Analytics Pipeline for Game BalancingGAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
GAM310_Build a Telemetry and Analytics Pipeline for Game BalancingAmazon Web Services
 
Build a Website & Mobile App for your first 10 million users
Build a Website & Mobile App for your first 10 million usersBuild a Website & Mobile App for your first 10 million users
Build a Website & Mobile App for your first 10 million usersAmazon Web Services
 
GPSBUS201-GPS Demystifying Artificial Intelligence
GPSBUS201-GPS Demystifying Artificial IntelligenceGPSBUS201-GPS Demystifying Artificial Intelligence
GPSBUS201-GPS Demystifying Artificial IntelligenceAmazon Web Services
 
Turn Big Data Into Big Value On Informatica and Amazon
Turn Big Data Into Big Value On Informatica and AmazonTurn Big Data Into Big Value On Informatica and Amazon
Turn Big Data Into Big Value On Informatica and AmazonAmazon Web Services
 
An Introduction to AI Services on AWS - Web Summit Lisbon
An Introduction to AI Services on AWS -  Web Summit LisbonAn Introduction to AI Services on AWS -  Web Summit Lisbon
An Introduction to AI Services on AWS - Web Summit LisbonBoaz Ziniman
 
Building Data Driven Apps with AWS: Collision 2018
Building Data Driven Apps with AWS: Collision 2018Building Data Driven Apps with AWS: Collision 2018
Building Data Driven Apps with AWS: Collision 2018Amazon Web Services
 

What's hot (20)

DVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational TransformationDVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational Transformation
 
AMF303-Deep Dive into the Connected Vehicle Reference Architecture.pdf
AMF303-Deep Dive into the Connected Vehicle Reference Architecture.pdfAMF303-Deep Dive into the Connected Vehicle Reference Architecture.pdf
AMF303-Deep Dive into the Connected Vehicle Reference Architecture.pdf
 
Supercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMakerSupercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMaker
 
GPSTEC326-GPS Industry 4.0 AI and the Future of Manufacturing
GPSTEC326-GPS Industry 4.0 AI and the Future of ManufacturingGPSTEC326-GPS Industry 4.0 AI and the Future of Manufacturing
GPSTEC326-GPS Industry 4.0 AI and the Future of Manufacturing
 
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDBSRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
 
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
 
GPSBUS202_Driving Customer Value with Big Data Analytics
GPSBUS202_Driving Customer Value with Big Data AnalyticsGPSBUS202_Driving Customer Value with Big Data Analytics
GPSBUS202_Driving Customer Value with Big Data Analytics
 
SRV423 [new launch] Introducing Amazon Macie — Visibility and Security for yo...
Deep Dive on Big Data

  • 1. Deep Dive on AWS with Demo. John Yeung, Solutions Architect, 31 October 2017. AWS Big Data and Machine Learning Day | Hong Kong. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. What to expect from the session: Big Data Challenges; Architectural Principles; Design Patterns; Demo (around 15 mins)
  • 3. Ever-Increasing Big Data: Volume, Velocity, Variety
  • 4. Big Data Evolution: Batch Processing, Stream Processing, Machine Learning
  • 5. Plenty of Tools: Amazon Glacier, S3, DynamoDB, RDS, EMR, Amazon Redshift, Data Pipeline, Amazon Kinesis, Amazon Kinesis Streams app, Lambda, Amazon ML, SQS, ElastiCache, DynamoDB Streams, Amazon Elasticsearch Service, Amazon Kinesis Analytics
  • 6. Big Data Challenges: Why? How? What tools should I use? Is there a reference architecture?
  • 7. Architectural Principles
  • 8. Architectural Principles
      #1: Build Decoupled Systems
        • Data → Store → Process → Store → Analyze → Answers
      #2: Use the Right Tool for the Job
        • Data structure, latency, throughput, access patterns
      #3: Leverage AWS Managed Services
        • Scalable/elastic, available, reliable, secure, no/low admin
      #4: Use Lambda Architecture Ideas
        • Immutable (append-only) log, batch/speed/serving layer
      #5: Be Cost-conscious
        • Big data ≠ big cost
  • 9. Simplify Big Data Processing: COLLECT → STORE → PROCESS/ANALYZE → CONSUME. 1. Time to answer (Latency); 2. Throughput; 3. Cost
  • 10. COLLECT
  • 11. Types of Data (COLLECT diagram)
      Source categories: Applications, Logging, Transport, Messaging, IoT
      Sources shown: mobile apps, web apps, data centers, AWS Direct Connect, AWS Import/Export Snowball, Amazon CloudWatch, AWS CloudTrail, messages, devices, sensors & IoT platforms, AWS IoT
      Data types:
        • RECORDS: in-memory data, database records (transaction-based)
        • DOCUMENTS, FILES: search documents, log files (file-based)
        • MESSAGES, STREAMS: messages, data streams (event-based)
  • 12. STORE
  • 13. Types of Data Stores (COLLECT → STORE diagram)
        • Database: SQL & NoSQL databases
        • Search: search engines
        • File store: file systems
        • Queue: message queues
        • Stream storage: pub/sub message queues
        • In-memory: caches
  • 14. In-memory, Database, Search (COLLECT → STORE diagram)
        • Stream (Hot): Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams
        • Message: Amazon SQS
        • File: Amazon S3
        • In-memory, Database, Search
  • 15. COLLECT → STORE diagram with store services (Search, SQL, NoSQL, Cache, File): Amazon Elasticsearch Service, Amazon DynamoDB, Amazon S3, Amazon ElastiCache, Amazon RDS
        • Amazon ElastiCache: managed Memcached or Redis service
        • Amazon DynamoDB: managed NoSQL database service
        • Amazon RDS: managed relational database service
        • Amazon Elasticsearch Service: managed Elasticsearch service
  • 16. Use the Right Tool for the Job (Data Tier)
        • Search: Amazon Elasticsearch Service
        • In-memory: Amazon ElastiCache, Redis, Memcached
        • SQL: Amazon Aurora, Amazon RDS, MySQL, PostgreSQL, Oracle, SQL Server
        • NoSQL: Amazon DynamoDB, Cassandra, HBase, MongoDB
  • 17. File Storage (COLLECT → STORE diagram): Amazon S3 as the file store, shown alongside stream storage (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams) and message storage (Amazon SQS)
  • 18. Why Is Amazon S3 Good for Big Data
        • Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
        • Multiple & heterogeneous analysis clusters can use the same data
        • Unlimited number of objects and volume of data
        • Very high bandwidth – no aggregate throughput limit
        • Designed for 99.99% availability – can tolerate zone failure
        • Designed for 99.999999999% durability
        • No need to pay for data replication
        • Native support for versioning
        • Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies
        • Secure – SSL, client/server-side encryption at rest
        • Low cost
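To put the durability figure on the slide above into concrete terms, here is a minimal arithmetic sketch. It assumes the 99.999999999% figure can be read as an independent per-object probability of surviving one year; that reading, the function name `expected_annual_losses`, and the 10,000,000-object example are illustrative choices, not something stated in the slides.

```python
from decimal import Decimal

# Eleven nines of durability, as quoted on the slide.
DURABILITY = Decimal("0.99999999999")


def expected_annual_losses(num_objects: int, durability: Decimal = DURABILITY) -> Decimal:
    """Expected number of objects lost per year.

    Assumption: each object is lost independently with probability
    (1 - durability) in a given year.
    """
    return Decimal(num_objects) * (Decimal(1) - durability)


# Hypothetical bucket holding ten million objects.
losses = expected_annual_losses(10_000_000)

# Average number of years between single-object losses at that scale.
years_per_lost_object = Decimal(1) / losses

print(f"Expected losses per year: {losses}")
print(f"Roughly one object lost every {years_per_lost_object} years")
```

Decimal arithmetic is used instead of floats because subtracting a value this close to 1 in binary floating point introduces rounding noise.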
  • 19. COLLECT / STORE: Message & Stream Storage (diagram). The collect-and-store diagram from slide 15, with the message and stream stores highlighted: Amazon SQS (message); Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams (stream).
    • Amazon SQS: managed message queue service
    • Apache Kafka: high-throughput distributed streaming platform
    • Amazon Kinesis Streams: managed stream storage + processing
    • Amazon Kinesis Firehose: managed data delivery
    • Amazon DynamoDB: managed NoSQL database; tables can be stream-enabled
  • 20. Why Stream Storage?
    • Decouple producers & consumers
    • Persistent buffer
    • Collect multiple streams
    • Preserve client ordering
    • Parallel consumption
    Diagram: four producers (red, violet, blue, green) each write records 1 to 4 into a DynamoDB stream, Amazon Kinesis stream, or Kafka topic split into shard 1 / partition 1 and shard 2 / partition 2. Consumer 1 reports count of red = 4 and count of violet = 4; Consumer 2 reports count of blue = 4 and count of green = 4.
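The shard diagram above can be sketched in a few lines of Python. This is a minimal illustration, not how any of the named services is implemented: Kinesis Streams does hash partition keys with MD5, but the even two-way split of the hash range, the `shard_for`, `route`, and `consume` helpers, and the sample records are all assumptions made for the example.

```python
import hashlib
from collections import Counter, defaultdict

NUM_SHARDS = 2          # hypothetical stream with two shards, as on the slide
HASH_SPACE = 2 ** 128   # MD5 produces a 128-bit integer


def shard_for(partition_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key to a shard by splitting the MD5 hash range evenly."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hashed * num_shards // HASH_SPACE


def route(records, num_shards: int = NUM_SHARDS):
    """Append (key, seq) records to per-shard lists, preserving arrival order."""
    shards = defaultdict(list)
    for key, seq in records:
        shards[shard_for(key, num_shards)].append((key, seq))
    return shards


def consume(shard_records):
    """One consumer per shard: count the records seen for each key."""
    return Counter(key for key, _ in shard_records)


# Four producers, each emitting records 1..4, interleaved on arrival.
records = [(colour, seq)
           for seq in range(1, 5)
           for colour in ("red", "violet", "blue", "green")]
shards = route(records)
counts = {shard_id: consume(recs) for shard_id, recs in shards.items()}
```

Because a given key always hashes to the same shard, each producer's records stay in order within that shard, and the shards can be read by separate consumers in parallel. Which colours land on which shard depends on the hash, so it will not necessarily match the slide.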
  • 21. What Stream Storage Should I Use?
    Columns: Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose | Apache Kafka | Amazon SQS
    AWS managed service: Yes | Yes | Yes | No | Yes
    Guaranteed ordering: Yes | Yes | Yes | Yes | No
    Delivery: exactly-once | at-least-once | exactly-once | at-least-once | at-least-once
    Data retention period: 24 hours | 7 days | N/A | Configurable | 14 days
    Availability: 3 AZ | 3 AZ | 3 AZ | Configurable | 3 AZ
    Scale / throughput: No limit / ~ table IOPS | No limit / ~ shards | No limit / automatic | No limit / ~ nodes | No limit / automatic
    Parallel clients: Yes | Yes | No | Yes | No
    Stream MapReduce: Yes | Yes | N/A | Yes | N/A
    Record/object size: 400 KB | 1 MB | Redshift row size | Configurable | 256 KB
    Cost: Higher (table cost) | Low | Low | Low (+admin) | Low-medium
    Temperature scale shown: Hot, Warm
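The "Delivery" row matters to consumers: with at-least-once delivery, the same record can arrive more than once, so processing should be idempotent. A minimal sketch of one common approach, de-duplicating by message ID, follows. The `(message_id, body)` shape, the `process_once` helper, and the in-memory `seen` set are illustrative assumptions; a real consumer would more likely keep the seen IDs in a durable store.

```python
from typing import Callable, Iterable, Set, Tuple

Message = Tuple[str, str]  # (message_id, body); the shape is illustrative


def process_once(messages: Iterable[Message],
                 handler: Callable[[str], None],
                 seen: Set[str]) -> int:
    """Call handler once per distinct message_id, skipping redelivered copies."""
    handled = 0
    for message_id, body in messages:
        if message_id in seen:
            continue  # duplicate delivery: this ID was already processed
        handler(body)
        seen.add(message_id)
        handled += 1
    return handled


results = []
seen_ids: Set[str] = set()
# "m1" and "m2" are delivered twice, as at-least-once delivery permits.
delivered = [("m1", "a"), ("m2", "b"), ("m1", "a"), ("m3", "c"), ("m2", "b")]
handled_count = process_once(delivered, results.append, seen_ids)
```

Five deliveries result in three handler calls, one per distinct ID.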
  • 22. Which Data Store Should I Use?
    • Data structure → fixed schema, JSON, key-value
    • Access patterns → store data in the format you will access it
    • Data characteristics → hot, warm, cold
    • Cost → right cost
  • 23. Data Structure and Access Patterns
    Access pattern → what to use?
    • Put/Get (key, value) → In-memory, NoSQL
    • Simple relationships (1:N, M:N) → NoSQL
    • Multi-table joins, transactions, SQL → SQL
    • Faceting, search → Search
    Data structure → what to use?
    • Fixed schema → SQL, NoSQL
    • Schema-free (JSON) → NoSQL, Search
    • (Key, value) → In-memory, NoSQL
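The two lookup tables on this slide can be combined mechanically. The sketch below encodes them as dictionaries and returns the store categories that both tables suggest. Taking the intersection is an assumption made for illustration; the slide itself does not say how to combine the two tables, and the `candidate_stores` helper is hypothetical.

```python
# Both tables are copied from the slide.
ACCESS_PATTERN_TO_STORE = {
    "put/get (key, value)": ["In-memory", "NoSQL"],
    "simple relationships (1:N, M:N)": ["NoSQL"],
    "multi-table joins, transactions, SQL": ["SQL"],
    "faceting, search": ["Search"],
}

DATA_STRUCTURE_TO_STORE = {
    "fixed schema": ["SQL", "NoSQL"],
    "schema-free (JSON)": ["NoSQL", "Search"],
    "(key, value)": ["In-memory", "NoSQL"],
}


def candidate_stores(access_pattern: str, data_structure: str) -> list:
    """Return store categories suggested by both tables (illustrative rule)."""
    by_access = ACCESS_PATTERN_TO_STORE[access_pattern]
    by_structure = set(DATA_STRUCTURE_TO_STORE[data_structure])
    return [store for store in by_access if store in by_structure]


kv_choice = candidate_stores("put/get (key, value)", "(key, value)")
join_choice = candidate_stores("multi-table joins, transactions, SQL", "fixed schema")
```

An empty result, such as faceted search over a fixed schema, signals that the two tables disagree and the choice needs more thought than a lookup.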
  • 24. What is the temperature of your data?
  • 25. Data Characteristics: Hot, Warm, or Cold
    Columns: Hot | Warm | Cold
    Volume: MB–GB | GB–TB | PB–EB
    Item size: B–KB | KB–MB | KB–TB
    Latency: ms | ms, sec | min, hrs
    Durability: Low–high | High | Very high
    Request rate: Very high | High | Low
    Cost/GB: $$-$ | $-¢¢ | ¢
  • 26. Data temperature spectrum (diagram). From hot data through warm data to cold data: request rate high → low; cost/GB high → low; latency low → high; data volume low → high. A second axis shows structure, low to high. Stores placed on the chart: in-memory, NoSQL, SQL, Amazon Glacier.
  • 27. Which Data Store Should I Use?
    Columns: Amazon ElastiCache | Amazon DynamoDB | Amazon RDS/Aurora | Amazon ES | Amazon S3 | Amazon Glacier
    Average latency: ms | ms | ms, sec | ms, sec | ms, sec, min (~ size) | hrs
    Typical data stored: GB | GB–TB (no limit) | GB–TB (64 TB max) | GB–TB | MB–PB (no limit) | GB–PB (no limit)
    Typical item size: B–KB | KB (400 KB max) | KB (64 KB max) | B–KB (2 GB max) | KB–TB (5 TB max) | GB (40 TB max)
    Request rate: High to very high | Very high (no limit) | High | High | Low to high (no limit) | Very low
    Storage cost GB/month: $$ | ¢¢ | ¢¢ | ¢¢ | ¢ | ¢4/10
    Durability: Low to moderate | Very high | Very high | High | Very high | Very high
    Availability: High, 2 AZ | Very high, 3 AZ | Very high, 3 AZ | High, 2 AZ | Very high, 3 AZ | Very high, 3 AZ
    Temperature scale shown: Hot data, Warm data, Cold data
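One practical use of the "Typical item size" row is as a first filter: a store whose maximum item size is smaller than the item cannot hold it as a single item. The sketch below encodes only the maxima stated on the slide, which reflect 2017 figures and may be out of date. Amazon ElastiCache is left out because the slide gives no maximum for it, and `stores_that_fit` is a hypothetical helper.

```python
KB, GB, TB = 1024, 1024 ** 3, 1024 ** 4

# Maximum item sizes as stated on the slide (2017 figures).
MAX_ITEM_BYTES = {
    "Amazon DynamoDB": 400 * KB,
    "Amazon RDS/Aurora": 64 * KB,
    "Amazon ES": 2 * GB,
    "Amazon S3": 5 * TB,
    "Amazon Glacier": 40 * TB,
}


def stores_that_fit(item_bytes: int) -> list:
    """Return the stores whose stated maximum item size can hold the item."""
    return [name for name, limit in MAX_ITEM_BYTES.items()
            if item_bytes <= limit]


small_document = stores_that_fit(300 * KB)
large_archive = stores_that_fit(10 * TB)
```

Item size only rules stores out; latency, request rate, and cost from the other rows still decide among the stores that remain.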
  • 28. PROCESS / ANALYZE
  • 29. PROCESS / ANALYZE: Analytics & Frameworks
    • Interactive: takes seconds. Example: self-service dashboards. Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)
    • Batch: takes minutes to hours. Example: daily/weekly/monthly reports. Amazon EMR (MapReduce, Hive, Pig, Spark)
    • Message: takes milliseconds to seconds. Example: message processing. Amazon SQS applications on Amazon EC2
    • Stream: takes milliseconds to seconds. Example: fraud alerts, 1-minute metrics. Amazon EMR (Spark Streaming), Amazon Kinesis Analytics, KCL, Storm, AWS Lambda
    • ML: Amazon Machine Learning
  • 30. What About ETL? (ETL sits between STORE and PROCESS / ANALYZE.)
    • Data integration partners: reduce the effort to move, cleanse, synchronize, manage, and automate data-related processes. https://aws.amazon.com/big-data/partner-solutions/
    • AWS Glue (new): a fully managed ETL service that makes it easy to understand your data sources, prepare the data, and move it reliably between data stores.
  • 31. CONSUME
  • 32. COLLECT → STORE → PROCESS / ANALYZE → CONSUME (diagram).
    • Collect: mobile apps, web apps, devices, messaging, sensors & IoT platforms (AWS IoT), data centers (AWS Direct Connect), AWS Import/Export Snowball, logging (Amazon CloudWatch, AWS CloudTrail). Data types: records, documents, files, messages, streams.
    • Store: Amazon Elasticsearch Service (search), Amazon RDS (SQL), Amazon DynamoDB (NoSQL), Amazon ElastiCache (cache), Amazon S3 (file), Amazon SQS (message), and the stream stores Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, and Amazon DynamoDB Streams. Temperature labels: hot, warm.
    • ETL
    • Process / analyze: stream (Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EMR, Amazon EC2), message (Amazon SQS apps on Amazon EC2), interactive (Amazon Redshift, Amazon Athena, Presto on Amazon EMR), batch (Amazon EMR), ML (Amazon Machine Learning). Speed labels: fast, slow.
  • 33. COLLECT → STORE → ETL → PROCESS / ANALYZE → CONSUME (diagram). Consume: Amazon QuickSight, apps & services. Categories: applications & API, analysis and visualization, notebooks, IDE. Consumers: business users; data scientists and developers.
  • 34. Put them together
  • 35. The full pipeline (diagram): the COLLECT → STORE → ETL → PROCESS / ANALYZE diagram from slide 32, joined with the CONSUME tools from slide 33 (Amazon QuickSight, apps & services; analysis & visualization, notebooks, IDE, API).
  • 36. Design Patterns
  • 37. Concept #1: Decoupled Data Bus
    • Storage decoupled from processing
    • Multiple stages
    Diagram labels: Store, Process (repeated across stages).
  • 38. Concept #2: Multiple Stream Processing
    • Parallel processing
    Diagram components (Process / Store): Amazon Kinesis, AWS Lambda, Amazon Kinesis Connector Library, KCL, Amazon DynamoDB, Amazon S3.
  • 39. Concept #3: Multiple Data Stores
    • Analysis framework reads from or writes to multiple data stores
    Diagram components (Process / Store): Amazon Kinesis, AWS Lambda, Amazon Kinesis Connector Library, KCL, Amazon EMR (Spark Streaming, Spark SQL), Amazon S3, Amazon DynamoDB.
  • 40. Real-time Analytics Design Pattern (diagram).
    • Stream stores: Apache Kafka, Amazon Kinesis, DynamoDB Streams
    • Process: KCL, AWS Lambda, Amazon EMR (Spark Streaming, Apache Storm), Amazon ML (real-time prediction)
    • Outputs: Amazon SNS (notifications, alerts); Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES (app state, KPI)
  • 41. Message / Event Processing Design Pattern (diagram).
    • Input: messages / events in Amazon SQS, including an Amazon SQS priority queue
    • Process: Amazon SQS apps running in an Auto Scaling group
    • Outputs: Amazon SNS (publish to subscribers); Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES (app state, KPI)
  • 42. Interactive & Batch Analytics Design Pattern (diagram).
    • Store: files in Amazon S3; Amazon Kinesis Firehose also shown
    • Batch mode: Amazon EMR (Hive, Pig, Spark); Amazon Machine Learning (batch prediction)
    • Interactive mode: Amazon Redshift, Amazon EMR (Presto, Spark), Amazon Athena, Amazon Kinesis Analytics; Amazon Machine Learning (real-time prediction)
    • Consume
  • 43. Demonstration: apply what we’ve just learnt
  • 44. Real-time Analytics Design Pattern (demo architecture). Components: Apache Web Server (in Availability Zone #1), Amazon Kinesis Firehose, Amazon S3 bucket, Amazon Kinesis Analytics, a second Amazon Kinesis Firehose, Amazon Elasticsearch Service, Kibana.
  • 45. Amazon Elastic Compute Cloud (Amazon EC2). Amazon EC2 provides virtual machines (VMs), known as instances, to run your web application on the platform you choose. It lets you configure and scale your compute capacity easily to meet changing requirements and demand. In this demo, the instance runs Apache Web Server, which continuously generates web access log records, and the Amazon Kinesis Agent, which streams these records to Amazon Kinesis Firehose. (Diagram: Apache Web Server + Amazon Kinesis Agent.)
  • 46. Amazon Kinesis Firehose. Amazon Kinesis Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, or Amazon Elasticsearch Service (Amazon ES). In this step, we create an Amazon Kinesis Firehose delivery stream that saves each log entry in Amazon S3 and provides the log data to the Amazon Kinesis Analytics application.
  • 47. Example: Real-time Analytics (1). Stage: COLLECT. Components: Apache Web Server (Availability Zone #1) → streaming data → Amazon Kinesis Firehose.
    1. The Amazon Kinesis Agent, installed on a Linux instance, continuously sends log records to Amazon Kinesis Firehose.
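In the demo the Amazon Kinesis Agent does this work. To show what such a step involves, here is a minimal Python sketch that parses one Apache access log line into JSON and hands it to a Firehose client. It is a stand-in for illustration, not the agent's implementation. The `log_line_to_json` and `send_to_firehose` helpers are hypothetical, and the field names are assumptions chosen so that `response` matches the column used in the Kinesis Analytics SQL later in the deck. `put_record` with `DeliveryStreamName` and `Record={"Data": ...}` is the boto3 Firehose call.

```python
import json
import re

# Apache common log format; field names are assumptions for this sketch.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+)'
)


def log_line_to_json(line: str) -> str:
    """Parse one access log line into a JSON string."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    record = match.groupdict()
    record["response"] = int(record["response"])  # HTTP status as a number
    return json.dumps(record)


def send_to_firehose(client, delivery_stream: str, line: str) -> None:
    """Send one parsed log line to a Firehose delivery stream.

    `client` is expected to behave like boto3.client("firehose").
    """
    payload = log_line_to_json(line) + "\n"  # newline-delimited records
    client.put_record(
        DeliveryStreamName=delivery_stream,
        Record={"Data": payload.encode("utf-8")},
    )
```

With AWS credentials configured, `send_to_firehose(boto3.client("firehose"), "my-delivery-stream", line)` would deliver a record, where the stream name is a placeholder. Passing the client in as a parameter keeps the function testable without network access.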
  • 48. Amazon Simple Storage Service (Amazon S3). Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to highly scalable, reliable, fast, inexpensive data storage infrastructure. Example uses: web access logs, static websites, and data lakes.
  • 49. Amazon Kinesis Analytics. Amazon Kinesis Analytics enables you to query streaming data or build entire streaming applications using SQL, so that you can gain actionable insights promptly. It takes care of everything required to run your queries continuously and scales automatically to match the volume and throughput rate of your incoming data.
  • 50. Example: Real-time Analytics (2). Stages: COLLECT, STORE. Components: Apache Web Server (Availability Zone #1), Amazon Kinesis Firehose, Amazon S3 bucket.
    2a. Amazon Kinesis Firehose writes each log record to Amazon S3 for durable storage.
  • 51. Example: Real-time Analytics (2). Stages: COLLECT, STORE, PROCESS / ANALYZE. Components: Apache Web Server (Availability Zone #1), Amazon Kinesis Firehose, Amazon S3 bucket, Amazon Kinesis Analytics.
    2b. Amazon Kinesis Analytics runs a SQL statement against the streaming input data.
  • 52. SQL Operations Inside Kinesis Analytics. Flow: Amazon Kinesis Firehose → source stream → insert & select (pump) → destination stream.
    CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
        datetime VARCHAR(30),
        status INTEGER,
        statusCount INTEGER);
    CREATE OR REPLACE PUMP "STREAM_PUMP" AS
        INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM
            TIMESTAMP_TO_CHAR('yyyy-MM-dd''T''HH:mm:ss.SSS', LOCALTIMESTAMP) AS datetime,
            "response" AS status,
            COUNT(*) AS statusCount
        FROM "SOURCE_SQL_STREAM_001"
        GROUP BY
            "response",
            FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') minute / 1 TO MINUTE);
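The GROUP BY in this query buckets rows into one-minute tumbling windows and counts them per HTTP status. The same aggregation can be sketched over an in-memory list in Python to make the windowing explicit. This is an illustration of the logic only: it is a batch computation, whereas Kinesis Analytics emits each window's rows continuously as the stream advances. The `count_by_status_per_minute` helper and the sample records are made up for the example.

```python
from collections import Counter
from datetime import datetime


def count_by_status_per_minute(records):
    """Count records per (minute window start, HTTP status).

    `records` is an iterable of (rowtime, response) pairs, mirroring the
    ROWTIME and "response" columns used in the SQL.
    """
    counts = Counter()
    for rowtime, response in records:
        # Truncate to the minute: the start of the tumbling window.
        window_start = rowtime.replace(second=0, microsecond=0)
        counts[(window_start, response)] += 1
    return counts


sample = [
    (datetime(2017, 10, 31, 10, 0, 5), 200),
    (datetime(2017, 10, 31, 10, 0, 40), 200),
    (datetime(2017, 10, 31, 10, 0, 59), 404),
    (datetime(2017, 10, 31, 10, 1, 2), 200),
]
window_counts = count_by_status_per_minute(sample)
```

Each key in `window_counts` corresponds to one output row of the pump: a window, a status, and the number of requests with that status in that minute.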
  • 53. Example: Real-time Analytics (3). Stages: COLLECT, STORE, PROCESS / ANALYZE, STORE. Components: Apache Web Server (Availability Zone #1), Amazon Kinesis Firehose, Amazon S3 bucket, Amazon Kinesis Analytics, a second Amazon Kinesis Firehose.
    3. Amazon Kinesis Analytics creates an aggregated data set every minute and outputs that data to a second Firehose delivery stream.
  • 54. Amazon Elasticsearch Service (Amazon ES). Amazon Elasticsearch Service makes it easy to deploy, secure, operate, and scale Elasticsearch for log analytics, full-text search, application monitoring, and more. It is a fully managed service that delivers real-time analytics capabilities alongside the availability, scalability, and security that production workloads require. The service offers built-in integrations with Kibana, Logstash, and other AWS services, so you can go from raw data to actionable insights quickly and securely.
  • 55. Example: Real-time Analytics (4). Stages: COLLECT, STORE, PROCESS / ANALYZE, STORE. Components: Apache Web Server (Availability Zone #1), Amazon Kinesis Firehose, Amazon S3 bucket, Amazon Kinesis Analytics, a second Amazon Kinesis Firehose, Amazon Elasticsearch Service.
    4. This Firehose delivery stream writes the aggregated data to an Amazon ES domain.
  • 56. Kibana. Kibana lets you visualize your Elasticsearch data. It provides interactive visualizations of various types, including histograms, line graphs, and pie charts, and it leverages the full aggregation capabilities of Elasticsearch.
  • 57. Example: Real-time Analytics (5). Stages: COLLECT, STORE, PROCESS / ANALYZE, STORE, CONSUME. Components: Apache Web Server (Availability Zone #1), Amazon Kinesis Firehose, Amazon S3 bucket, Amazon Kinesis Analytics, a second Amazon Kinesis Firehose, Amazon Elasticsearch Service, Kibana.
    5. Finally, use Kibana to visualize the results.
  • 58. Implementation Steps. Stages: COLLECT, STORE, PROCESS / ANALYZE, STORE, CONSUME. Components: Apache Web Server (Availability Zone #1), Amazon Kinesis Firehose, Amazon S3 bucket, Amazon Kinesis Analytics, a second Amazon Kinesis Firehose, Amazon Elasticsearch Service, Kibana. Step labels on the diagram: 1, 2a, 2b, 3, 4, 5, 6.
  • 59. Build your own in 60 minutes! https://aws.amazon.com/getting-started/projects/build-log-analytics-solution/
  • 60. Thank you! John Yeung | jyeung@amazon.com