SlideShare a Scribd company logo
1 of 60
Download to read offline
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Arie Leeuwesteijn, Solutions Architect, AWS
September 2016
Big Data Architectural Patterns
and Best Practices on AWS
Agenda
Big data challenges
How to simplify big data processing
What technologies should you use?
• Why?
• How?
Reference architecture
Design patterns
Ever Increasing Big Data
Volume
Velocity
Variety
Big Data Evolution
Batch
processing
Stream
processing
Machine
learning
Plethora of Tools
Amazon
Glacier
S3 DynamoDB
RDS
EMR
Amazon
Redshift
Data Pipeline
Amazon
Kinesis
Kinesis-enabled
app
Lambda ML
SQS
ElastiCache
DynamoDB
Streams
Amazon Elasticsearch
Service
Big Data Challenges
Is there a reference architecture?
What tools should I use?
How?
Why?
Decoupled “data bus”
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Use Lambda architecture ideas
• Immutable (append-only) log, batch/speed/serving layer
Leverage AWS managed services
• Scalable/elastic, available, reliable, secure, no/low admin
Big data ≠ big cost
Architectural Principles
Simplify Big Data Processing
COLLECT STORE
PROCESS/
ANALYZE
CONSUME
Time to answer (Latency)
Throughput
Cost
COLLECT
Types of Data
Database records
Search documents
Log files
Messaging events
Devices / sensors / IoT stream
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Stream
storage
IoT
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
Database
Applications
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Search
File store
LoggingTransport
Messaging
Message MESSAGES Queue
Messaging
Store
Amazon Kinesis
Firehose
Amazon Kinesis
Streams/Analytics
Apache Kafka
Amazon DynamoDB
Streams
Amazon SQS
Amazon SQS
• Managed message queue service
Apache Kafka
• High throughput distributed messaging system
Amazon Kinesis Streams
• Managed stream storage + processing
Amazon Kinesis Analytics
• Process streaming data in real time
Amazon Kinesis Firehose
• Managed data delivery
Amazon DynamoDB
• Managed NoSQL database
• Tables can be stream-enabled
Message & Stream Storage
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
IoT
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
Database
Applications
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENT
S
FILES
Search
File store
LoggingTransport
Messaging
Message MESSAGES
Messaging
Queue
Stream
Why Stream Storage?
Decouple producers & consumers
Persistent buffer
Collect multiple streams
Preserve client ordering
Streaming MapReduce
Parallel consumption
4 4 3 3 2 2 1 1
4 3 2 1
4 3 2 1
4 3 2 1
4 3 2 1
4 4 3 3 2 2 1 1
shard 1 / partition 1
shard 2 / partition 2
Consumer 1
Count of
red = 4
Count of
violet = 4
Consumer 2
Count of
blue = 4
Count of
green = 4
DynamoDB stream Amazon Kinesis stream Kafka topic
What About Queues & Pub/Sub ?
• Decouple producers &
consumers/subscribers
• Persistent buffer
• Collect multiple streams
• No client ordering
• No parallel consumption for
Amazon SQS
• Amazon SNS can route
to multiple queues or ʎ
functions
• No streaming MapReduce
Consumers
Producers
Producers
Amazon SNS
Amazon SQS
queue
topic
function
ʎ
AWS Lambda
Amazon SQS
queue
Subscriber
What Stream Storage should I use?
Amazon
DynamoDB
Streams
Amazon
Kinesis
Streams
Amazon
Kinesis
Firehose
Apache
Kafka
Amazon
SQS
AWS managed
service
Yes Yes Yes No Yes
Guaranteed
ordering
Yes Yes Yes Yes No
Delivery exactly-once at-least-once exactly-once at-least-once at-least-once
Data retention
period
24 hours 7 days N/A Configurable 14 days
Availability 3 AZ 3 AZ 3 AZ Configurable 3 AZ
Scale /
throughput
No limit /
~ table IOPS
No limit /
~ shards
No limit /
automatic
No limit /
~ nodes
No limits /
automatic
Parallel clients Yes Yes No Yes No
Stream MapReduce Yes Yes N/A Yes N/A
Record/object size 400 KB 1 MB Amazon Redshift row size Configurable 256 KB
Cost Higher (table cost) Low Low Low (+admin) Low-medium
Hot Warm
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
Database
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENT
S
FILES
Search
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Hot
Stream
Amazon S3
Amazon SQS
Message
Amazon S3
File
LoggingIoTApplicationsTransportMessaging
File Storage
Why is Amazon S3 Good for Big Data?
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• No need to run compute clusters for storage (unlike HDFS)
• Can run transient Hadoop clusters & Amazon EC2 Spot Instances
• Multiple distinct (Spark, Hive, Presto) clusters can use the same data
• Unlimited number of objects
• Very high bandwidth – no aggregate throughput limit
• Highly available – can tolerate AZ failure
• Designed for 99.999999999% durability
• Tired-storage (Standard, IA, Amazon Glacier) via life-cycle policy
• Secure – SSL, client/server-side encryption at rest
• Low cost
What about HDFS & Amazon Glacier?
• Use HDFS for very frequently accessed
(hot) data
• Use Amazon S3 Standard for frequently
accessed data
• Use Amazon S3 Standard – IA for
infrequently accessed data
• Use Amazon Glacier for archiving cold data
Cache, database, search
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENT
S
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Hot
Stream
Amazon SQS
Message Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
SearchSQLNoSQLCacheFile
LoggingIoTApplicationsTransportMessaging
Database Anti-pattern
Data Tier
Database tier
Best Practice - Use the Right Tool for the Job
Data Tier
Search
Amazon
Elasticsearch
and
CloudSearch
Cache
Amazon
ElastiCache
Redis
Memcached
SQL
Amazon Aurora
MySQL
PostgreSQL
Oracle
SQL Server
MariaDB
NoSQL
Amazon
DynamoDB
Cassandra
HBase
MongoDB
Database tier
Materialized Views
Amazon
Elasticsearch
Service
KCL
What Data Store Should I Use?
Data structure → Fixed schema, JSON, key-value
Access patterns → Store data in the format you will access it
Data / access characteristics → Hot, warm, cold
Cost → Right cost
Data Structure and Access Patterns
Access Patterns What to use?
Put/Get (key, value) Cache, NoSQL
Simple relationships → 1:N, M:N NoSQL
Cross table joins, transaction, SQL SQL
Faceting, search, free text Search
Data Structure What to use?
Fixed schema SQL, NoSQL
Schema-free (JSON) NoSQL, Search
(Key, value) Cache, NoSQL
What Is the Temperature of Your Data / Access ?
Hot Warm Cold
Volume MB–GB GB–TB PB
Item size B–KB KB–MB KB–TB
Latency ms ms, sec min, hrs
Durability Low – high High Very high
Request rate Very high High Low
Cost/GB $$-$ $-¢¢ ¢
Hot data Warm data Cold data
Data / Access Characteristics: Hot, Warm, Cold
Cache
SQL
Request rate
High Low
Cost/GB
High Low
Latency
Low High
Data volume
Low High
Amazon
Glacier
Structure
NoSQL
Hot data Warm data Cold data
Low
High
Search
Amazon ElastiCache Amazon
DynamoDB
Amazon
RDS/Aurora
Amazon
Elasticsearch
Amazon S3 Amazon Glacier
Average
latency
ms ms ms, sec ms,sec ms,sec,min
(~ size)
hrs
Typical
data stored
GB GB–TBs
(no limit)
GB–TB
(64 TB max)
GB–TB MB–PB
(no limit)
GB–PB
(no limit)
Typical
item size
B-KB KB
(400 KB max)
KB
(64 KB max)
KB
(2 GB max)
KB-TB
(5 TB max)
GB
(40 TB max)
Request
Rate
High – very high Very high
(no limit)
High High Low – high
(no limit)
Very low
Storage cost
GB/month
$$ ¢¢ ¢¢ ¢¢ ¢ ¢/10
Durability Low - moderate Very high Very high High Very high Very high
Availability High
2 AZ
Very high
3 AZ
Very high
3 AZ
High
2 AZ
Very high
3 AZ
Very high
3 AZ
Hot data Warm data Cold data
What Data Store Should I Use?
Cost Conscious Design
Example: Should I use Amazon S3 or Amazon DynamoDB?
“I’m currently scoping out a project that will greatly increase
my team’s use of Amazon S3. Hoping you could answer
some questions. The current iteration of the design calls for
many small files, perhaps up to a billion during peak. The
total size would be on the order of 1.5 TB per month…”
Request rate
(Writes/sec)
Object size
(Bytes)
Total size
(GB/month)
Objects per month
300 2048 1483 777,600,000
Cost Conscious Design
Example: Should I use Amazon S3 or Amazon DynamoDB?
https://calculator.s3.amazonaws.com/index.html
Simple Monthly
Calculator
Request rate
(Writes/sec)
Object size
(Bytes)
Total size
(GB/month)
Objects per
month
300 2,048 1,483 777,600,000
Amazon S3 or
Amazon
DynamoDB?
Request rate
(Writes/sec)
Object size
(Bytes)
Total size
(GB/month)
Objects per
month
Scenario 1300 2,048 1,483 777,600,000
Scenario 2300 32,768 23,730 777,600,000
Amazon S3
Amazon DynamoDB
use
use
COLLECT STORE PROCESS / ANALYZE
Amazon Elasticsearch
Service
Apache Kafka
Amazon SQS
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Amazon DynamoDB
Streams
HotHotWarm
SearchSQLNoSQLCacheFileMessage
Stream
Mobile apps
Web apps
Devices
Messaging
Message
Sensors &
IoT platforms
AWS IoT
Data centers
AWS Direct
Connect
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
DOCUMENT
S
FILES
MESSAGES
STREAMS
LoggingIoTApplicationsTransportMessaging
Process /
analyze
PROCESS /
ANALYZE
Process / Analyze
• Batch - Minutes or hours on cold data
• Daily/weekly/monthly reports
• Interactive – Seconds on warm/cold data
• Self-service dashboards
• Messaging – Milliseconds or seconds on hot data
• Message/event buffering
• Streaming - Milliseconds or seconds on hot data
• Billing/fraud alerts, 1 minute metrics
What Analytics Technology Should I Use?
Amazon
Redshift
Amazon EMR
Presto Spark Hive
Query
latency
Low Low Low High
Durability High High High High
Data volume 1.6 PB max ~Nodes ~Nodes ~Nodes
AWS
managed
Yes Yes Yes Yes
Storage Native HDFS / S3 HDFS / S3 HDFS / S3
SQL
compatibility
High High Low (SparkSQL) Medium (HQL)
Slow
Spark Streaming
Apache Storm
AWS Lambda
KCL apps
Amazon
Redshift
Amazon
Redshift
Hive
Spark
Presto
Amazon Kinesis
Apache Kafka
Amazon
DynamoDB
Amazon S3data
Hot Cold
Data temperature
Processingspeed
Fast
Slow Answers
Hive
Native apps
KCL apps
AWS Lambda
Data Temperature vs. Processing Speed
• Replicate within, to, or from Amazon EC2 or RDS
• Move data to the same or different database engine
• Keep your apps running during the migration
• Start your first migration in 10 minutes or less.
AWS
Database Migration
Service
• Migrate from Oracle, SQL Server and
PostgreSQL
• Converts your database schema including
tables, stored procedures, views, and DML
• Downloadable client install for Microsoft
Windows, Mac and Linux desktops
• Highlight where manual edits are needed
AWS
Schema
Conversion Tool
Predictions via Machine Learning
ML gives computers the ability to learn without being explicitly
programmed
Machine learning algorithms:
Supervised learning ← “teach” program
- Classification ← Is this transaction fraud? (yes/no)
- Regression ← Customer life-time value?
Unsupervised learning ← Let it learn by itself
- Clustering ← Market segmentation
Tools and Frameworks
Machine Learning
• Amazon ML, Amazon EMR (Spark ML)
Interactive
• Amazon Redshift, Amazon EMR (Presto, Spark)
Batch
• Amazon EMR (MapReduce, Hive, Pig, Spark)
Messaging
• Amazon SQS application on Amazon EC2
Streaming
• Micro-batch: Spark Streaming, KCL
• Real-time: Amazon Kinesis Analytics, Storm,
AWS Lambda, KCL
Amazon SQS apps
Streaming
Amazon Kinesis
Analytics
Amazon KCL
apps
AWS Lambda
Amazon Redshift
PROCESS / ANALYZE
Amazon Machine
Learning
Presto
Amazon
EMR
FastSlowFast
BatchMessageInteractiveStreamML
Amazon EC2
Amazon EC2
What Streaming / Messaging Technology Should I Use?
Spark
Streaming
Apache
Storm
Kinesis KCL
Application
AWS Lambda Amazon SQS
Apps
Scale ~ Nodes ~ Nodes ~ Nodes Automatic ~ Nodes
Micro-batch or
Real-time
Micro-batch Real-time Near-real-time Near-real-time Near-real-time
AWS managed
service
Yes (EMR) No (EC2) No (KCL + EC2
+ Auto Scaling)
Yes No (EC2 + Auto
Scaling)
Scalability No limits
~ nodes
No limits
~ nodes
No limits
~ nodes
No limits No limits
Availability Single AZ Configurable Multi-AZ Multi-AZ Multi-AZ
Programming
languages
Java, Python,
Scala
Any language
via Thrift
Java, via
MultiLang
Daemon (.NET,
Python, Ruby,
Node.js)
Node.js, Java,
Python
AWS SDK
languages
(Java, .NET,
Python, …)
What About ETL?
https://aws.amazon.com/big-data/partner-solutions/
ETLSTORE PROCESS / ANALYZE
Amazon SQS apps
Streaming
Amazon Kinesis
Analytics
Amazon KCL
apps
AWS Lambda
Amazon Redshift
COLLECT STORE CONSUMEPROCESS / ANALYZE
Amazon Machine
Learning
Presto
Amazon
EMR
Amazon Elasticsearch
Service
Apache Kafka
Amazon SQS
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Amazon DynamoDB
Streams
HotHotWarm
FastSlowFast
BatchMessageInteractiveStreamML
SearchSQLNoSQLCacheFileMessage
Stream
Amazon EC2
Amazon EC2
Mobile apps
Web apps
Devices
Messaging
Message
Sensors &
IoT platforms
AWS IoT
Data centers
AWS Direct
Connect
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
DOCUMENT
S
FILES
MESSAGES
STREAMS
LoggingIoTApplicationsTransportMessaging
ETL
CONSUME
STORE CONSUMEPROCESS / ANALYZE
Amazon QuickSight
Apps & Services
Analysis&visualizationNotebooksIDEAPI
Applications & API
Analysis and visualization
Notebooks
IDE
Business
users
Data scientist,
developers
COLLECT ETL
Putting It All Together
Amazon SQS apps
Streaming
Amazon Kinesis
Analytics
Amazon KCL
apps
AWS Lambda
Amazon Redshift
COLLECT STORE CONSUMEPROCESS / ANALYZE
Amazon Machine
Learning
Presto
Amazon
EMR
Amazon Elasticsearch
Service
Apache Kafka
Amazon SQS
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Amazon DynamoDB
Streams
HotHotWarm
FastSlowFast
BatchMessageInteractiveStreamML
SearchSQLNoSQLCacheFileQueueStream
Amazon EC2
Amazon EC2
Mobile apps
Web apps
Devices
Messaging
Message
Sensors &
IoT platforms
AWS IoT
Data centers
AWS Direct
Connect
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
DOCUMENT
S
FILES
MESSAGES
STREAMS
Amazon QuickSight
Apps & Services
Analysis&visualizationNotebooksIDEAPI
Reference architecture
LoggingIoTApplicationsTransportMessaging
ETL
Design Patterns
Common issues
• Storage type
• Does not match data type
• Access patterns
• Price/capacity, deletion of data
• Processing models
• Tight coupling
• Over capacity
• Producer bottleneck
• Performance, throughput
• Tooling
Primitive: Multi-Stage Decoupled “Data Bus”
Multiple stages
Storage decouples multiple processing stages
Store Process Store Process
process
store
Primitive: Multiple Stream Processing Applications
Can Read from Amazon Kinesis
Amazon
Kinesis
AWS
Lambda
Amazon
DynamoDB
Amazon
Kinesis S3
Connector
process
store
Amazon S3
Amazon EMR
Primitive: Analysis Frameworks Could Read
from Multiple Data Stores
Amazon
Kinesis
AWS
Lambda
Amazon S3
Amazon
DynamoDB
Spark
Streaming
Amazon
Kinesis S3
Connector
process
store
Spark
SQL
Amazon EMR
Real-time Analytics
Apache
Kafka
KCL
AWS Lambda
Spark
Streaming
Apache
Storm
Amazon
SNS
Amazon
ML
Notifications
Amazon
ElastiCache
(Redis)
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
Alert
App state
Real-time prediction
KPI
process
store
DynamoDB
Streams
Amazon
Kinesis
Message / Event
Processing
Amazon
SQS
Amazon SQS
App
Amazon SQS
App
Amazon
SNS Subscribers
Amazon
ElastiCache
(Redis)
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
Publish
App state
KPI
process
store
Amazon SQS
App
Amazon SQS
App
Auto Scaling group
Amazon
SQSPriority queue Messages /
events
Interactive
&
Batch
Analytics
Amazon S3
Amazon EMR
Hive
Pig
Spark
Amazon
ML
process
store
Consume
Amazon
Redshift
Amazon EMR
Presto
Spark
Batch
Interactive
Batch prediction
Real-time prediction
Amazon
Kinesis
Firehose
Batch layer
Amazon
Kinesis
Data
stream
process
store
Amazon
Kinesis
Firehose
Amazon S3
A
p
p
l
i
c
a
t
i
o
n
s
Amazon
Redshift
Amazon EMR
Presto
Hive
Pig
Spark
answer
Speed layer
answer
Serving
layer
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
answer
Amazon
ML
KCL
AWS Lambda
Storm
Lambda
Architecture
Spark
Streaming on
Amazon EMR
Customer Cases
Summary
Decoupled “data bus”
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Use Lambda architecture ideas
• Immutable (append-only) log, batch/speed/serving layer
Leverage AWS managed services
• Scalable/elastic, available, reliable, secure, no/low admin
Big data ≠ big cost
Thank you!
aws.amazon.com/big-data

More Related Content

What's hot

Database migration simple, cross-engine and cross-platform migrations with ...
Database migration   simple, cross-engine and cross-platform migrations with ...Database migration   simple, cross-engine and cross-platform migrations with ...
Database migration simple, cross-engine and cross-platform migrations with ...Amazon Web Services
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryAmazon Web Services
 
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data WarehouseSoluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data WarehouseAmazon Web Services
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryGetting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryAmazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAmazon Web Services
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesAmazon Web Services
 
(DVO303) Scaling Infrastructure Operations with AWS
(DVO303) Scaling Infrastructure Operations with AWS(DVO303) Scaling Infrastructure Operations with AWS
(DVO303) Scaling Infrastructure Operations with AWSAmazon Web Services
 
Getting started with amazon redshift - Toronto
Getting started with amazon redshift - TorontoGetting started with amazon redshift - Toronto
Getting started with amazon redshift - TorontoAmazon Web Services
 
Ponencia Principal - AWS Summit - Madrid
Ponencia Principal - AWS Summit - MadridPonencia Principal - AWS Summit - Madrid
Ponencia Principal - AWS Summit - MadridAmazon Web Services
 
AWS Storage and Content Delivery Services
AWS Storage and Content Delivery ServicesAWS Storage and Content Delivery Services
AWS Storage and Content Delivery ServicesAmazon Web Services
 
AWSome Day 2016 - Module 3: Security, Identity, and Access Management
AWSome Day 2016 - Module 3: Security, Identity, and Access ManagementAWSome Day 2016 - Module 3: Security, Identity, and Access Management
AWSome Day 2016 - Module 3: Security, Identity, and Access ManagementAmazon Web Services
 
Getting Started with AWS Security
Getting Started with AWS SecurityGetting Started with AWS Security
Getting Started with AWS SecurityAmazon Web Services
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)Amazon Web Services Korea
 

What's hot (20)

Database migration simple, cross-engine and cross-platform migrations with ...
Database migration   simple, cross-engine and cross-platform migrations with ...Database migration   simple, cross-engine and cross-platform migrations with ...
Database migration simple, cross-engine and cross-platform migrations with ...
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data WarehouseSoluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
Soluzioni di Database completamente gestite: NoSQL, relazionali e Data Warehouse
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryGetting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
(DVO303) Scaling Infrastructure Operations with AWS
(DVO303) Scaling Infrastructure Operations with AWS(DVO303) Scaling Infrastructure Operations with AWS
(DVO303) Scaling Infrastructure Operations with AWS
 
Value, TCO & Cost Optimisation
Value, TCO & Cost OptimisationValue, TCO & Cost Optimisation
Value, TCO & Cost Optimisation
 
Getting started with amazon redshift - Toronto
Getting started with amazon redshift - TorontoGetting started with amazon redshift - Toronto
Getting started with amazon redshift - Toronto
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
 
Ponencia Principal - AWS Summit - Madrid
Ponencia Principal - AWS Summit - MadridPonencia Principal - AWS Summit - Madrid
Ponencia Principal - AWS Summit - Madrid
 
AWS Storage and Content Delivery Services
AWS Storage and Content Delivery ServicesAWS Storage and Content Delivery Services
AWS Storage and Content Delivery Services
 
AWSome Day 2016 - Module 3: Security, Identity, and Access Management
AWSome Day 2016 - Module 3: Security, Identity, and Access ManagementAWSome Day 2016 - Module 3: Security, Identity, and Access Management
AWSome Day 2016 - Module 3: Security, Identity, and Access Management
 
Getting Started with AWS Security
Getting Started with AWS SecurityGetting Started with AWS Security
Getting Started with AWS Security
 
Crypto Options in AWS
Crypto Options in AWSCrypto Options in AWS
Crypto Options in AWS
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
 
Databases on AWS Workshop.pdf
Databases on AWS Workshop.pdfDatabases on AWS Workshop.pdf
Databases on AWS Workshop.pdf
 

Viewers also liked

Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSAmazon Web Services
 
Hackproof Your Cloud: Responding to 2016 Threats
Hackproof Your Cloud: Responding to 2016 ThreatsHackproof Your Cloud: Responding to 2016 Threats
Hackproof Your Cloud: Responding to 2016 ThreatsAmazon Web Services
 
Customer Sharing: Trend Micro - Trend Micro's DevOps Practices
Customer Sharing: Trend Micro - Trend Micro's DevOps Practices Customer Sharing: Trend Micro - Trend Micro's DevOps Practices
Customer Sharing: Trend Micro - Trend Micro's DevOps Practices Amazon Web Services
 
AWS Partner ConneXions Taiwan - Q3 2016 Technology Update
AWS Partner ConneXions Taiwan - Q3 2016 Technology UpdateAWS Partner ConneXions Taiwan - Q3 2016 Technology Update
AWS Partner ConneXions Taiwan - Q3 2016 Technology UpdateAmazon Web Services
 
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer Tools
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer ToolsDevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer Tools
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer ToolsAmazon Web Services
 
Digital Workloads on Amazon Web Services
Digital Workloads on Amazon Web ServicesDigital Workloads on Amazon Web Services
Digital Workloads on Amazon Web ServicesAmazon Web Services
 
Big Data Solutions Day - Calgary
Big Data Solutions Day - CalgaryBig Data Solutions Day - Calgary
Big Data Solutions Day - CalgaryAmazon Web Services
 
Maximizing Business Value as You Migrate to AWS
Maximizing Business Value as You Migrate to AWSMaximizing Business Value as You Migrate to AWS
Maximizing Business Value as You Migrate to AWSAmazon Web Services
 
This One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsThis One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsAmazon Web Services
 
Database Migration: Simple, Cross-Engine and Cross-Platform Migrations with M...
Database Migration: Simple, Cross-Engine and Cross-Platform Migrations with M...Database Migration: Simple, Cross-Engine and Cross-Platform Migrations with M...
Database Migration: Simple, Cross-Engine and Cross-Platform Migrations with M...Amazon Web Services
 
another day, another billion packets
another day, another billion packetsanother day, another billion packets
another day, another billion packetsAmazon Web Services
 
Barracuda WAF: Scalable Security for Applications on AWS
Barracuda WAF: Scalable Security for Applications on AWSBarracuda WAF: Scalable Security for Applications on AWS
Barracuda WAF: Scalable Security for Applications on AWSAmazon Web Services
 
AWS re:Invent 2016: AWS Customers Saving Lives with Mobile and IoT Technology...
AWS re:Invent 2016: AWS Customers Saving Lives with Mobile and IoT Technology...AWS re:Invent 2016: AWS Customers Saving Lives with Mobile and IoT Technology...
AWS re:Invent 2016: AWS Customers Saving Lives with Mobile and IoT Technology...Amazon Web Services
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Amazon Web Services
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSAmazon Web Services
 
AWS Mobile Hub - Building Mobile Apps with AWS
AWS Mobile Hub - Building Mobile Apps with AWSAWS Mobile Hub - Building Mobile Apps with AWS
AWS Mobile Hub - Building Mobile Apps with AWSAmazon Web Services
 
Security Innovations in the Cloud
Security Innovations in the CloudSecurity Innovations in the Cloud
Security Innovations in the CloudAmazon Web Services
 
Amazon Aurora for the Enterprise - August 2016 Monthly Webinar Series
Amazon Aurora for the Enterprise - August 2016 Monthly Webinar SeriesAmazon Aurora for the Enterprise - August 2016 Monthly Webinar Series
Amazon Aurora for the Enterprise - August 2016 Monthly Webinar SeriesAmazon Web Services
 

Viewers also liked (20)

Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWS
 
Getting Started on AWS
Getting Started on AWSGetting Started on AWS
Getting Started on AWS
 
Hackproof Your Cloud: Responding to 2016 Threats
Hackproof Your Cloud: Responding to 2016 ThreatsHackproof Your Cloud: Responding to 2016 Threats
Hackproof Your Cloud: Responding to 2016 Threats
 
Customer Sharing: Trend Micro - Trend Micro's DevOps Practices
Customer Sharing: Trend Micro - Trend Micro's DevOps Practices Customer Sharing: Trend Micro - Trend Micro's DevOps Practices
Customer Sharing: Trend Micro - Trend Micro's DevOps Practices
 
AWS Partner ConneXions Taiwan - Q3 2016 Technology Update
AWS Partner ConneXions Taiwan - Q3 2016 Technology UpdateAWS Partner ConneXions Taiwan - Q3 2016 Technology Update
AWS Partner ConneXions Taiwan - Q3 2016 Technology Update
 
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer Tools
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer ToolsDevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer Tools
DevOps on AWS: Deep Dive on Continuous Delivery and the AWS Developer Tools
 
Digital Workloads on Amazon Web Services
Digital Workloads on Amazon Web ServicesDigital Workloads on Amazon Web Services
Digital Workloads on Amazon Web Services
 
Big Data Solutions Day - Calgary
Big Data Solutions Day - CalgaryBig Data Solutions Day - Calgary
Big Data Solutions Day - Calgary
 
Maximizing Business Value as You Migrate to AWS
Maximizing Business Value as You Migrate to AWSMaximizing Business Value as You Migrate to AWS
Maximizing Business Value as You Migrate to AWS
 
This One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsThis One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You Thousands
 
Database Migration: Simple, Cross-Engine and Cross-Platform Migrations with M...
Database Migration: Simple, Cross-Engine and Cross-Platform Migrations with M...Database Migration: Simple, Cross-Engine and Cross-Platform Migrations with M...
Database Migration: Simple, Cross-Engine and Cross-Platform Migrations with M...
 
AWSome Day Leeds
AWSome Day Leeds AWSome Day Leeds
AWSome Day Leeds
 
another day, another billion packets
another day, another billion packetsanother day, another billion packets
another day, another billion packets
 
Barracuda WAF: Scalable Security for Applications on AWS
Barracuda WAF: Scalable Security for Applications on AWSBarracuda WAF: Scalable Security for Applications on AWS
Barracuda WAF: Scalable Security for Applications on AWS
 
AWS re:Invent 2016: AWS Customers Saving Lives with Mobile and IoT Technology...
AWS re:Invent 2016: AWS Customers Saving Lives with Mobile and IoT Technology...AWS re:Invent 2016: AWS Customers Saving Lives with Mobile and IoT Technology...
AWS re:Invent 2016: AWS Customers Saving Lives with Mobile and IoT Technology...
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWS
 
AWS Mobile Hub - Building Mobile Apps with AWS
AWS Mobile Hub - Building Mobile Apps with AWSAWS Mobile Hub - Building Mobile Apps with AWS
AWS Mobile Hub - Building Mobile Apps with AWS
 
Security Innovations in the Cloud
Security Innovations in the CloudSecurity Innovations in the Cloud
Security Innovations in the Cloud
 
Amazon Aurora for the Enterprise - August 2016 Monthly Webinar Series
Amazon Aurora for the Enterprise - August 2016 Monthly Webinar SeriesAmazon Aurora for the Enterprise - August 2016 Monthly Webinar Series
Amazon Aurora for the Enterprise - August 2016 Monthly Webinar Series
 

Similar to AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Practices

(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...Amazon Web Services
 
February 2016 Webinar Series - Architectural Patterns for Big Data on AWS
February 2016 Webinar Series - Architectural Patterns for Big Data on AWSFebruary 2016 Webinar Series - Architectural Patterns for Big Data on AWS
February 2016 Webinar Series - Architectural Patterns for Big Data on AWSAmazon Web Services
 
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivBig Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivAmazon Web Services
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAmazon Web Services
 
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...1Strategy
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesAmazon Web Services
 
Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rasmus Ekman
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...Amazon Web Services
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoDatabase and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoAmazon Web Services
 
Journey Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data AnalysisJourney Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data AnalysisAmazon Web Services
 

Similar to AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Practices (20)

Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
 
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
 
February 2016 Webinar Series - Architectural Patterns for Big Data on AWS
February 2016 Webinar Series - Architectural Patterns for Big Data on AWSFebruary 2016 Webinar Series - Architectural Patterns for Big Data on AWS
February 2016 Webinar Series - Architectural Patterns for Big Data on AWS
 
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivBig Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
Deep Dive in Big Data
Deep Dive in Big DataDeep Dive in Big Data
Deep Dive in Big Data
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
 
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I...
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWS
 
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoDatabase and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
 
Journey Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data AnalysisJourney Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data Analysis
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Practices

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Arie Leeuwesteijn, Solutions Architect, AWS September 2016 Big Data Architectural Patterns and Best Practices on AWS
  • 2. Agenda Big data challenges How to simplify big data processing What technologies should you use? • Why? • How? Reference architecture Design patterns
  • 3. Ever Increasing Big Data Volume Velocity Variety
  • 5. Plethora of Tools Amazon Glacier S3 DynamoDB RDS EMR Amazon Redshift Data Pipeline Amazon Kinesis Kinesis-enabled app Lambda ML SQS ElastiCache DynamoDB Streams Amazon Elasticsearch Service
  • 6. Big Data Challenges Is there a reference architecture? What tools should I use? How? Why?
  • 7. Decoupled “data bus” • Data → Store → Process → Store → Analyze → Answers Use the right tool for the job • Data structure, latency, throughput, access patterns Use Lambda architecture ideas • Immutable (append-only) log, batch/speed/serving layer Leverage AWS managed services • Scalable/elastic, available, reliable, secure, no/low admin Big data ≠ big cost Architectural Principles
  • 8. Simplify Big Data Processing COLLECT STORE PROCESS/ ANALYZE CONSUME Time to answer (Latency) Throughput Cost
  • 10. Types of Data Database records Search documents Log files Messaging events Devices / sensors / IoT stream Devices Sensors & IoT platforms AWS IoT STREAMS Stream storage IoT COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS Database Applications AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Search File store LoggingTransport Messaging Message MESSAGES Queue Messaging
  • 11. Store
  • 12. Amazon Kinesis Firehose Amazon Kinesis Streams/Analytics Apache Kafka Amazon DynamoDB Streams Amazon SQS Amazon SQS • Managed message queue service Apache Kafka • High throughput distributed messaging system Amazon Kinesis Streams • Managed stream storage + processing Amazon Kinesis Analytics • Process streaming data in real time Amazon Kinesis Firehose • Managed data delivery Amazon DynamoDB • Managed NoSQL database • Tables can be stream-enabled Message & Stream Storage Devices Sensors & IoT platforms AWS IoT STREAMS IoT COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS Database Applications AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENT S FILES Search File store LoggingTransport Messaging Message MESSAGES Messaging Queue Stream
  • 13. Why Stream Storage? Decouple producers & consumers Persistent buffer Collect multiple streams Preserve client ordering Streaming MapReduce Parallel consumption 4 4 3 3 2 2 1 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 4 3 3 2 2 1 1 shard 1 / partition 1 shard 2 / partition 2 Consumer 1 Count of red = 4 Count of violet = 4 Consumer 2 Count of blue = 4 Count of green = 4 DynamoDB stream Amazon Kinesis stream Kafka topic
  • 14. What About Queues & Pub/Sub ? • Decouple producers & consumers/subscribers • Persistent buffer • Collect multiple streams • No client ordering • No parallel consumption for Amazon SQS • Amazon SNS can route to multiple queues or ʎ functions • No streaming MapReduce Consumers Producers Producers Amazon SNS Amazon SQS queue topic function ʎ AWS Lambda Amazon SQS queue Subscriber
  • 15. What Stream Storage should I use? Amazon DynamoDB Streams Amazon Kinesis Streams Amazon Kinesis Firehose Apache Kafka Amazon SQS AWS managed service Yes Yes Yes No Yes Guaranteed ordering Yes Yes Yes Yes No Delivery exactly-once at-least-once exactly-once at-least-once at-least-once Data retention period 24 hours 7 days N/A Configurable 14 days Availability 3 AZ 3 AZ 3 AZ Configurable 3 AZ Scale / throughput No limit / ~ table IOPS No limit / ~ shards No limit / automatic No limit / ~ nodes No limits / automatic Parallel clients Yes Yes No Yes No Stream MapReduce Yes Yes N/A Yes N/A Record/object size 400 KB 1 MB Amazon Redshift row size Configurable 256 KB Cost Higher (table cost) Low Low Low (+admin) Low-medium Hot Warm
  • 16. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS Database AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENT S FILES Search Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Hot Stream Amazon S3 Amazon SQS Message Amazon S3 File LoggingIoTApplicationsTransportMessaging File Storage
  • 17. Why is Amazon S3 Good for Big Data? • Natively supported by big data frameworks (Spark, Hive, Presto, etc.) • No need to run compute clusters for storage (unlike HDFS) • Can run transient Hadoop clusters & Amazon EC2 Spot Instances • Multiple distinct (Spark, Hive, Presto) clusters can use the same data • Unlimited number of objects • Very high bandwidth – no aggregate throughput limit • Highly available – can tolerate AZ failure • Designed for 99.999999999% durability • Tired-storage (Standard, IA, Amazon Glacier) via life-cycle policy • Secure – SSL, client/server-side encryption at rest • Low cost
  • 18. What about HDFS & Amazon Glacier? • Use HDFS for very frequently accessed (hot) data • Use Amazon S3 Standard for frequently accessed data • Use Amazon S3 Standard – IA for infrequently accessed data • Use Amazon Glacier for archiving cold data
  • 19. Cache, database, search COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENT S FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Hot Stream Amazon SQS Message Amazon Elasticsearch Service Amazon DynamoDB Amazon S3 Amazon ElastiCache Amazon RDS SearchSQLNoSQLCacheFile LoggingIoTApplicationsTransportMessaging
  • 21. Best Practice - Use the Right Tool for the Job Data Tier Search Amazon Elasticsearch and CloudSearch Cache Amazon ElastiCache Redis Memcached SQL Amazon Aurora MySQL PostgreSQL Oracle SQL Server MariaDB NoSQL Amazon DynamoDB Cassandra HBase MongoDB Database tier
  • 23. What Data Store Should I Use? Data structure → Fixed schema, JSON, key-value Access patterns → Store data in the format you will access it Data / access characteristics → Hot, warm, cold Cost → Right cost
  • 24. Data Structure and Access Patterns Access Patterns What to use? Put/Get (key, value) Cache, NoSQL Simple relationships → 1:N, M:N NoSQL Cross table joins, transaction, SQL SQL Faceting, search, free text Search Data Structure What to use? Fixed schema SQL, NoSQL Schema-free (JSON) NoSQL, Search (Key, value) Cache, NoSQL
  • 25. What Is the Temperature of Your Data / Access ?
  • 26. Hot Warm Cold Volume MB–GB GB–TB PB Item size B–KB KB–MB KB–TB Latency ms ms, sec min, hrs Durability Low – high High Very high Request rate Very high High Low Cost/GB $$-$ $-¢¢ ¢ Hot data Warm data Cold data Data / Access Characteristics: Hot, Warm, Cold
  • 27. Cache SQL Request rate High Low Cost/GB High Low Latency Low High Data volume Low High Amazon Glacier Structure NoSQL Hot data Warm data Cold data Low High Search
  • 28. Amazon ElastiCache Amazon DynamoDB Amazon RDS/Aurora Amazon Elasticsearch Amazon S3 Amazon Glacier Average latency ms ms ms, sec ms,sec ms,sec,min (~ size) hrs Typical data stored GB GB–TBs (no limit) GB–TB (64 TB max) GB–TB MB–PB (no limit) GB–PB (no limit) Typical item size B-KB KB (400 KB max) KB (64 KB max) KB (2 GB max) KB-TB (5 TB max) GB (40 TB max) Request Rate High – very high Very high (no limit) High High Low – high (no limit) Very low Storage cost GB/month $$ ¢¢ ¢¢ ¢¢ ¢ ¢/10 Durability Low - moderate Very high Very high High Very high Very high Availability High 2 AZ Very high 3 AZ Very high 3 AZ High 2 AZ Very high 3 AZ Very high 3 AZ Hot data Warm data Cold data What Data Store Should I Use?
  • 29. Cost Conscious Design Example: Should I use Amazon S3 or Amazon DynamoDB? “I’m currently scoping out a project that will greatly increase my team’s use of Amazon S3. Hoping you could answer some questions. The current iteration of the design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…” Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month 300 2048 1483 777,600,000
  • 30. Cost Conscious Design Example: Should I use Amazon S3 or Amazon DynamoDB? https://calculator.s3.amazonaws.com/index.html Simple Monthly Calculator
  • 31. Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month 300 2,048 1,483 777,600,000 Amazon S3 or Amazon DynamoDB?
  • 32. Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month Scenario 1300 2,048 1,483 777,600,000 Scenario 2300 32,768 23,730 777,600,000 Amazon S3 Amazon DynamoDB use use
  • 33. COLLECT STORE PROCESS / ANALYZE Amazon Elasticsearch Service Apache Kafka Amazon SQS Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Amazon S3 Amazon ElastiCache Amazon RDS Amazon DynamoDB Streams HotHotWarm SearchSQLNoSQLCacheFileMessage Stream Mobile apps Web apps Devices Messaging Message Sensors & IoT platforms AWS IoT Data centers AWS Direct Connect AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS DOCUMENT S FILES MESSAGES STREAMS LoggingIoTApplicationsTransportMessaging Process / analyze
  • 35. Process / Analyze • Batch - Minutes or hours on cold data • Daily/weekly/monthly reports • Interactive – Seconds on warm/cold data • Self-service dashboards • Messaging – Milliseconds or seconds on hot data • Message/event buffering • Streaming - Milliseconds or seconds on hot data • Billing/fraud alerts, 1 minute metrics
  • 36. What Analytics Technology Should I Use? Amazon Redshift Amazon EMR Presto Spark Hive Query latency Low Low Low High Durability High High High High Data volume 1.6 PB max ~Nodes ~Nodes ~Nodes AWS managed Yes Yes Yes Yes Storage Native HDFS / S3 HDFS / S3 HDFS / S3 SQL compatibility High High Low (SparkSQL) Medium (HQL) Slow
  • 37. Spark Streaming Apache Storm AWS Lambda KCL apps Amazon Redshift Amazon Redshift Hive Spark Presto Amazon Kinesis Apache Kafka Amazon DynamoDB Amazon S3data Hot Cold Data temperature Processingspeed Fast Slow Answers Hive Native apps KCL apps AWS Lambda Data Temperature vs. Processing Speed
  • 38. • Replicate within, to, or from Amazon EC2 or RDS • Move data to the same or different database engine • Keep your apps running during the migration • Start your first migration in 10 minutes or less. AWS Database Migration Service
  • 39. • Migrate from Oracle, SQL Server and PostgreSQL • Converts your database schema including tables, stored procedures, views, and DML • Downloadable client install for Microsoft Windows, Mac and Linux desktops • Highlight where manual edits are needed AWS Schema Conversion Tool
  • 40. Predictions via Machine Learning ML gives computers the ability to learn without being explicitly programmed Machine learning algorithms: Supervised learning ← “teach” program - Classification ← Is this transaction fraud? (yes/no) - Regression ← Customer life-time value? Unsupervised learning ← Let it learn by itself - Clustering ← Market segmentation
  • 41. Tools and Frameworks Machine Learning • Amazon ML, Amazon EMR (Spark ML) Interactive • Amazon Redshift, Amazon EMR (Presto, Spark) Batch • Amazon EMR (MapReduce, Hive, Pig, Spark) Messaging • Amazon SQS application on Amazon EC2 Streaming • Micro-batch: Spark Streaming, KCL • Real-time: Amazon Kinesis Analytics, Storm, AWS Lambda, KCL Amazon SQS apps Streaming Amazon Kinesis Analytics Amazon KCL apps AWS Lambda Amazon Redshift PROCESS / ANALYZE Amazon Machine Learning Presto Amazon EMR FastSlowFast BatchMessageInteractiveStreamML Amazon EC2 Amazon EC2
  • 42. What Streaming / Messaging Technology Should I Use? Spark Streaming Apache Storm Kinesis KCL Application AWS Lambda Amazon SQS Apps Scale ~ Nodes ~ Nodes ~ Nodes Automatic ~ Nodes Micro-batch or Real-time Micro-batch Real-time Near-real-time Near-real-time Near-real-time AWS managed service Yes (EMR) No (EC2) No (KCL + EC2 + Auto Scaling) Yes No (EC2 + Auto Scaling) Scalability No limits ~ nodes No limits ~ nodes No limits ~ nodes No limits No limits Availability Single AZ Configurable Multi-AZ Multi-AZ Multi-AZ Programming languages Java, Python, Scala Any language via Thrift Java, via MultiLang Daemon (.NET, Python, Ruby, Node.js) Node.js, Java, Python AWS SDK languages (Java, .NET, Python, …)
  • 44. Amazon SQS apps Streaming Amazon Kinesis Analytics Amazon KCL apps AWS Lambda Amazon Redshift COLLECT STORE CONSUMEPROCESS / ANALYZE Amazon Machine Learning Presto Amazon EMR Amazon Elasticsearch Service Apache Kafka Amazon SQS Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Amazon S3 Amazon ElastiCache Amazon RDS Amazon DynamoDB Streams HotHotWarm FastSlowFast BatchMessageInteractiveStreamML SearchSQLNoSQLCacheFileMessage Stream Amazon EC2 Amazon EC2 Mobile apps Web apps Devices Messaging Message Sensors & IoT platforms AWS IoT Data centers AWS Direct Connect AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS DOCUMENT S FILES MESSAGES STREAMS LoggingIoTApplicationsTransportMessaging ETL
  • 46. STORE CONSUMEPROCESS / ANALYZE Amazon QuickSight Apps & Services Analysis&visualizationNotebooksIDEAPI Applications & API Analysis and visualization Notebooks IDE Business users Data scientist, developers COLLECT ETL
  • 47. Putting It All Together
  • 48. Amazon SQS apps Streaming Amazon Kinesis Analytics Amazon KCL apps AWS Lambda Amazon Redshift COLLECT STORE CONSUMEPROCESS / ANALYZE Amazon Machine Learning Presto Amazon EMR Amazon Elasticsearch Service Apache Kafka Amazon SQS Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Amazon S3 Amazon ElastiCache Amazon RDS Amazon DynamoDB Streams HotHotWarm FastSlowFast BatchMessageInteractiveStreamML SearchSQLNoSQLCacheFileQueueStream Amazon EC2 Amazon EC2 Mobile apps Web apps Devices Messaging Message Sensors & IoT platforms AWS IoT Data centers AWS Direct Connect AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS DOCUMENT S FILES MESSAGES STREAMS Amazon QuickSight Apps & Services Analysis&visualizationNotebooksIDEAPI Reference architecture LoggingIoTApplicationsTransportMessaging ETL
  • 50. Common issues • Storage type • Does not match data type • Access patterns • Price/capacity, deletion of data • Processing models • Tight coupling • Over capacity • Producer bottleneck • Performance, throughput • Tooling
  • 51. Primitive: Multi-Stage Decoupled “Data Bus” Multiple stages Storage decouples multiple processing stages Store Process Store Process process store
  • 52. Primitive: Multiple Stream Processing Applications Can Read from Amazon Kinesis Amazon Kinesis AWS Lambda Amazon DynamoDB Amazon Kinesis S3 Connector process store Amazon S3
  • 53. Amazon EMR Primitive: Analysis Frameworks Could Read from Multiple Data Stores Amazon Kinesis AWS Lambda Amazon S3 Amazon DynamoDB Spark Streaming Amazon Kinesis S3 Connector process store Spark SQL
  • 54. Amazon EMR Real-time Analytics Apache Kafka KCL AWS Lambda Spark Streaming Apache Storm Amazon SNS Amazon ML Notifications Amazon ElastiCache (Redis) Amazon DynamoDB Amazon RDS Amazon ES Alert App state Real-time prediction KPI process store DynamoDB Streams Amazon Kinesis
  • 55. Message / Event Processing Amazon SQS Amazon SQS App Amazon SQS App Amazon SNS Subscribers Amazon ElastiCache (Redis) Amazon DynamoDB Amazon RDS Amazon ES Publish App state KPI process store Amazon SQS App Amazon SQS App Auto Scaling group Amazon SQSPriority queue Messages / events
  • 56. Interactive & Batch Analytics Amazon S3 Amazon EMR Hive Pig Spark Amazon ML process store Consume Amazon Redshift Amazon EMR Presto Spark Batch Interactive Batch prediction Real-time prediction Amazon Kinesis Firehose
  • 57. Batch layer Amazon Kinesis Data stream process store Amazon Kinesis Firehose Amazon S3 A p p l i c a t i o n s Amazon Redshift Amazon EMR Presto Hive Pig Spark answer Speed layer answer Serving layer Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES answer Amazon ML KCL AWS Lambda Storm Lambda Architecture Spark Streaming on Amazon EMR
  • 59. Summary Decoupled “data bus” • Data → Store → Process → Store → Analyze → Answers Use the right tool for the job • Data structure, latency, throughput, access patterns Use Lambda architecture ideas • Immutable (append-only) log, batch/speed/serving layer Leverage AWS managed services • Scalable/elastic, available, reliable, secure, no/low admin Big data ≠ big cost