SlideShare a Scribd company logo
1 of 46
P U B L I C S E C T O R
S U M M I T
B RUSSE LS
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
EverythingYou Need to KnowAbout
Big Data: FromArchitectural
Principles to the Best Practices
Manos Samatas
Big Data and Analytics Solutions
Architect
AWS
A N T 2 0 1
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Whatto expectfromthesession
Big data challenges
Architectural principles
How to simplify big data processing
What technologies should you use?
Why?
How?
Reference architecture
Design patterns
Customer examples
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Typesof big data analytics
Batch/
interactive
Stream
processing
Machine
learning
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Delivery of big data services
Virtualized
Managed
services
Serverless/
clusterless/
containerized
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Plethoraof tools
Amazon Athena
Amazon
Elasticsearch
Service
AWS Glue Amazon Redshift
Amazon EMRAmazon Kinesis
Amazon
CloudSearch
Amazon Managed
Streaming for Kafka
AWS Lake FormationAmazon QuickSight Data PipelineAmazon Aurora
DynamoDBElastiCacheAmazon DocumentDB Amazon Neptune
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Big datachallenges
Why?
How?
What tools should I use?
Is there a reference architecture?
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Architecturalprinciples
Build decoupled systems
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Leverage managed and serverless services
• Scalable/elastic, available, reliable, secure, no/low admin
Use event-journal design patterns
• Immutable datasets (data lake), materialized views
Be cost-conscious
• Big data ≠ big cost
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Collect Store Process/
Analyze
Consume
Time to answer (latency)
Throughput
Cost
Simplifybig dataprocessing
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Collect
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Collect
Typeof data
Mobile apps
Web apps
Data centers AWS Direct
Connect
RECORDS
Applications
Transactions
Data structures
Database records
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
FILES
Datatransport&logging
Import/export
Files/objects
Log files
Media files
Devices
Sensors
IoT platforms
AWS IoT STREAMS
IoT
EventsData streams
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Events
Files
Transactions
Collect
Devices
Sensors
IoT platforms
AWS IoT STREAMS
IoT
Data streams
Log files
Media files
RECORDS
Applications
Data structures
Database records
Typeof data Store
NoSQL
In-memory
SQL
File/object
store
Stream
storage
Mobile apps
Web apps
Data centers AWS Direct
Connect
Applications
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
FILES
Datatransport&logging
Import/export
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Collect
AWSIoT STREAMS
IoTDatatransport&logging
Mobile apps
Web apps
Data centers AWS Direct
Connect
RECORDS
Applications
Store
NoSQL
In-memory
SQL
File/object
store
Amazon Kinesis
Apache Kafka
• High throughput distributed
streaming platform
Amazon Kinesis
• Managed stream service
Amazon MSK - (in Preview)
• Managed Service for Kafka
Stream storage
Devices
Sensors
IoT platforms
AWS IoT
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
FILES
Import/export
Mobile apps
Web apps
Data centers AWS Direct
Connect
Amazon MSK
Apache Kafka
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Whichstream/messagestorageshouldIuse?
Amazon
Kinesis Data
Streams
Amazon
Kinesis Data
Firehose
Apache Kafka (on
Amazon Elastic
Compute Cloud
[Amazon EC2])
Amazon Simple
Queue Service
(Amazon SQS)
(Standard)
Amazon SQS
(FIFO)
AWS managed Yes Yes No Yes Yes
Guaranteed ordering Yes No Yes No Yes
Delivery (deduping) At least once At least once At least/At
most/exactly once
At least once Exactly once
Data retention period 7 days N/A Configurable 14 days 14 days
Availability Three AZ Three AZ Configurable Three AZ Three AZ
Scale/throughput No limit /
~ shards
No limit /
automatic
No limit /
~ nodes
No limits /
automatic
300 TPS /
queue
Multiple consumers Yes No Yes No No
Row/object size 1 MB Destination
row/object size
Configurable 256 KB 256 KB
Cost Low Low Low (+admin) Low-medium Low-medium
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Amazon Kinesis
Collect Store
File
Hot
Stream
Mobile apps
Web apps
Devices
Sensors
IoT platforms
AWS IoT
Data centers AWS Direct
Connect
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
FILES
STREAMS
Datatransport&loggingIoTApplications
Import/export
NoSQL
In-memory
SQL
Amazon S3
Amazon Simple Storage Service
(Amazon S3)
• Managed object storage service built
to store and retrieve any amount of
data
File/object storage
Amazon MSK
Apache Kafka
Mobile apps
Web apps
Data centers AWS Direct
Connect
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Use Amazon S3 as the storage for your data lake
• Natively supported by big data frameworks (Spark, Hive, Presto, and others)
• Decouple storage and compute
• No need to run compute clusters for storage (unlike HDFS)
• Can run transient Amazon EMR clusters with Amazon EC2 Spot
Instances
• Multiple & heterogeneous analysis clusters and services can use the
same data
• Designed for 99.999999999% durability
• No need to pay for data replication within a region
• Secure – SSL, client/server-side encryption at rest
• Low cost
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Whatabout HDFS & datatiering?
• Use HDFS/local for working data sets
• (for example, iterative read on the same data sets)
• Use Amazon S3 Standard for frequently
accessed data
• Use Amazon S3 Standard–IA for less
frequently accessed data
• Use Amazon Glacier for archiving cold
data
Use S3 analytics to optimize tiering strategy
S3 analytics
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
What about metadata?
• AWS Glue catalog
• Hive Metastore compliant
• Crawlers - Detect new data, schema, partitions
• Search - Metadata discovery
• Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum
compatible
• Lake Formation - Preview
• Hive Metastore (Presto, Spark, Hive, Pig)
• Can be hosted on Amazon RDS
Data
catalog Metastore Amazon RDS
AWS Glue
catalog AWS Glue
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Collect Store
Amazon
DynamoDB
Amazon RDS
Amazon Aurora
File
Mobile apps
Web apps
Devices
Sensors
IoT platforms
AWS IoT
Data centers AWS Direct
Connect
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
FILES
STREAMS
LoggingIoTApplications
Amazon S3
Amazon
DocumentDB
Amazon ElastiCache
Amazon DAX
Import/export
SQLNoSQLCache
Amazon ElastiCache
• Managed Memcached or Redis service
Amazon DynamoDB Accelerator (DAX)
• Managed in-memory cache for DynamoDB
Amazon Neptune
• Managed graph DB
Amazon DynamoDB
• Managed key value/document DB
Amazon Relational Database Service
(Amazon RDS)
• Managed relational database service
Cache & database
Apache Kafka
Amazon MSK
Amazon Kinesis
Hot
Stream
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Bestpractice:Use therighttoolfor thejob
Data tier
Relational
Referential integrity
with strong
consistency,
transactions, and
hardened scale
Key-value
Low-latency, key-
based queries with
high throughput and
fast data ingestion
Document
Indexing and storing
of documents with
support for query on
any property
In-memory
Microsecond latency,
key-based queries,
specialized data
structures
Graph
Creating and
navigating relations
between data easily
and quickly
Complex query
support via SQL
Simple query
methods with filters
Simple query with
filters, projections
and aggregates
Simple query
methods with filters
Easily express queries
in terms of relations
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
WhichdatastoreshouldI use?
Ask yourself some questions
What is the data structure?
How will the data be accessed?
What is the temperature of the data?
What will the solution cost?
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Whatis thedata structure?
Access patterns What to use?
Put/get (key, value) In-memory, NoSQL
Simple relationships → 1:N, M:N NoSQL
Multi-table joins, transaction, SQL SQL
Faceting, search Search
Graph traversal GraphDB
Data structure What to use?
Fixed schema SQL, NoSQL
Schema-free (JSON) NoSQL, search
Key/value In-memory, NoSQL
Graph GraphDB
Howwillthedatabeaccessed?
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Amazon
ElastiCache
Amazon
DynamoDB +
DAX
Amazon
Aurora
Amazon RDS Amazon
Elasticsearch
Service
Amazon
Neptune
Amazon S3 +
Amazon
Glacier
Use cases In memory
caching
K/V lookups,
document
store
OLTP,
transactional
OLTP,
transactional
Log analysis,
reverse
indexing
Graph File store
Performance Ultra high
request rate,
Ultra low
latency
Ultra high
request rate,
ultra low to
low latency
Very high
request rate,
low latency
High request
rate, low
latency
Medium
request rate,
low latency
Medium
request rate,
low latency
High
throughput
Shape K/V K/V and
document
Relational Relational Documents Node/edges Files
Size GB TB, PB (no
limits)
GB, mid TB GB, low TB GB, TB GB, mid TB GB, TB, PB,
EB (no limits)
Cost / GB $$ ¢¢ - $$ ¢¢ ¢¢ ¢¢ ¢¢ ¢- ¢4/10
Availability 2 AZ 3 AZ 3 AZ 2 AZ 1-2 AZ 3 AZ 3 AZ
VPC support Inside VPC VPC endpoint Inside VPC Inside VPC Outside or
inside VPC
Inside VPC VPC endpoint
Database characteristics
Hot data Warm data Cold data
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Process/
analyze
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Process/analyze
Amazon Redshift
& Spectrum
Amazon Athena
BatchInteractive
Amazon ES
Interactive& batchanalytics
Amazon Elasticsearch Service
Managed service for Elasticsearch
Amazon Redshift & Amazon Redshift Spectrum
Managed data warehouse
Spectrum enables querying S3
Amazon Athena
Serverless interactive query service
Amazon EMR
Managed Hadoop framework for running Apache Spark, Flink,
Presto, Tez, Hive, Pig, HBase, and others
Presto
Amazon
EMR
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Stream/real-timeanalytics
Spark Streaming on Amazon EMR
Amazon Kinesis Data Analytics
• Managed service for running SQL on
streaming data
Amazon KCL
• Amazon Kinesis Client Library
AWS Lambda
• Run code serverless (without
provisioning or managing servers)
• Services such as S3 can publish events
to AWS Lambda
• AWS Lambda can pool event from a
Kinesis
Process/analyze
KCL
Apps
AWS Lambda
Amazon Kinesis
Analytics
Stream
Streaming
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Predictiveanalytics
Process/analyze
Developers
Data scientists
Deep learning
experts
Predictive
AmazonML
Which analytics should I use?
Process/analyze
Batch
Takes minutes to hours
Example: Daily/weekly/monthly reports
Amazon EMR (MapReduce, Hive, Pig, Spark)
Interactive
Takes seconds
Example: Self-service dashboards
Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)
Stream
Takes milliseconds to seconds
Example: Fraud alerts, one-minute metrics
Amazon EMR (Spark Streaming), Amazon Kinesis Data Analytics,
Amazon KCL, AWS Lambda, and others
Predictive
Takes milliseconds (real-time) to minutes (batch)
Example: Fraud detection, forecasting demand, speech recognition
Amazon SageMaker, Amazon Polly, Amazon Rekognition, Amazon
Transcribe, Amazon Translate, Amazon EMR (Spark ML), Amazon
Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK, and Caffe)
FastSlow
Amazon Redshift
& Spectrum
Amazon Athena
BatchInteractive
Amazon ES
Presto
Amazon
EMR
Predictive
AmazonML
KCL
Apps
AWS Lambda
Amazon Kinesis
Analytics
Stream
Streaming
Fast
Whichstreamprocessing technologyshould Iuse?
Amazon EMR
(Spark
Streaming)
KCL application Amazon Kinesis
analytics
AWS Lambda
Managed service Yes No (EC2 + Auto
Scaling)
Yes Yes
Serverless No No Yes Yes
Scale / throughput No limits /
~ nodes
No limits /
~ nodes
No Limits /
automatic
No limits /
automatic
Availability Single AZ Multi-AZ Multi-AZ Multi-AZ
Programming
languages
Java, Python,
Scala
Java, others through
MultiLangDaemon
ANSI SQL with
extensions
Node.js, Java, Python, .Net Core
Sliding window
functions
Build-in App needs to
implement
Built-in No
Reliability KCL and Spark
checkpoints
Managed by
Amazon KCL
Managed by
Amazon Kinesis
Data Analytics
Managed by AWS Lambda
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
WhichAnalytics Tool Should I Use?
Amazon Redshift Amazon Redshift
Spectrum
Amazon Athena Amazon EMR
Presto Spark Hive
Use case Optimized for data
warehousing
Query S3 data from
Amazon Redshift
Interactive queries
over S3 data
Interactive
query
General
purpose
Batch
Scale/throughput ~Nodes ~Nodes Automatic ~ Nodes
Managed service Yes Yes Yes, Serverless Yes
Storage Local storage Amazon S3 Amazon S3 Amazon S3, HDFS
Optimization Columnar storage,
data compression,
and zone maps
AVRO, PARQUET
TEXT, SEQ
RCFILE, ORC, etc.
AVRO, PARQUET
TEXT, SEQ
RCFILE, ORC, etc.
Framework dependent
Metadata Amazon Redshift
catalog
AWS Glue catalog AWS Glue catalog AWS Glue catalog or
Hive Meta-store
Auth/access controls IAM, users, groups,
and access controls
IAM, users, groups,
and access controls
IAM IAM, LDAP, & Kerberos
UDF support Yes (Scalar) Yes (Scalar) No Yes
Slow
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Raw data stored in data lake
Preparation
Normaliz ed
Partitioned
Compres s ed
Storage optimiz ed
ELT/ETL
ELT/ETL: Preparing raw data for consumption
Data lake
on AWS
Raw immutable
DataSets
Curated
DataSets
Data catalog
WhichELT/ETL tool should Iuse?
AWS Glue ETL AWS Data
Pipeline
Amazon Data
Migration Service
(Amazon DMS)
Amazon EMR Apache NiFi Partner
solution
Use case Serverless ETL Data
workflow
service
Migrate databases
(and to/from data
lake)
Customize
developed
Hadoop/Spark
ETL
Automate the
flow of data
between
systems
Rich partner
ecosystem for
ETL
Scale/throughput ~DPUs ~Nodes
through EMR
cluster
EC2 instance type ~Nodes Self managed Self manager or
through partner
Managed service Clusterless Managed Managed EC2 on
your behalf
Managed EC2
on your behalf
Self managed
on Amazon
EMR or
Marketplace
Self manager or
through partner
Data sources S3, RDBMS,
Amazon
RedShift,
DynamoDB
S3, JDBC,
DynamoDB,
custom
RDBMS, data
warehouses,
Amazon S3*
(*limited)
Various
Hadoop/Spark
managed
Various through
rich processor
framework
Various
Skills needed Wizard for
simple mapping,
code snippets
for advanced
ETL
Wizard and
code
snippets
Wizard and
drag/drop
Hadoop/Spark
coding
NiFi processors
and some
coding
Self manager or
through partner
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Consume
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Collect Store ConsumeProcess/analyzeETL
Amazon
QuickSight
Analysis&visualizationDatasceince
AI Apps
Apps
SW engineers/
business
users
Data scientists/
data engineers
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Putting it all together
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Collect Store ConsumeProcess/analyze
Apache Kafka
Amazon MSK
Amazon Kinesis
Hot
SQLNoSQLCacheFileStream
Mobile apps
Web apps
Devices
Sensors
IoT platforms
AWS IoT
Data centers AWS Direct
Connect
Migration
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
FILES
STREAMS
Amazon
QuickSight
Analysis&visualizationDatasceince
DataTransport&LoggingIoTApplications
Amazon S3
Import/export
AI Apps
Apps
Model
Train/
Eval
Models
Deploy
ETL
FastSlow
Amazon Redshift
& Spectrum
Amazon Athena
BatchInteractive
Amazon ES
Presto
Amazon
EMR
Predictive
AmazonML
KCL
Apps
AWS Lambda
Amazon Kinesis
Stream
Streaming
Fast
Amazon
DynamoDB
Amazon RDS
Amazon Aurora
Amazon
Document DB
Amazon ElastiCache
Amazon DAX
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Design patterns
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Spark Streaming
AWS Lambda
KCL apps
Amazon
Redshift
Amazon
Redshift
Hive
Spark
Presto
Processingtechnology
FastSlow
Hive
Native apps
KCL apps
AWS Lambda
Amazon
Athena
Amazon Kinesis Amazon
DynamoDB/RDS
Amazon S3Data
Hot Cold
Data store
Streaming
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Streaming analytics
Amazon EMR
KCL app
AWS Lambda
Spark
Streaming
Amazon
AI
Real-time prediction
Amazon
ElastiCache
(Redis)
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
App state or
Materialized
View
KPI
Process
Store
Amazon
Kinesis
Amazon Kinesis
Data Analytics
Amazon
SNS NotificationsAlerts
Amazon
S3
Log
Amazon
KinesisFan out Downstream
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Hearst’s serverless data pipeline
cosmopolitan.com
caranddriver.com
sfchronicle.com
elle.com
Ingestion proxy
(Node.js)
Serverless data
pipeline
Offline
analysis and
archive
Real-time
analysis
Amazon Kinesis
Data Streams
AWS LambdaAWS LambdaAWS Lambda Amazon DynamoDB Amazon API Gateway
Amazon Kinesis
Data Firehose
Amazon Simple Storage
Service (S3)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Interactive
&
batch
analytics
Process
Store
Batch
Interactive
Amazon EMR
Hive
Pig
Spark
Amazon S3
Files
Amazon
Kinesis Data
Firehose
Amazon Kinesis
Data Analytics
Amazon Redshift
Amazon ES
Consume
Amazon EMR
Presto
Spark
Amazon Athena
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
FINRA:Migrating toAWS
Petabytes of data generated
on-premises, brought to AWS,
and stored in S3
Thousands of analytical
queries performed on Amazon
EMR and Amazon Redshift
Stringent security
requirements met by
leveraging VPC, VPN,
encryption at-rest and in-
transit, AWS CloudTrail, and
database auditing
Flexible
interactive
queries
Predefined
queries
Surveillance
analytics
Web applications
Analysts; regulators
AWS Cloud
VPC
VPC
VPC
Amazon EMR
Amazon EMR
Amazon Redshift
Amazon S3
Interactive
&
batch
Amazon S3
Amazon Redshift
Amazon EMR
Presto
Hive
Pig
Spark
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
AWS Lambda
Spark Streaming on
Amazon EMR
Applications
Amazon Kinesis
Streams
App state
or
materialized
view
KCL
Real-time
Amazon
DynamoDB
Amazon
RDS
Change data capture or
export
Transactions
Stream
Files
Data lake
Amazon Kinesis
Data Analytics
Amazon Athena
Amazon Kinesis
Data Firehose
Amazon ES
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Summary
Build decoupled systems
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Leverage AWS managed and serverless services
• Scalable/elastic, available, reliable, secure, no/low admin
Use log-centric design patterns
• Immutable logs, data lake, materialized views
Be cost-conscious
• Big data ≠ big cost
Thank you!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T
Manos Samatas
samatas@amazon.com
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R
S U M M I T

More Related Content

What's hot

Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Amazon Web Services
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019Amazon Web Services Korea
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfBuilding-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfAmazon Web Services
 
있는 그대로 저장하고, 바로 분석 가능한, 새로운 관점의 데이터 애널리틱 플랫폼 - 정세웅 애널리틱 스페셜리스트, AWS
있는 그대로 저장하고, 바로 분석 가능한, 새로운 관점의 데이터 애널리틱 플랫폼 - 정세웅 애널리틱 스페셜리스트, AWS있는 그대로 저장하고, 바로 분석 가능한, 새로운 관점의 데이터 애널리틱 플랫폼 - 정세웅 애널리틱 스페셜리스트, AWS
있는 그대로 저장하고, 바로 분석 가능한, 새로운 관점의 데이터 애널리틱 플랫폼 - 정세웅 애널리틱 스페셜리스트, AWSAmazon Web Services Korea
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 
How to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWSHow to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWSAmazon Web Services
 
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Amazon Web Services
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWSAmazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueAmazon Web Services
 
AWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the CloudAWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the CloudAmazon Web Services
 
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...Amazon Web Services
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudAmazon Web Services
 

What's hot (20)

Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the Cloud
 
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfBuilding-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
 
있는 그대로 저장하고, 바로 분석 가능한, 새로운 관점의 데이터 애널리틱 플랫폼 - 정세웅 애널리틱 스페셜리스트, AWS
있는 그대로 저장하고, 바로 분석 가능한, 새로운 관점의 데이터 애널리틱 플랫폼 - 정세웅 애널리틱 스페셜리스트, AWS있는 그대로 저장하고, 바로 분석 가능한, 새로운 관점의 데이터 애널리틱 플랫폼 - 정세웅 애널리틱 스페셜리스트, AWS
있는 그대로 저장하고, 바로 분석 가능한, 새로운 관점의 데이터 애널리틱 플랫폼 - 정세웅 애널리틱 스페셜리스트, AWS
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
How to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWSHow to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWS
 
Migrating to the Cloud
Migrating to the CloudMigrating to the Cloud
Migrating to the Cloud
 
AWS Governance Overview - Beach
AWS Governance Overview - BeachAWS Governance Overview - Beach
AWS Governance Overview - Beach
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWS
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
AWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the CloudAWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the Cloud
 
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
 
AWS for Backup and Recovery
AWS for Backup and RecoveryAWS for Backup and Recovery
AWS for Backup and Recovery
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
 

Similar to Everything You Need to Know About Big Data: From Architectural Principles to Best Practices

AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSSteven Hsieh
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfBuilding_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfAmazon Web Services
 
Module 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSModule 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSLam Le
 
Creare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesCreare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesAmazon Web Services
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
From raw data to business insights. A modern data lake
From raw data to business insights. A modern data lakeFrom raw data to business insights. A modern data lake
From raw data to business insights. A modern data lakejavier ramirez
 
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019javier ramirez
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLAmazon Web Services
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCAmazon Web Services LATAM
 
Building-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSBuilding-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSAmazon Web Services
 
Building a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay NordicsBuilding a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay Nordicsjavier ramirez
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with ZopaAmazon Web Services
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin BriskmanSameer Kenkare
 

Similar to Everything You Need to Know About Big Data: From Architectural Principles to Best Practices (20)

AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
 
Data_Analytics_and_AI_ML
Data_Analytics_and_AI_MLData_Analytics_and_AI_ML
Data_Analytics_and_AI_ML
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfBuilding_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
 
Module 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSModule 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWS
 
Creare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesCreare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data Warehouses
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWS
 
Construindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWSConstruindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
From raw data to business insights. A modern data lake
From raw data to business insights. A modern data lakeFrom raw data to business insights. A modern data lake
From raw data to business insights. A modern data lake
 
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LC
 
Building-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWSBuilding-Serverless-Analytics-On-AWS
Building-Serverless-Analytics-On-AWS
 
Building a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay NordicsBuilding a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay Nordics
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with Zopa
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin Briskman
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Everything You Need to Know About Big Data: From Architectural Principles to Best Practices

  • 1. P U B L I C S E C T O R S U M M I T B RUSSE LS
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T EverythingYou Need to KnowAbout Big Data: FromArchitectural Principles to the Best Practices Manos Samatas Big Data and Analytics Solutions Architect AWS A N T 2 0 1
  • 3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Whatto expectfromthesession Big data challenges Architectural principles How to simplify big data processing What technologies should you use? Why? How? Reference architecture Design patterns Customer examples
  • 4. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Typesof big data analytics Batch/ interactive Stream processing Machine learning
  • 5. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Delivery of big data services Virtualized Managed services Serverless/ clusterless/ containerized
  • 6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Plethoraof tools Amazon Athena Amazon Elasticsearch Service AWS Glue Amazon Redshift Amazon EMRAmazon Kinesis Amazon CloudSearch Amazon Managed Streaming for Kafka AWS Lake FormationAmazon QuickSight Data PipelineAmazon Aurora DynamoDBElastiCacheAmazon DocumentDB Amazon Neptune
  • 7. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Big datachallenges Why? How? What tools should I use? Is there a reference architecture?
  • 8. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Architecturalprinciples Build decoupled systems • Data → Store → Process → Store → Analyze → Answers Use the right tool for the job • Data structure, latency, throughput, access patterns Leverage managed and serverless services • Scalable/elastic, available, reliable, secure, no/low admin Use event-journal design patterns • Immutable datasets (data lake), materialized views Be cost-conscious • Big data ≠ big cost
  • 9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Collect Store Process/ Analyze Consume Time to answer (latency) Throughput Cost Simplifybig dataprocessing
  • 10. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Collect
  • 11. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Collect Typeof data Mobile apps Web apps Data centers AWS Direct Connect RECORDS Applications Transactions Data structures Database records Migration Snowball Logging Amazon CloudWatch AWS CloudTrail FILES Datatransport&logging Import/export Files/objects Log files Media files Devices Sensors IoT platforms AWS IoT STREAMS IoT EventsData streams
  • 12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Events Files Transactions Collect Devices Sensors IoT platforms AWS IoT STREAMS IoT Data streams Log files Media files RECORDS Applications Data structures Database records Typeof data Store NoSQL In-memory SQL File/object store Stream storage Mobile apps Web apps Data centers AWS Direct Connect Applications Migration Snowball Logging Amazon CloudWatch AWS CloudTrail FILES Datatransport&logging Import/export
  • 13. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Collect AWSIoT STREAMS IoTDatatransport&logging Mobile apps Web apps Data centers AWS Direct Connect RECORDS Applications Store NoSQL In-memory SQL File/object store Amazon Kinesis Apache Kafka • High throughput distributed streaming platform Amazon Kinesis • Managed stream service Amazon MSK - (in Preview) • Managed Service for Kafka Stream storage Devices Sensors IoT platforms AWS IoT Migration Snowball Logging Amazon CloudWatch AWS CloudTrail FILES Import/export Mobile apps Web apps Data centers AWS Direct Connect Amazon MSK Apache Kafka
  • 14. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Whichstream/messagestorageshouldIuse? Amazon Kinesis Data Streams Amazon Kinesis Data Firehose Apache Kafka (on Amazon Elastic Compute Cloud [Amazon EC2]) Amazon Simple Queue Service (Amazon SQS) (Standard) Amazon SQS (FIFO) AWS managed Yes Yes No Yes Yes Guaranteed ordering Yes No Yes No Yes Delivery (deduping) At least once At least once At least/At most/exactly once At least once Exactly once Data retention period 7 days N/A Configurable 14 days 14 days Availability Three AZ Three AZ Configurable Three AZ Three AZ Scale/throughput No limit / ~ shards No limit / automatic No limit / ~ nodes No limits / automatic 300 TPS / queue Multiple consumers Yes No Yes No No Row/object size 1 MB Destination row/object size Configurable 256 KB 256 KB Cost Low Low Low (+admin) Low-medium Low-medium
  • 15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Amazon Kinesis Collect Store File Hot Stream Mobile apps Web apps Devices Sensors IoT platforms AWS IoT Data centers AWS Direct Connect Migration Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS FILES STREAMS Datatransport&loggingIoTApplications Import/export NoSQL In-memory SQL Amazon S3 Amazon Simple Storage Service (Amazon S3) • Managed object storage service built to store and retrieve any amount of data File/object storage Amazon MSK Apache Kafka Mobile apps Web apps Data centers AWS Direct Connect
  • 16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Use Amazon S3 as the storage for your data lake • Natively supported by big data frameworks (Spark, Hive, Presto, and others) • Decouple storage and compute • No need to run compute clusters for storage (unlike HDFS) • Can run transient Amazon EMR clusters with Amazon EC2 Spot Instances • Multiple & heterogeneous analysis clusters and services can use the same data • Designed for 99.999999999% durability • No need to pay for data replication within a region • Secure – SSL, client/server-side encryption at rest • Low cost
  • 17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Whatabout HDFS & datatiering? • Use HDFS/local for working data sets • (for example, iterative read on the same data sets) • Use Amazon S3 Standard for frequently accessed data • Use Amazon S3 Standard–IA for less frequently accessed data • Use Amazon Glacier for archiving cold data Use S3 analytics to optimize tiering strategy S3 analytics
  • 18. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T What about metadata? • AWS Glue catalog • Hive Metastore compliant • Crawlers - Detect new data, schema, partitions • Search - Metadata discovery • Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum compatible • Lake Formation - Preview • Hive Metastore (Presto, Spark, Hive, Pig) • Can be hosted on Amazon RDS Data catalog Metastore Amazon RDS AWS Glue catalog AWS Glue
  • 19. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Collect Store Amazon DynamoDB Amazon RDS Amazon Aurora File Mobile apps Web apps Devices Sensors IoT platforms AWS IoT Data centers AWS Direct Connect Migration Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS FILES STREAMS LoggingIoTApplications Amazon S3 Amazon DocumentDB Amazon ElastiCache Amazon DAX Import/export SQLNoSQLCache Amazon ElastiCache • Managed Memcached or Redis service Amazon DynamoDB Accelerator (DAX) • Managed in-memory cache for DynamoDB Amazon Neptune • Managed graph DB Amazon DynamoDB • Managed key value/document DB Amazon Relational Database Service (Amazon RDS) • Managed relational database service Cache & database Apache Kafka Amazon MSK Amazon Kinesis Hot Stream
  • 20. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Bestpractice:Use therighttoolfor thejob Data tier Relational Referential integrity with strong consistency, transactions, and hardened scale Key-value Low-latency, key- based queries with high throughput and fast data ingestion Document Indexing and storing of documents with support for query on any property In-memory Microsecond latency, key-based queries, specialized data structures Graph Creating and navigating relations between data easily and quickly Complex query support via SQL Simple query methods with filters Simple query with filters, projections and aggregates Simple query methods with filters Easily express queries in terms of relations
  • 21. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T WhichdatastoreshouldI use? Ask yourself some questions What is the data structure? How will the data be accessed? What is the temperature of the data? What will the solution cost?
  • 22. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Whatis thedata structure? Access patterns What to use? Put/get (key, value) In-memory, NoSQL Simple relationships → 1:N, M:N NoSQL Multi-table joins, transaction, SQL SQL Faceting, search Search Graph traversal GraphDB Data structure What to use? Fixed schema SQL, NoSQL Schema-free (JSON) NoSQL, search Key/value In-memory, NoSQL Graph GraphDB Howwillthedatabeaccessed?
  • 23. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Amazon ElastiCache Amazon DynamoDB + DAX Amazon Aurora Amazon RDS Amazon Elasticsearch Service Amazon Neptune Amazon S3 + Amazon Glacier Use cases In memory caching K/V lookups, document store OLTP, transactional OLTP, transactional Log analysis, reverse indexing Graph File store Performance Ultra high request rate, Ultra low latency Ultra high request rate, ultra low to low latency Very high request rate, low latency High request rate, low latency Medium request rate, low latency Medium request rate, low latency High throughput Shape K/V K/V and document Relational Relational Documents Node/edges Files Size GB TB, PB (no limits) GB, mid TB GB, low TB GB, TB GB, mid TB GB, TB, PB, EB (no limits) Cost / GB $$ ¢¢ - $$ ¢¢ ¢¢ ¢¢ ¢¢ ¢- ¢4/10 Availability 2 AZ 3 AZ 3 AZ 2 AZ 1-2 AZ 3 AZ 3 AZ VPC support Inside VPC VPC endpoint Inside VPC Inside VPC Outside or inside VPC Inside VPC VPC endpoint Database characteristics Hot data Warm data Cold data
  • 24. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Process/ analyze
  • 25. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Process/analyze Amazon Redshift & Spectrum Amazon Athena BatchInteractive Amazon ES Interactive& batchanalytics Amazon Elasticsearch Service Managed service for Elasticsearch Amazon Redshift & Amazon Redshift Spectrum Managed data warehouse Spectrum enables querying S3 Amazon Athena Serverless interactive query service Amazon EMR Managed Hadoop framework for running Apache Spark, Flink, Presto, Tez, Hive, Pig, HBase, and others Presto Amazon EMR
  • 26. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Stream/real-timeanalytics Spark Streaming on Amazon EMR Amazon Kinesis Data Analytics • Managed service for running SQL on streaming data Amazon KCL • Amazon Kinesis Client Library AWS Lambda • Run code serverless (without provisioning or managing servers) • Services such as S3 can publish events to AWS Lambda • AWS Lambda can pool event from a Kinesis Process/analyze KCL Apps AWS Lambda Amazon Kinesis Analytics Stream Streaming
  • 27. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Predictiveanalytics Process/analyze Developers Data scientists Deep learning experts Predictive AmazonML
  • 28. Which analytics should I use? Process/analyze Batch Takes minutes to hours Example: Daily/weekly/monthly reports Amazon EMR (MapReduce, Hive, Pig, Spark) Interactive Takes seconds Example: Self-service dashboards Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark) Stream Takes milliseconds to seconds Example: Fraud alerts, one-minute metrics Amazon EMR (Spark Streaming), Amazon Kinesis Data Analytics, Amazon KCL, AWS Lambda, and others Predictive Takes milliseconds (real-time) to minutes (batch) Example: Fraud detection, forecasting demand, speech recognition Amazon SageMaker, Amazon Polly, Amazon Rekognition, Amazon Transcribe, Amazon Translate, Amazon EMR (Spark ML), Amazon Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK, and Caffe) FastSlow Amazon Redshift & Spectrum Amazon Athena BatchInteractive Amazon ES Presto Amazon EMR Predictive AmazonML KCL Apps AWS Lambda Amazon Kinesis Analytics Stream Streaming Fast
  • 29. Whichstreamprocessing technologyshould Iuse? Amazon EMR (Spark Streaming) KCL application Amazon Kinesis analytics AWS Lambda Managed service Yes No (EC2 + Auto Scaling) Yes Yes Serverless No No Yes Yes Scale / throughput No limits / ~ nodes No limits / ~ nodes No Limits / automatic No limits / automatic Availability Single AZ Multi-AZ Multi-AZ Multi-AZ Programming languages Java, Python, Scala Java, others through MultiLangDaemon ANSI SQL with extensions Node.js, Java, Python, .Net Core Sliding window functions Build-in App needs to implement Built-in No Reliability KCL and Spark checkpoints Managed by Amazon KCL Managed by Amazon Kinesis Data Analytics Managed by AWS Lambda
  • 30. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T WhichAnalytics Tool Should I Use? Amazon Redshift Amazon Redshift Spectrum Amazon Athena Amazon EMR Presto Spark Hive Use case Optimized for data warehousing Query S3 data from Amazon Redshift Interactive queries over S3 data Interactive query General purpose Batch Scale/throughput ~Nodes ~Nodes Automatic ~ Nodes Managed service Yes Yes Yes, Serverless Yes Storage Local storage Amazon S3 Amazon S3 Amazon S3, HDFS Optimization Columnar storage, data compression, and zone maps AVRO, PARQUET TEXT, SEQ RCFILE, ORC, etc. AVRO, PARQUET TEXT, SEQ RCFILE, ORC, etc. Framework dependent Metadata Amazon Redshift catalog AWS Glue catalog AWS Glue catalog AWS Glue catalog or Hive Meta-store Auth/access controls IAM, users, groups, and access controls IAM, users, groups, and access controls IAM IAM, LDAP, & Kerberos UDF support Yes (Scalar) Yes (Scalar) No Yes Slow
  • 31. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Raw data stored in data lake Preparation Normaliz ed Partitioned Compres s ed Storage optimiz ed ELT/ETL ELT/ETL: Preparing raw data for consumption Data lake on AWS Raw immutable DataSets Curated DataSets Data catalog
  • 32. WhichELT/ETL tool should Iuse? AWS Glue ETL AWS Data Pipeline Amazon Data Migration Service (Amazon DMS) Amazon EMR Apache NiFi Partner solution Use case Serverless ETL Data workflow service Migrate databases (and to/from data lake) Customize developed Hadoop/Spark ETL Automate the flow of data between systems Rich partner ecosystem for ETL Scale/throughput ~DPUs ~Nodes through EMR cluster EC2 instance type ~Nodes Self managed Self manager or through partner Managed service Clusterless Managed Managed EC2 on your behalf Managed EC2 on your behalf Self managed on Amazon EMR or Marketplace Self manager or through partner Data sources S3, RDBMS, Amazon RedShift, DynamoDB S3, JDBC, DynamoDB, custom RDBMS, data warehouses, Amazon S3* (*limited) Various Hadoop/Spark managed Various through rich processor framework Various Skills needed Wizard for simple mapping, code snippets for advanced ETL Wizard and code snippets Wizard and drag/drop Hadoop/Spark coding NiFi processors and some coding Self manager or through partner
  • 33. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Consume
  • 34. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Collect Store ConsumeProcess/analyzeETL Amazon QuickSight Analysis&visualizationDatasceince AI Apps Apps SW engineers/ business users Data scientists/ data engineers
  • 35. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Putting it all together
  • 36. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Collect Store ConsumeProcess/analyze Apache Kafka Amazon MSK Amazon Kinesis Hot SQLNoSQLCacheFileStream Mobile apps Web apps Devices Sensors IoT platforms AWS IoT Data centers AWS Direct Connect Migration Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS FILES STREAMS Amazon QuickSight Analysis&visualizationDatasceince DataTransport&LoggingIoTApplications Amazon S3 Import/export AI Apps Apps Model Train/ Eval Models Deploy ETL FastSlow Amazon Redshift & Spectrum Amazon Athena BatchInteractive Amazon ES Presto Amazon EMR Predictive AmazonML KCL Apps AWS Lambda Amazon Kinesis Stream Streaming Fast Amazon DynamoDB Amazon RDS Amazon Aurora Amazon Document DB Amazon ElastiCache Amazon DAX
  • 37. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Design patterns
  • 38. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Spark Streaming AWS Lambda KCL apps Amazon Redshift Amazon Redshift Hive Spark Presto Processingtechnology FastSlow Hive Native apps KCL apps AWS Lambda Amazon Athena Amazon Kinesis Amazon DynamoDB/RDS Amazon S3Data Hot Cold Data store Streaming
  • 39. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Streaming analytics Amazon EMR KCL app AWS Lambda Spark Streaming Amazon AI Real-time prediction Amazon ElastiCache (Redis) Amazon DynamoDB Amazon RDS Amazon ES App state or Materialized View KPI Process Store Amazon Kinesis Amazon Kinesis Data Analytics Amazon SNS NotificationsAlerts Amazon S3 Log Amazon KinesisFan out Downstream
  • 40. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Hearst’s serverless data pipeline cosmopolitan.com caranddriver.com sfchronicle.com elle.com Ingestion proxy (Node.js) Serverless data pipeline Offline analysis and archive Real-time analysis Amazon Kinesis Data Streams AWS LambdaAWS LambdaAWS Lambda Amazon DynamoDB Amazon API Gateway Amazon Kinesis Data Firehose Amazon Simple Storage Service (S3)
  • 41. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Interactive & batch analytics Process Store Batch Interactive Amazon EMR Hive Pig Spark Amazon S3 Files Amazon Kinesis Data Firehose Amazon Kinesis Data Analytics Amazon Redshift Amazon ES Consume Amazon EMR Presto Spark Amazon Athena
  • 42. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T FINRA:Migrating toAWS Petabytes of data generated on-premises, brought to AWS, and stored in S3 Thousands of analytical queries performed on Amazon EMR and Amazon Redshift Stringent security requirements met by leveraging VPC, VPN, encryption at-rest and in- transit, AWS CloudTrail, and database auditing Flexible interactive queries Predefined queries Surveillance analytics Web applications Analysts; regulators AWS Cloud VPC VPC VPC Amazon EMR Amazon EMR Amazon Redshift Amazon S3
  • 43. Interactive & batch Amazon S3 Amazon Redshift Amazon EMR Presto Hive Pig Spark Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES AWS Lambda Spark Streaming on Amazon EMR Applications Amazon Kinesis Streams App state or materialized view KCL Real-time Amazon DynamoDB Amazon RDS Change data capture or export Transactions Stream Files Data lake Amazon Kinesis Data Analytics Amazon Athena Amazon Kinesis Data Firehose Amazon ES
  • 44. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Summary Build decoupled systems • Data → Store → Process → Store → Analyze → Answers Use the right tool for the job • Data structure, latency, throughput, access patterns Leverage AWS managed and serverless services • Scalable/elastic, available, reliable, secure, no/low admin Use log-centric design patterns • Immutable logs, data lake, materialized views Be cost-conscious • Big data ≠ big cost
  • 45. Thank you! © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T Manos Samatas samatas@amazon.com
  • 46. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.P U B L I C S E C TO R S U M M I T