SlideShare a Scribd company logo
1 of 41
Download to read offline
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Unni Pillai
Solutions Architect
unni_k_pillai
Building a Data Processing Pipeline
on AWS
• Big Data Challenges
• Architectural Principles
• Stages in a Data Processing Pipeline
• Demo : Build a data processing pipeline
• Design Patterns
Agenda
Ever Increasing Big Data
Volume
Velocity
Variety
Veracity
Value
Plethora of Tools
Amazon
Glacier
S3 DynamoDB
RDS
EMR
Amazon
Redshift
Data Pipeline
Amazon
Kinesis
Lambda Amazon ML
SQS
ElastiCache
DynamoDB
Streams
Amazon Elasticsearch
Service
AmazonKinesis
Analytics
Amazon Athena
Big Data Challenges
Why?
How?
What tools should I use?
Is there a reference architecture?
Architectural Principles
Build decoupled systems
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Leverage AWS managed services
• Scalable/elastic, available, reliable, secure, no/low admin
Use log-centric design patterns
• Immutable logs, materialized views
Be cost-conscious
• Big data ≠ big cost
COLLECT STORE PROCESS/
ANALYZE
CONSUME
Time to answer (Latency)
Throughput
Simplify Big Data Processing
COLLECT STORE PROCESS/
ANALYZE
CONSUME
Simplify Big Data Processing
Transactions
Connected
devices
Web logs /
cookies
Social media
ERP
Answers &
Insights
Data analysts
Data scientists
Business users
Engagements
Automation / events
Data
PROCESS
STORE
ANALYZE & VISUALIZE
COLLECT
Building a pipeline - DEMO
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
LoggingIoTApplicationsTransportMessaging
Streaming App
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message
Devices
Sensors &
IoT platforms
AWS IoT
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
MESSAGES
STREAMS
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
STREAMS
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
Weblogs – Common Log Format (CLF)
75.35.230.210 - - [20/Jul/2016:22:22:42 -0700]
"GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236
"http://www.swivel.com/graphs/show/1163466"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11)
Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
Amazon Kinesis
Streams
• For Technical Developers
• Build your own custom
applications that process
or analyze streaming
data
Amazon Kinesis
Firehose
• For all developers, data
scientists
• Easily load massive
volumes of streaming data
into S3, Amazon Redshift
and Amazon Elasticsearch
Amazon Kinesis
Analytics
• For all developers, data
scientists
• Easily analyze data
streams using standard
SQL queries
Amazon Kinesis: Streaming Data Made Easy
Services make it easy to capture, deliver and process streams on AWS
Why Is Amazon S3 Good for Big Data?
• Unlimited number of objects and volume of data
• Very high bandwidth – no aggregate throughput limit
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• No need to run compute clusters for storage (unlike HDFS)
• Multiple & heterogeneous analysis clusters can use the same data
• Designed for 99.99% availability – can tolerate zone failure
• Designed for 99.999999999% durability
• No need to pay for data replication
• Native support for versioning
• Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies
• Secure – SSL, client/server-side encryption at rest
• Low cost
PROCESS
STORE
ANALYZE & VISUALIZE
COLLECT:
Amazon Kinesis Firehose
Building a pipeline - DEMO
• Create Amazon S3 Bucket
• Create Amazon Kinesis Firehose delivery stream
• Publish logs to Amazon Kinesis Firehose
Demo
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
PROCESS / ANALYZE
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
PROCESS / ANALYZE
Streaming
Amazon Kinesis
Analytics
KCL
apps
AWS Lambda
Stream
Amazon EC2
Amazon EMR
Amazon EC2
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
PROCESS / ANALYZE
Streaming
Amazon Kinesis
Analytics
KCL
apps
AWS Lambda
Stream
Amazon EC2
Amazon EMR
Amazon EC2
Amazon SQS apps
Message
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
PROCESS / ANALYZE
Streaming
Amazon Kinesis
Analytics
KCL
apps
AWS Lambda
Stream
Amazon EC2
Amazon EMR
Amazon EC2
Amazon SQS apps
Message
Amazon
EMR
BatchInteractive
Presto
Amazon Athena
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
PROCESS / ANALYZE
Streaming
Amazon Kinesis
Analytics
KCL
apps
AWS Lambda
Stream
Amazon EC2
Amazon EMR
Amazon EC2
Amazon SQS apps
Message
Amazon Redshift
Amazon
EMR
BatchInteractive
Presto
Amazon Athena
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
PROCESS / ANALYZE
Streaming
Amazon Kinesis
Analytics
KCL
apps
AWS Lambda
Stream
Amazon EC2
Amazon EMR
Amazon EC2
Amazon SQS apps
Message
Amazon
Machine Learning
ML
Amazon Redshift
Amazon
EMR
BatchInteractive
Presto
Amazon Athena
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
PROCESS / ANALYZE
Streaming
Amazon Kinesis
Analytics
KCL
apps
AWS Lambda
Stream
Amazon EC2
Amazon EMR
Amazon EC2
Amazon SQS apps
Message
Amazon
Machine Learning
ML
Amazon Redshift
Amazon
EMR
BatchInteractive
Presto
Amazon Athena
PROCESS:
Amazon EMR with Spark & Hive
STORE
ANALYZE & VISUALIZE
COLLECT:
Amazon Kinesis Firehose
Building a pipeline - DEMO
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
PROCESS / ANALYZE
Streaming
Amazon Kinesis
Analytics
KCL
apps
AWS Lambda
Stream
Amazon EC2
Amazon EMR
Amazon EC2
Amazon SQS apps
Message
Amazon
Machine Learning
ML
ETL
Amazon Redshift
Amazon
EMR
BatchInteractive
Presto
Amazon Athena
CONSUME
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Stream
Amazon SQS
Message
LoggingIoTApplicationsTransportMessaging
Streaming App
Amazon S3
FileSearch
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon ElastiCache
Amazon RDS
SQLNoSQLCache
PROCESS / ANALYZE
Streaming
Amazon Kinesis
Analytics
KCL
apps
AWS Lambda
Stream
Amazon EC2
Amazon EMR
Amazon EC2
Amazon SQS apps
Message
Amazon
Machine Learning
ML
ETL
Amazon Redshift
Amazon
EMR
BatchInteractive
Presto
Amazon Athena
CONSUME
Amazon QuickSight
Apps & Services
Analysis&visualizationNotebooksIDEAPI
Building a pipeline
PROCESS:
Amazon EMR with Spark & Hive
STORE
ANALYZE & VISUALIZE:
Amazon Redshift and Amazon QuickSight
COLLECT:
Amazon Kinesis Firehose
• Check the files which were ingested into Amazon S3
• Clean the data using Amazon EMR (Spark)
• Create a table in Amazon Athena
• Query data using SQL
Demo
• Amazon QuickSight Demo
Demo
Design Patterns
Primitive: Decoupled Data Bus
Storage decoupled from processing
Multiple stages
Store Process Store Process
process
store
Primitive: Pub/Sub
Amazon
Kinesis
AWS
Lambda
Amazon Kinesis
Connector
Library
process
store
Apache
Spark
Parallel stream consumption/processing
Amazon EMR
Real-time Analytics
Amazon
Kinesis
KCL app
AWS Lambda
Spark
Streaming
Amazon
SNS
Amazon
ML
Notifications
Amazon
ElastiCache
(Redis)
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
Alerts
App state
Real-time prediction
KPI
process
store
Amazon Kinesis
Analytics
Amazon
S3
Log
Amazon
KinesisFan out
Interactive
&
Batch
Analytics
Amazon S3
Amazon EMR
Hive
Pig
Spark
Amazon
ML
process
store
Consume
Amazon Redshift
Amazon EMR
Presto
Spark
Batch
Interactive
Batch prediction
Real-time prediction
Amazon
Kinesis
Firehose
Amazon Athena
Files
Amazon Kinesis
Analytics
Interactive
&
Batch
Amazon S3
Amazon
Redshift
Amazon EMR
Presto
Hive
Pig
Spark
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
AWS Lambda
Storm
Spark Streaming
on Amazon EMR
Applications
Amazon
Kinesis
App state
KCL
Amazon
ML
Real-time
Amazon
DynamoDB
Amazon
RDS
Change Data
Capture
Transactions
Stream
Files
Data Lake
Amazon Kinesis
Analytics
Amazon
Athena
Amazon
Kinesis
Firehose
http://bit.ly/DataLakeOnAWS
Data Lake Solution Architecture on AWS
Thank You
http://blogs.aws.amazon.com/bigdata/
unni_k_pillai

More Related Content

What's hot

AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWSsampath439572
 
Creating the Cloud Business Case
Creating the Cloud Business CaseCreating the Cloud Business Case
Creating the Cloud Business CaseAmazon Web Services
 
Your Virtual Data Center: VPC Fundamentals and Connectivity Options (NET201) ...
Your Virtual Data Center: VPC Fundamentals and Connectivity Options (NET201) ...Your Virtual Data Center: VPC Fundamentals and Connectivity Options (NET201) ...
Your Virtual Data Center: VPC Fundamentals and Connectivity Options (NET201) ...Amazon Web Services
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanAmazon Web Services
 
Migrating On-Premises Databases to Cloud
Migrating On-Premises Databases to CloudMigrating On-Premises Databases to Cloud
Migrating On-Premises Databases to CloudAmazon Web Services
 
Getting started on your AWS migration journey
Getting started on your AWS migration journeyGetting started on your AWS migration journey
Getting started on your AWS migration journeyAmazon Web Services
 
Getting Started with Amazon Database Migration Service
Getting Started with Amazon Database Migration ServiceGetting Started with Amazon Database Migration Service
Getting Started with Amazon Database Migration ServiceAmazon Web Services
 
Introduction to EC2
Introduction to EC2Introduction to EC2
Introduction to EC2Mark Squires
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
AWS 101: Introduction to AWS
AWS 101: Introduction to AWSAWS 101: Introduction to AWS
AWS 101: Introduction to AWSIan Massingham
 
Deep Dive on Amazon GuardDuty - AWS Online Tech Talks
Deep Dive on Amazon GuardDuty - AWS Online Tech TalksDeep Dive on Amazon GuardDuty - AWS Online Tech Talks
Deep Dive on Amazon GuardDuty - AWS Online Tech TalksAmazon Web Services
 
Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Amazon Web Services
 
ABCs of AWS: S3
ABCs of AWS: S3ABCs of AWS: S3
ABCs of AWS: S3Mark Cohen
 
Considerations for your Cloud Journey
Considerations for your Cloud JourneyConsiderations for your Cloud Journey
Considerations for your Cloud JourneyAmazon Web Services
 

What's hot (20)

AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWS
 
Amazon quicksight
Amazon quicksightAmazon quicksight
Amazon quicksight
 
Creating the Cloud Business Case
Creating the Cloud Business CaseCreating the Cloud Business Case
Creating the Cloud Business Case
 
Your Virtual Data Center: VPC Fundamentals and Connectivity Options (NET201) ...
Your Virtual Data Center: VPC Fundamentals and Connectivity Options (NET201) ...Your Virtual Data Center: VPC Fundamentals and Connectivity Options (NET201) ...
Your Virtual Data Center: VPC Fundamentals and Connectivity Options (NET201) ...
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
 
Migrating On-Premises Databases to Cloud
Migrating On-Premises Databases to CloudMigrating On-Premises Databases to Cloud
Migrating On-Premises Databases to Cloud
 
Getting started on your AWS migration journey
Getting started on your AWS migration journeyGetting started on your AWS migration journey
Getting started on your AWS migration journey
 
Getting Started with Amazon Database Migration Service
Getting Started with Amazon Database Migration ServiceGetting Started with Amazon Database Migration Service
Getting Started with Amazon Database Migration Service
 
Introduction to EC2
Introduction to EC2Introduction to EC2
Introduction to EC2
 
SRV410 Deep Dive on AWS Batch
SRV410 Deep Dive on AWS BatchSRV410 Deep Dive on AWS Batch
SRV410 Deep Dive on AWS Batch
 
AWS Lambda Features and Uses
AWS Lambda Features and UsesAWS Lambda Features and Uses
AWS Lambda Features and Uses
 
AWS Security Best Practices
AWS Security Best PracticesAWS Security Best Practices
AWS Security Best Practices
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
AWS 101: Introduction to AWS
AWS 101: Introduction to AWSAWS 101: Introduction to AWS
AWS 101: Introduction to AWS
 
adb.pdf
adb.pdfadb.pdf
adb.pdf
 
Cloud Migration: A How-To Guide
Cloud Migration: A How-To GuideCloud Migration: A How-To Guide
Cloud Migration: A How-To Guide
 
Deep Dive on Amazon GuardDuty - AWS Online Tech Talks
Deep Dive on Amazon GuardDuty - AWS Online Tech TalksDeep Dive on Amazon GuardDuty - AWS Online Tech Talks
Deep Dive on Amazon GuardDuty - AWS Online Tech Talks
 
Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...
 
ABCs of AWS: S3
ABCs of AWS: S3ABCs of AWS: S3
ABCs of AWS: S3
 
Considerations for your Cloud Journey
Considerations for your Cloud JourneyConsiderations for your Cloud Journey
Considerations for your Cloud Journey
 

Viewers also liked

Large scale data processing pipelines at trivago
Large scale data processing pipelines at trivago Large scale data processing pipelines at trivago
Large scale data processing pipelines at trivago Clemens Valiente
 
[115] clean fe development_윤지수
[115] clean fe development_윤지수[115] clean fe development_윤지수
[115] clean fe development_윤지수NAVER D2
 
[211]대규모 시스템 시각화 현동석김광림
[211]대규모 시스템 시각화 현동석김광림[211]대규모 시스템 시각화 현동석김광림
[211]대규모 시스템 시각화 현동석김광림NAVER D2
 
[246] foursquare데이터라이프사이클 설현준
[246] foursquare데이터라이프사이클 설현준[246] foursquare데이터라이프사이클 설현준
[246] foursquare데이터라이프사이클 설현준NAVER D2
 
[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민NAVER D2
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelinesLars Albertsson
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Stormviirya
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Brandon O'Brien
 
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민NAVER D2
 
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...Roberto Hashioka
 
[112]rest에서 graph ql과 relay로 갈아타기 이정우
[112]rest에서 graph ql과 relay로 갈아타기 이정우[112]rest에서 graph ql과 relay로 갈아타기 이정우
[112]rest에서 graph ql과 relay로 갈아타기 이정우NAVER D2
 
[236] 카카오의데이터파이프라인 윤도영
[236] 카카오의데이터파이프라인 윤도영[236] 카카오의데이터파이프라인 윤도영
[236] 카카오의데이터파이프라인 윤도영NAVER D2
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakHakka Labs
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
 

Viewers also liked (17)

Large scale data processing pipelines at trivago
Large scale data processing pipelines at trivago Large scale data processing pipelines at trivago
Large scale data processing pipelines at trivago
 
[115] clean fe development_윤지수
[115] clean fe development_윤지수[115] clean fe development_윤지수
[115] clean fe development_윤지수
 
[211]대규모 시스템 시각화 현동석김광림
[211]대규모 시스템 시각화 현동석김광림[211]대규모 시스템 시각화 현동석김광림
[211]대규모 시스템 시각화 현동석김광림
 
[246] foursquare데이터라이프사이클 설현준
[246] foursquare데이터라이프사이클 설현준[246] foursquare데이터라이프사이클 설현준
[246] foursquare데이터라이프사이클 설현준
 
[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelines
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
 
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민
 
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
 
[112]rest에서 graph ql과 relay로 갈아타기 이정우
[112]rest에서 graph ql과 relay로 갈아타기 이정우[112]rest에서 graph ql과 relay로 갈아타기 이정우
[112]rest에서 graph ql과 relay로 갈아타기 이정우
 
[236] 카카오의데이터파이프라인 윤도영
[236] 카카오의데이터파이프라인 윤도영[236] 카카오의데이터파이프라인 윤도영
[236] 카카오의데이터파이프라인 윤도영
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 

Similar to Building a Data Processing Pipeline on AWS

Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Amazon Web Services
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoDatabase and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoAmazon Web Services
 
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Amazon Web Services
 
Serverless Big Data Architectures: Serverless Data Analytics
Serverless Big Data Architectures: Serverless Data AnalyticsServerless Big Data Architectures: Serverless Data Analytics
Serverless Big Data Architectures: Serverless Data AnalyticsKristana Kane
 
BDA303 Serverless big data architectures: Design patterns and best practices
BDA303 Serverless big data architectures: Design patterns and best practicesBDA303 Serverless big data architectures: Design patterns and best practices
BDA303 Serverless big data architectures: Design patterns and best practicesAmazon Web Services
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesAmazon Web Services
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesAmazon Web Services
 
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech Talks
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech TalksAWS Services Overview and Quarterly Update - April 2017 AWS Online Tech Talks
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech TalksAmazon Web Services
 
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech Talks
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech TalksAWS Services Overview and Quarterly Update - April 2017 AWS Online Tech Talks
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech TalksAmazon Web Services
 
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014Amazon Web Services
 
Escalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosEscalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosAmazon Web Services LATAM
 
Em tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosEm tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosAmazon Web Services LATAM
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Amazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Amazon Web Services
 

Similar to Building a Data Processing Pipeline on AWS (20)

Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate TorontoDatabase and Analytics on the AWS Cloud - AWS Innovate Toronto
Database and Analytics on the AWS Cloud - AWS Innovate Toronto
 
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
 
Serverless Big Data Architectures: Serverless Data Analytics
Serverless Big Data Architectures: Serverless Data AnalyticsServerless Big Data Architectures: Serverless Data Analytics
Serverless Big Data Architectures: Serverless Data Analytics
 
BDA303 Serverless big data architectures: Design patterns and best practices
BDA303 Serverless big data architectures: Design patterns and best practicesBDA303 Serverless big data architectures: Design patterns and best practices
BDA303 Serverless big data architectures: Design patterns and best practices
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech Talks
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech TalksAWS Services Overview and Quarterly Update - April 2017 AWS Online Tech Talks
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech Talks
 
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech Talks
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech TalksAWS Services Overview and Quarterly Update - April 2017 AWS Online Tech Talks
AWS Services Overview and Quarterly Update - April 2017 AWS Online Tech Talks
 
Big Data on AWS
Big Data on AWSBig Data on AWS
Big Data on AWS
 
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
 
Escalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosEscalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuarios
 
Em tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosEm tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dados
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

Building a Data Processing Pipeline on AWS

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Unni Pillai Solutions Architect unni_k_pillai Building a Data Processing Pipeline on AWS
  • 2. • Big Data Challenges • Architectural Principles • Stages in a Data Processing Pipeline • Demo : Build a data processing pipeline • Design Patterns Agenda
  • 3. Ever Increasing Big Data Volume Velocity Variety Veracity Value
  • 4. Plethora of Tools Amazon Glacier S3 DynamoDB RDS EMR Amazon Redshift Data Pipeline Amazon Kinesis Lambda Amazon ML SQS ElastiCache DynamoDB Streams Amazon Elasticsearch Service AmazonKinesis Analytics Amazon Athena
  • 5. Big Data Challenges Why? How? What tools should I use? Is there a reference architecture?
  • 6. Architectural Principles Build decoupled systems • Data → Store → Process → Store → Analyze → Answers Use the right tool for the job • Data structure, latency, throughput, access patterns Leverage AWS managed services • Scalable/elastic, available, reliable, secure, no/low admin Use log-centric design patterns • Immutable logs, materialized views Be cost-conscious • Big data ≠ big cost
  • 7. COLLECT STORE PROCESS/ ANALYZE CONSUME Time to answer (Latency) Throughput Simplify Big Data Processing
  • 8. COLLECT STORE PROCESS/ ANALYZE CONSUME Simplify Big Data Processing Transactions Connected devices Web logs / cookies Social media ERP Answers & Insights Data analysts Data scientists Business users Engagements Automation / events Data
  • 10. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS LoggingIoTApplicationsTransportMessaging Streaming App
  • 11. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS LoggingIoTApplicationsTransportMessaging Streaming App Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache
  • 12. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message Devices Sensors & IoT platforms AWS IoT LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache MESSAGES STREAMS
  • 13. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache STREAMS
  • 14. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache
  • 15. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache
  • 16. Weblogs – Common Log Format (CLF) 75.35.230.210 - - [20/Jul/2016:22:22:42 -0700] "GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236 "http://www.swivel.com/graphs/show/1163466" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
  • 17. Amazon Kinesis Streams • For Technical Developers • Build your own custom applications that process or analyze streaming data Amazon Kinesis Firehose • For all developers, data scientists • Easily load massive volumes of streaming data into S3, Amazon Redshift and Amazon Elasticsearch Amazon Kinesis Analytics • For all developers, data scientists • Easily analyze data streams using standard SQL queries Amazon Kinesis: Streaming Data Made Easy Services make it easy to capture, deliver and process streams on AWS
  • 18. Why Is Amazon S3 Good for Big Data? • Unlimited number of objects and volume of data • Very high bandwidth – no aggregate throughput limit • Natively supported by big data frameworks (Spark, Hive, Presto, etc.) • No need to run compute clusters for storage (unlike HDFS) • Multiple & heterogeneous analysis clusters can use the same data • Designed for 99.99% availability – can tolerate zone failure • Designed for 99.999999999% durability • No need to pay for data replication • Native support for versioning • Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies • Secure – SSL, client/server-side encryption at rest • Low cost
  • 19. PROCESS STORE ANALYZE & VISUALIZE COLLECT: Amazon Kinesis Firehose Building a pipeline - DEMO
  • 20. • Create Amazon S3 Bucket • Create Amazon Kinesis Firehose delivery stream • Publish logs to Amazon Kinesis Firehose Demo
  • 21. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache PROCESS / ANALYZE
  • 22. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache PROCESS / ANALYZE Streaming Amazon Kinesis Analytics KCL apps AWS Lambda Stream Amazon EC2 Amazon EMR Amazon EC2
  • 23. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache PROCESS / ANALYZE Streaming Amazon Kinesis Analytics KCL apps AWS Lambda Stream Amazon EC2 Amazon EMR Amazon EC2 Amazon SQS apps Message
  • 24. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache PROCESS / ANALYZE Streaming Amazon Kinesis Analytics KCL apps AWS Lambda Stream Amazon EC2 Amazon EMR Amazon EC2 Amazon SQS apps Message Amazon EMR BatchInteractive Presto Amazon Athena
  • 25. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache PROCESS / ANALYZE Streaming Amazon Kinesis Analytics KCL apps AWS Lambda Stream Amazon EC2 Amazon EMR Amazon EC2 Amazon SQS apps Message Amazon Redshift Amazon EMR BatchInteractive Presto Amazon Athena
  • 26. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache PROCESS / ANALYZE Streaming Amazon Kinesis Analytics KCL apps AWS Lambda Stream Amazon EC2 Amazon EMR Amazon EC2 Amazon SQS apps Message Amazon Machine Learning ML Amazon Redshift Amazon EMR BatchInteractive Presto Amazon Athena
  • 27. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache PROCESS / ANALYZE Streaming Amazon Kinesis Analytics KCL apps AWS Lambda Stream Amazon EC2 Amazon EMR Amazon EC2 Amazon SQS apps Message Amazon Machine Learning ML Amazon Redshift Amazon EMR BatchInteractive Presto Amazon Athena
  • 28. PROCESS: Amazon EMR with Spark & Hive STORE ANALYZE & VISUALIZE COLLECT: Amazon Kinesis Firehose Building a pipeline - DEMO
  • 29. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache PROCESS / ANALYZE Streaming Amazon Kinesis Analytics KCL apps AWS Lambda Stream Amazon EC2 Amazon EMR Amazon EC2 Amazon SQS apps Message Amazon Machine Learning ML ETL Amazon Redshift Amazon EMR BatchInteractive Presto Amazon Athena CONSUME
  • 30. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Stream Amazon SQS Message LoggingIoTApplicationsTransportMessaging Streaming App Amazon S3 FileSearch Amazon Elasticsearch Service Amazon DynamoDB Amazon ElastiCache Amazon RDS SQLNoSQLCache PROCESS / ANALYZE Streaming Amazon Kinesis Analytics KCL apps AWS Lambda Stream Amazon EC2 Amazon EMR Amazon EC2 Amazon SQS apps Message Amazon Machine Learning ML ETL Amazon Redshift Amazon EMR BatchInteractive Presto Amazon Athena CONSUME Amazon QuickSight Apps & Services Analysis&visualizationNotebooksIDEAPI
  • 31. Building a pipeline PROCESS: Amazon EMR with Spark & Hive STORE ANALYZE & VISUALIZE: Amazon Redshift and Amazon QuickSight COLLECT: Amazon Kinesis Firehose
  • 32. • Check the files which were ingested into Amazon S3 • Clean the data using Amazon EMR (Spark) • Create a table in Amazon Athena • Query data using SQL Demo
  • 35. Primitive: Decoupled Data Bus Storage decoupled from processing Multiple stages Store Process Store Process process store
  • 37. Amazon EMR Real-time Analytics Amazon Kinesis KCL app AWS Lambda Spark Streaming Amazon SNS Amazon ML Notifications Amazon ElastiCache (Redis) Amazon DynamoDB Amazon RDS Amazon ES Alerts App state Real-time prediction KPI process store Amazon Kinesis Analytics Amazon S3 Log Amazon KinesisFan out
  • 38. Interactive & Batch Analytics Amazon S3 Amazon EMR Hive Pig Spark Amazon ML process store Consume Amazon Redshift Amazon EMR Presto Spark Batch Interactive Batch prediction Real-time prediction Amazon Kinesis Firehose Amazon Athena Files Amazon Kinesis Analytics
  • 39. Interactive & Batch Amazon S3 Amazon Redshift Amazon EMR Presto Hive Pig Spark Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES AWS Lambda Storm Spark Streaming on Amazon EMR Applications Amazon Kinesis App state KCL Amazon ML Real-time Amazon DynamoDB Amazon RDS Change Data Capture Transactions Stream Files Data Lake Amazon Kinesis Analytics Amazon Athena Amazon Kinesis Firehose