© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Building a Data Processing Pipeline
on AWS
Eng-Hwa Tan
Solutions Architect,
Amazon Web Services, ASEAN
Agenda
• Big Data Challenges
• Architectural Principles
• Stages in a Data Processing Pipeline
• Build a data processing pipeline
• Design Patterns
Big Data Challenges
Ever Increasing Big Data
• Volume
• Velocity
• Variety
• Veracity
• Value
Plethora of Tools
• Amazon Glacier, S3, DynamoDB, RDS, EMR, Amazon Redshift, Data Pipeline
• Amazon Kinesis, Lambda, Amazon ML, SQS, ElastiCache, DynamoDB Streams
• Amazon Elasticsearch Service, Amazon Kinesis Analytics, Amazon Athena
Why?
How?
What tools should I use?
Is there a reference architecture?
Architectural Principles
Build decoupled systems
• Data → Store → Process → Store → Analyze → Answers
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Leverage AWS managed services
• Scalable/elastic, available, reliable, secure, no/low admin
Use log-centric design patterns
• Immutable logs, materialized views
Be cost-conscious
• Big data ≠ big cost
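The log-centric principle above (immutable logs, materialized views) can be illustrated with a minimal in-memory sketch. The Event schema and the class names AppendOnlyLog and TotalsView are hypothetical, chosen only for illustration; in a real pipeline the log would live in a durable store rather than a Python list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """One immutable record in the log (hypothetical schema)."""
    user: str
    amount: int


class AppendOnlyLog:
    """Events can be appended and read, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset: int) -> Tuple[Event, ...]:
        return tuple(self._events[offset:])


class TotalsView:
    """Materialized view: per-user totals derived only from the log."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)
        self.next_offset = 0

    def refresh(self, log: AppendOnlyLog) -> None:
        # Apply only the events this view has not seen yet.
        for event in log.read_from(self.next_offset):
            self.totals[event.user] += event.amount
            self.next_offset += 1


log = AppendOnlyLog()
log.append(Event("alice", 30))
log.append(Event("bob", 5))

view = TotalsView()
view.refresh(log)            # incremental update
log.append(Event("alice", 12))
view.refresh(log)

rebuilt = TotalsView()       # a view can always be rebuilt from offset 0
rebuilt.refresh(log)
```

Because the log is never modified, a lost or outdated view can be recomputed from the log at any time, which is what makes the stages safe to decouple.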
Simplify Big Data Processing
COLLECT → STORE → PROCESS/ANALYZE → CONSUME
Time to answer (Latency)
Throughput
Building a pipeline - DEMO
Data: transactions, connected devices, web logs / cookies, social media, ERP
Stages: COLLECT, STORE, PROCESS, ANALYZE & VISUALIZE
Answers & insights: data analysts, data scientists, business users, engagements, automation / events
COLLECT and STORE (diagram)
Source categories: Applications, Logging, Transport, Messaging, IoT
Sources
• Mobile apps, web apps, data centers, AWS Direct Connect
• Amazon CloudWatch, AWS CloudTrail
• AWS Import/Export Snowball
• Messaging
• Devices, sensors & IoT platforms, AWS IoT
• Streaming app
Data types: RECORDS, DOCUMENTS, FILES, MESSAGES, STREAMS
Stores
• SQL: Amazon RDS
• NoSQL: Amazon DynamoDB
• Cache: Amazon ElastiCache
• File: Amazon S3
• Search: Amazon Elasticsearch Service
• Message: Amazon SQS
• Stream: Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams
Weblogs – Common Log Format (CLF)
75.35.230.210 - - [20/Jul/2016:22:22:42 -0700]
"GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236
"http://www.swivel.com/graphs/show/1163466"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11)
Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
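A minimal sketch of how a log line like the one above could be parsed into named fields before it is sent down the pipeline. The regular expression, the field names and the parse_line helper are illustrative assumptions, not part of the talk; the sample assumes the wrapped slide text is a single line joined by spaces, and the trailing referrer and user-agent fields are treated as optional.

```python
import re
from datetime import datetime

# Host, identity, user, timestamp, request line, status, size,
# then optional quoted referrer and user agent.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = (
    '75.35.230.210 - - [20/Jul/2016:22:22:42 -0700] '
    '"GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236 '
    '"http://www.swivel.com/graphs/show/1163466" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) '
    'Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"'
)
record = parse_line(sample)
```

Lines that do not match return None, so malformed records can be counted or set aside instead of stopping the job.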
Amazon Kinesis: Streaming Data Made Easy
Services make it easy to capture, deliver and process streams on AWS
Amazon Kinesis Streams
• For Technical Developers
• Build your own custom applications that process or analyze streaming data
Amazon Kinesis Firehose
• For all developers, data scientists
• Easily load massive volumes of streaming data into S3, Amazon Redshift and Amazon Elasticsearch
Amazon Kinesis Analytics
• For all developers, data scientists
• Easily analyze data streams using standard SQL queries
Why Is Amazon S3 Good for Big Data?
• Unlimited number of objects and volume of data
• Very high bandwidth – no aggregate throughput limit
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• No need to run compute clusters for storage (unlike HDFS)
• Multiple & heterogeneous analysis clusters can use the same data
• Designed for 99.99% availability – can tolerate zone failure
• Designed for 99.999999999% durability
• No need to pay for data replication
• Native support for versioning
• Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies
• Secure – SSL, client/server-side encryption at rest
• Low cost
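The tiered-storage bullet above can be sketched as a lifecycle configuration. This is an illustration under assumptions: the prefix "logs/", the 30 and 90 day thresholds, the bucket name and the tiering_rule helper are all hypothetical; the dictionary follows the shape expected by the S3 PutBucketLifecycleConfiguration API.

```python
def tiering_rule(prefix, ia_after_days=30, glacier_after_days=90):
    """Build one lifecycle rule: Standard -> Standard-IA -> Glacier."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


lifecycle = {"Rules": [tiering_rule("logs/")]}

# With boto3 installed and credentials configured, the rule could be applied
# to a bucket (the bucket name here is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket", LifecycleConfiguration=lifecycle)
```

Keeping the rule as plain data makes it easy to review and version alongside the rest of the pipeline definition.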
Building a pipeline - DEMO
COLLECT: Amazon Kinesis Firehose
STORE
PROCESS
ANALYZE & VISUALIZE
Demo
• Create Amazon S3 Bucket
• Create Amazon Kinesis Firehose delivery stream
• Publish logs to Amazon Kinesis Firehose
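The publishing step of this demo might look roughly like the sketch below; the demo code itself is not shown in the slides. It relies on the Firehose PutRecordBatch call, which accepts at most 500 records per request. The helper names, the stream name "weblogs" and the RecordingClient stand-in are assumptions for illustration, and the sketch ignores the per-request size limit and does not retry failed records.

```python
MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_records(lines):
    """Wrap each log line as a Firehose record, newline-terminated."""
    return [{"Data": (line.rstrip("\n") + "\n").encode("utf-8")} for line in lines]


def batches(records, size=MAX_BATCH):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def publish(client, stream_name, lines):
    """Send lines in batches; return how many records were reported failed."""
    failed = 0
    for batch in batches(to_records(lines)):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response.get("FailedPutCount", 0)
    return failed


class RecordingClient:
    """Hypothetical stand-in for boto3.client("firehose"), for dry runs."""

    def __init__(self):
        self.calls = []

    def put_record_batch(self, DeliveryStreamName, Records):
        self.calls.append(Records)
        return {"FailedPutCount": 0, "RequestResponses": []}


# Dry run without AWS access; with boto3, pass boto3.client("firehose") instead.
client = RecordingClient()
failed = publish(client, "weblogs", [f"line {i}" for i in range(1201)])
```

Terminating each record with a newline keeps one log line per row once Firehose concatenates the records into objects in Amazon S3.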
COLLECT, STORE and PROCESS / ANALYZE (diagram)
COLLECT and STORE keep the same sources, data types (RECORDS, DOCUMENTS, FILES, MESSAGES, STREAMS) and stores: Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, Amazon S3, Amazon Elasticsearch Service, Amazon SQS, Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams.
PROCESS / ANALYZE
• Streaming: Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EC2, Amazon EMR
• Message: Amazon SQS apps, Amazon EC2
• Batch and interactive: Amazon EMR, Presto, Amazon Redshift, Amazon Athena
• ML: Amazon Machine Learning
Building a pipeline - DEMO
COLLECT: Amazon Kinesis Firehose
STORE
PROCESS: Amazon EMR with Spark & Hive
ANALYZE & VISUALIZE
Demo
• Check the files which were ingested into Amazon S3
• Clean the data using Amazon EMR (Spark)
• Create a table in Amazon Athena
• Query data using SQL
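The last two demo steps (create a table in Amazon Athena, query it with SQL) can be sketched as the statements one might submit. Everything specific here is an assumption: the table name, the column list, the tab-delimited layout of the cleaned files and the S3 location are placeholders, since the slides do not show the schema used in the demo.

```python
# Hypothetical schema for cleaned web log records.
COLUMNS = [
    ("client_ip", "string"),
    ("request_time", "string"),
    ("method", "string"),
    ("uri", "string"),
    ("status", "int"),
    ("bytes_sent", "bigint"),
]


def create_table_sql(table, location, columns=COLUMNS):
    """Build a CREATE EXTERNAL TABLE statement over tab-delimited files."""
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        f"LOCATION '{location}'"
    )


def top_uris_sql(table, limit=10):
    """Build a query for the most requested URIs among successful requests."""
    return (
        f"SELECT uri, COUNT(*) AS hits FROM {table} "
        f"WHERE status = 200 GROUP BY uri ORDER BY hits DESC LIMIT {limit}"
    )


ddl = create_table_sql("weblogs", "s3://my-example-bucket/cleaned/")
query = top_uris_sql("weblogs")
```

The table is external, so dropping it in Athena leaves the cleaned files in Amazon S3 untouched and available to other engines.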
COLLECT, STORE, PROCESS / ANALYZE and CONSUME (diagram)
COLLECT, STORE and PROCESS / ANALYZE keep the same sources, stores and processing services; ETL is added to the diagram.
CONSUME
• Amazon QuickSight
• Apps & Services
• Analysis & visualization, notebooks, IDE, API
Building a pipeline
COLLECT: Amazon Kinesis Firehose
STORE
PROCESS: Amazon EMR with Spark & Hive
ANALYZE & VISUALIZE: Amazon Redshift and Amazon QuickSight
Demo
• Amazon QuickSight Demo
Design Patterns
Primitive: Decoupled Data Bus
• Storage decoupled from processing
• Multiple stages
Store → Process → Store → Process
Primitive: Pub/Sub
• Parallel stream consumption/processing
Store: Amazon Kinesis
Process: AWS Lambda, Amazon Kinesis Connector Library, Apache Spark, Amazon EMR
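The pub/sub primitive can be illustrated with a small in-memory sketch: records stay in the stream after being read, and each consumer keeps its own read position, so several processors can work through the same data in parallel. The Stream class and the consumer names are hypothetical and only model the idea; they are not the Amazon Kinesis API.

```python
from typing import Dict, List


class Stream:
    """In-memory stand-in for a stream store with per-consumer positions."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._positions: Dict[str, int] = {}

    def publish(self, record: str) -> None:
        self._records.append(record)

    def subscribe(self, consumer: str) -> None:
        # A new consumer starts reading from the beginning of the stream.
        self._positions.setdefault(consumer, 0)

    def poll(self, consumer: str) -> List[str]:
        # Return records this consumer has not seen and advance its position.
        start = self._positions[consumer]
        new_records = self._records[start:]
        self._positions[consumer] = len(self._records)
        return new_records


stream = Stream()
stream.subscribe("archiver")
stream.subscribe("alerting")
for line in ["GET /a 200", "GET /b 500", "GET /c 200"]:
    stream.publish(line)

archived = stream.poll("archiver")                       # sees every record
errors = [r for r in stream.poll("alerting") if r.endswith(" 500")]

stream.publish("GET /d 500")
late = stream.poll("alerting")                           # only the new record
```

Adding another consumer requires no change to the producer or to existing consumers, which is the decoupling the slide refers to.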
Real-time Analytics
Store (stream): Amazon Kinesis
Process: KCL app, AWS Lambda, Spark Streaming, Amazon Kinesis Analytics, Amazon ML
Destinations: Amazon SNS, Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES, Amazon S3
Outputs: notifications, alerts, app state, real-time prediction, KPI, log
Fan out: Amazon Kinesis
Interactive & Batch Analytics
Input: files, Amazon Kinesis Firehose, Amazon Kinesis Analytics
Store: Amazon S3
Process: Amazon EMR (Hive, Pig, Spark), Amazon EMR (Presto, Spark), Amazon Redshift, Amazon Athena, Amazon ML
Workloads: batch, interactive, batch prediction, real-time prediction
Consume
Data Lake
Inputs: applications, transactions (Amazon DynamoDB, Amazon RDS, change data capture), stream, files
Stream: Amazon Kinesis, Amazon Kinesis Firehose
Interactive & batch: Amazon S3, Amazon Redshift, Amazon EMR (Presto, Hive, Pig, Spark), Amazon Athena
Real-time: KCL, AWS Lambda, Storm, Spark Streaming on Amazon EMR, Amazon Kinesis Analytics, Amazon ML
App state: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon ES
http://bit.ly/DataLakeOnAWS
Data Lake Solution Architecture on AWS
Thank You
http://blogs.aws.amazon.com/bigdata/

 

Building a Data Processing Pipeline on AWS - AWS Summit SG 2017

  • 1. Building a Data Processing Pipeline on AWS. Eng-Hwa Tan, Solutions Architect, Amazon Web Services, ASEAN. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. Agenda: Big Data Challenges; Architectural Principles; Stages in a Data Processing Pipeline; Build a data processing pipeline; Design Patterns
  • 3. Ever Increasing Big Data: Volume, Velocity, Variety, Veracity, Value
  • 4. Plethora of Tools: Amazon Glacier, S3, DynamoDB, RDS, EMR, Amazon Redshift, Data Pipeline, Amazon Kinesis, Lambda, Amazon ML, SQS, ElastiCache, DynamoDB Streams, Amazon Elasticsearch Service, Amazon Kinesis Analytics, Amazon Athena
  • 5. Big Data Challenges: Why? How? What tools should I use? Is there a reference architecture?
  • 6. Architectural Principles: (1) Build decoupled systems: Data → Store → Process → Store → Analyze → Answers. (2) Use the right tool for the job: data structure, latency, throughput, access patterns. (3) Leverage AWS managed services: scalable/elastic, available, reliable, secure, no/low admin. (4) Use log-centric design patterns: immutable logs, materialized views. (5) Be cost-conscious: big data ≠ big cost.
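The log-centric principle on slide 6 (immutable logs, materialized views) can be illustrated with a minimal in-memory sketch. The `EventLog` class, the `hits_per_path` function, and the sample events are hypothetical names chosen for illustration and are not from the talk. In a real pipeline the log would more likely live in a stream or in Amazon S3.

```python
from collections import defaultdict


class EventLog:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, event):
        # Store a copy so later changes by the caller cannot alter history.
        self._events.append(dict(event))

    def replay(self):
        # Hand out copies so readers cannot mutate the stored events.
        return tuple(dict(e) for e in self._events)


def hits_per_path(events):
    """Materialized view: request count per path, derived only from the log."""
    view = defaultdict(int)
    for event in events:
        view[event["path"]] += 1
    return dict(view)


log = EventLog()
log.append({"path": "/index.html", "status": 200})
log.append({"path": "/images/a.jpg", "status": 200})
log.append({"path": "/index.html", "status": 404})

# The view can be discarded and rebuilt at any time by replaying the log.
view = hits_per_path(log.replay())
```

Because the view is a pure function of the log, a new view (for example, hits per status code) can be added later without touching the collected data.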
  • 7. Simplify Big Data Processing: COLLECT → STORE → PROCESS/ANALYZE → CONSUME. Labels: Time to answer (latency); Throughput.
  • 8. Simplify Big Data Processing: COLLECT → STORE → PROCESS/ANALYZE → CONSUME. Data: transactions, connected devices, web logs / cookies, social media, ERP. Answers & insights: data analysts, data scientists, business users, engagements, automation / events.
  • 10. COLLECT / STORE diagram. Source categories: Applications, Logging, Transport, Messaging, IoT. Sources and services: mobile apps, web apps, data centers, AWS Direct Connect, AWS Import/Export Snowball, Amazon CloudWatch, AWS CloudTrail, messaging, devices, sensors & IoT platforms, AWS IoT, streaming app. Data types: RECORDS, DOCUMENTS, FILES, MESSAGES, STREAMS.
  • 11. COLLECT / STORE diagram from slide 10, with STORE adding Amazon RDS (SQL), Amazon DynamoDB (NoSQL) and Amazon ElastiCache (Cache).
  • 12. Same diagram, with STORE adding Amazon S3 (File) and Amazon Elasticsearch Service (Search).
  • 13. Same diagram, with STORE adding Amazon SQS (Message).
  • 14. Same diagram, with STORE adding stream storage (Stream): Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams.
  • 15. Same diagram as slide 14, unchanged.
  • 16. Weblogs – Common Log Format (CLF) 75.35.230.210 - - [20/Jul/2016:22:22:42 -0700] "GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236 "http://www.swivel.com/graphs/show/1163466" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
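The sample line on slide 16 can be split into fields with a regular expression. The sketch below is one possible parser, not the one used in the demo. Note that the sample carries a trailing referrer and user agent, which is the layout usually called the Combined Log Format, so the pattern treats those two fields as optional. The field names are assumptions chosen for readability.

```python
import re
from datetime import datetime

# CLF fields, followed by an optional quoted referrer and user agent.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


SAMPLE = (
    '75.35.230.210 - - [20/Jul/2016:22:22:42 -0700] '
    '"GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236 '
    '"http://www.swivel.com/graphs/show/1163466" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) '
    'Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"'
)

record = parse_log_line(SAMPLE)
```

Lines that do not match return `None`, which makes it easy to count or set aside malformed records during a later cleaning step.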
  • 17. Amazon Kinesis: Streaming Data Made Easy. Services make it easy to capture, deliver and process streams on AWS. Amazon Kinesis Streams (for technical developers): build your own custom applications that process or analyze streaming data. Amazon Kinesis Firehose (for all developers, data scientists): easily load massive volumes of streaming data into S3, Amazon Redshift and Amazon Elasticsearch. Amazon Kinesis Analytics (for all developers, data scientists): easily analyze data streams using standard SQL queries.
  • 18. Why Is Amazon S3 Good for Big Data? • Unlimited number of objects and volume of data • Very high bandwidth – no aggregate throughput limit • Natively supported by big data frameworks (Spark, Hive, Presto, etc.) • No need to run compute clusters for storage (unlike HDFS) • Multiple & heterogeneous analysis clusters can use the same data • Designed for 99.99% availability – can tolerate zone failure • Designed for 99.999999999% durability • No need to pay for data replication • Native support for versioning • Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies • Secure – SSL, client/server-side encryption at rest • Low cost
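The tiered-storage bullet on slide 18 refers to S3 life-cycle policies. As a rough sketch, the dictionary below has the shape that boto3's `put_bucket_lifecycle_configuration` accepts in its `LifecycleConfiguration` parameter. The rule ID, the `weblogs/` prefix and the 30/90-day thresholds are illustrative assumptions, not values from the talk. The helper `storage_class_at` is a hypothetical local simulation for reasoning about the rule and is not part of any AWS API.

```python
# Hypothetical rule: move objects under weblogs/ to Standard-IA after 30 days
# and to Amazon Glacier after 90 days.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "tier-weblogs",
            "Filter": {"Prefix": "weblogs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}


def storage_class_at(age_days, rule):
    """Simulate which storage class the rule implies for an object's age."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = LIFECYCLE["Rules"][0]
```

The simulation only mirrors the day thresholds in the rule. Actual transitions are carried out by S3 itself and are not instantaneous at the day boundary.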
  • 19. Building a pipeline - DEMO. COLLECT: Amazon Kinesis Firehose → STORE → PROCESS → ANALYZE & VISUALIZE
  • 20. Demo: (1) Create Amazon S3 bucket. (2) Create Amazon Kinesis Firehose delivery stream. (3) Publish logs to Amazon Kinesis Firehose.
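The last demo step on slide 20, publishing logs to Amazon Kinesis Firehose, could look roughly like the sketch below. It assumes the boto3 Firehose client, whose `put_record_batch` call takes `DeliveryStreamName` and `Records` and accepts at most 500 records per call. The stream name `weblogs` and the `RecordingClient` class are hypothetical; the latter is only a stand-in so the batching logic can be exercised without AWS credentials.

```python
MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_records(lines):
    """Wrap each log line as a Firehose record, newline-terminated."""
    return [{"Data": (line.rstrip("\n") + "\n").encode("utf-8")} for line in lines]


def chunk(records, size=MAX_BATCH):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def publish(client, stream_name, lines):
    """Send lines in batches; return how many records were accepted."""
    accepted = 0
    for batch in chunk(to_records(lines)):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        accepted += len(batch) - response.get("FailedPutCount", 0)
    return accepted


class RecordingClient:
    """Hypothetical stand-in for boto3.client("firehose"); records calls only."""

    def __init__(self):
        self.calls = []

    def put_record_batch(self, DeliveryStreamName, Records):
        self.calls.append({"DeliveryStreamName": DeliveryStreamName, "Records": Records})
        return {"FailedPutCount": 0}


client = RecordingClient()
accepted = publish(client, "weblogs", ["line"] * 1201)
```

With a real client, records reported in `FailedPutCount` would need to be retried; this sketch only counts them.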
  • 21. COLLECT / STORE diagram from slide 14, adding an empty PROCESS / ANALYZE column.
  • 22. Same diagram, with PROCESS / ANALYZE adding stream processing (Streaming): Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EC2, Amazon EMR.
  • 23. Same diagram, with PROCESS / ANALYZE adding Amazon SQS apps (Message).
  • 24. Same diagram, with PROCESS / ANALYZE adding Amazon EMR, Presto and Amazon Athena (Batch, Interactive).
  • 25. Same diagram, with PROCESS / ANALYZE adding Amazon Redshift.
  • 26. Same diagram, with PROCESS / ANALYZE adding Amazon Machine Learning (ML).
  • 27. Same diagram as slide 26, unchanged.
  • 28. Building a pipeline - DEMO. COLLECT: Amazon Kinesis Firehose → STORE → PROCESS: Amazon EMR with Spark & Hive → ANALYZE & VISUALIZE
  • 29. Demo: • Check the files that were ingested into Amazon S3 • Clean the data using Amazon EMR (Spark) • Create a table in Amazon Athena • Query data using SQL
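The last two demo steps (create a table in Amazon Athena, then query it with SQL) can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used in the demo: the table name, column list, bucket paths, and the helper names `create_table_ddl` and `run_query` are all hypothetical, and the cleaned files are assumed to be comma-delimited text. The `athena` argument is expected to be a boto3 Athena client, whose `start_query_execution` call submits the statement asynchronously and returns a query execution ID.

```python
def create_table_ddl(table, location, columns):
    """Build an Athena DDL statement for comma-delimited files under `location`.

    `columns` is a list of (name, type) pairs. All names here are illustrative.
    """
    cols = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'"
    )


def run_query(athena, sql, database, output_location):
    """Submit a statement to Athena and return its query execution ID.

    `athena` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    The call is asynchronous; results land under `output_location` in S3.
    """
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

A caller would first submit the DDL from `create_table_ddl`, then submit an ordinary `SELECT` against the new table through the same `run_query` helper.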
  • 30. COLLECT (Logging, IoT, Applications, Transport, Messaging): Mobile apps, Web apps, Data centers, AWS Direct Connect, AWS Import/Export Snowball, Amazon CloudWatch, AWS CloudTrail, Devices, Sensors & IoT platforms, AWS IoT, Streaming App. Data types: RECORDS, DOCUMENTS, FILES, MESSAGES, STREAMS. STORE: Stream (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams); Message (Amazon SQS); File (Amazon S3); Search (Amazon Elasticsearch Service); NoSQL (Amazon DynamoDB); Cache (Amazon ElastiCache); SQL (Amazon RDS). PROCESS / ANALYZE: Streaming (Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EC2, Amazon EMR); Message (Amazon SQS apps, Amazon EC2); ML (Amazon Machine Learning); ETL; Batch / Interactive (Amazon Redshift, Amazon EMR, Presto, Amazon Athena). CONSUME.
  • 31. COLLECT (Logging, IoT, Applications, Transport, Messaging): Mobile apps, Web apps, Data centers, AWS Direct Connect, AWS Import/Export Snowball, Amazon CloudWatch, AWS CloudTrail, Devices, Sensors & IoT platforms, AWS IoT, Streaming App. Data types: RECORDS, DOCUMENTS, FILES, MESSAGES, STREAMS. STORE: Stream (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams); Message (Amazon SQS); File (Amazon S3); Search (Amazon Elasticsearch Service); NoSQL (Amazon DynamoDB); Cache (Amazon ElastiCache); SQL (Amazon RDS). PROCESS / ANALYZE: Streaming (Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EC2, Amazon EMR); Message (Amazon SQS apps, Amazon EC2); ML (Amazon Machine Learning); ETL; Batch / Interactive (Amazon Redshift, Amazon EMR, Presto, Amazon Athena). CONSUME (Analysis & visualization, Notebooks, IDE, API): Amazon QuickSight, Apps & Services.
  • 32. Building a pipeline. COLLECT: Amazon Kinesis Firehose → STORE → PROCESS: Amazon EMR with Spark & Hive → ANALYZE & VISUALIZE: Amazon Redshift and Amazon QuickSight
  • 35. Primitive: Decoupled Data Bus. • Storage decoupled from processing • Multiple stages: Store → Process → Store → Process. Legend: process, store.
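The decoupled data bus idea can be illustrated with a small Python sketch. It is an illustration only, under stated assumptions: the `Store` class is a hypothetical in-memory stand-in for a durable store (such as Amazon S3 or a stream), and `run_stage`, `clean`, and `total_per_user` are made-up names. The point it shows is that each processing stage only reads from one store and writes to another, so stages can be replaced or rerun without touching their inputs.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class Store:
    """Append-only, in-memory stand-in for a durable store (hypothetical)."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, record: Record) -> None:
        self._records.append(dict(record))

    def read(self) -> List[Record]:
        # Return copies so that readers cannot mutate stored data.
        return [dict(r) for r in self._records]


def run_stage(source: Store, sink: Store,
              transform: Callable[[Record], List[Record]]) -> int:
    """Apply `transform` to every record in `source`; write results to `sink`."""
    written = 0
    for record in source.read():
        for out in transform(record):
            sink.append(out)
            written += 1
    return written


def clean(record: Record) -> List[Record]:
    """First stage: drop records without a user and normalise the fields."""
    user = str(record.get("user", "")).strip().lower()
    if not user:
        return []
    return [{"user": user, "amount": float(record["amount"])}]


def total_per_user(source: Store, sink: Store) -> int:
    """Second stage: aggregate cleaned records into per-user totals."""
    totals: Dict[str, float] = {}
    for r in source.read():
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    for user, total in sorted(totals.items()):
        sink.append({"user": user, "total": total})
    return len(totals)
```

Because the raw store is never modified, a corrected `clean` function can be rerun over the same input to rebuild every downstream store.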
  • 37. Real-time Analytics. Source: Amazon Kinesis. Process: KCL app, AWS Lambda, Spark Streaming (Amazon EMR), Amazon Kinesis Analytics. Store / targets: Amazon SNS, Amazon ML, Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES, Amazon S3, Amazon Kinesis. Outputs: Notifications, Alerts, App state, Real-time prediction, KPI, Log, Fan out. Legend: process, store.
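One of the processing options on this slide, AWS Lambda reading from Amazon Kinesis, can be sketched as a Python handler. This is a rough sketch, not code from the talk. It relies on the Kinesis event shape that Lambda delivers, where each entry in `Records` carries a base64-encoded payload under `kinesis.data`. The JSON payload layout, the `value` field, and `ALERT_THRESHOLD` are assumptions made for illustration, and publishing the alerts (for example to Amazon SNS) is left out.

```python
import base64
import json

# Hypothetical threshold; readings above it are treated as alerts.
ALERT_THRESHOLD = 100.0


def decode_records(event):
    """Decode the base64-encoded Kinesis payloads in a Lambda event as JSON."""
    payloads = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


def handler(event, context):
    """Lambda entry point: count the readings and collect those over threshold."""
    readings = decode_records(event)
    alerts = [r for r in readings if r.get("value", 0) > ALERT_THRESHOLD]
    # A real function might publish `alerts` as notifications or write
    # application state to a database; this sketch only returns them.
    return {"processed": len(readings), "alerts": alerts}
```

Keeping the decoding separate from the alert logic makes the handler easy to exercise locally with a hand-built event.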
  • 38. Interactive & Batch Analytics. Source: Amazon Kinesis Firehose (Files). Store: Amazon S3. Process: Amazon EMR (Hive, Pig, Spark), Amazon EMR (Presto, Spark), Amazon Redshift, Amazon Athena, Amazon ML, Amazon Kinesis Analytics. Outputs: Batch, Interactive, Batch prediction, Real-time prediction. Consume. Legend: process, store.
  • 39. Data Lake. Interactive & Batch: Amazon S3, Amazon Redshift, Amazon EMR (Presto, Hive, Pig, Spark), Amazon Athena, Amazon Kinesis Firehose. Real-time: Amazon Kinesis, KCL, AWS Lambda, Storm, Spark Streaming on Amazon EMR, Amazon Kinesis Analytics, Amazon ML. App state: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon ES. Sources: Applications; Transactions (Amazon DynamoDB, Amazon RDS; Change Data Capture); Stream; Files.