© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Olivier Klein 奧樂凱
Solutions Architect, Greater China
April 2016
Real-time Analytics with
Open-Source and AWS
“We see our customers as
invited guests to a party,
and we are the hosts. It’s
our job to make every
important aspect of the
customer experience
a little bit better.”
Jeff Bezos
CEO, Amazon.com
Data analysis for a better customer experience
• Your business creates and stores
data and logs all the time
• Data points and logs allow you to
understand individual customer
experience and improve it
• Analysis of logs and trails helps
you gain insights
How does Open-Source fit into Data Analytics?
Most Notably: Apache Hadoop
• Open-Source Project for distributed
storage and distributed
processing of very large data sets
• Scales linearly on commodity
hardware compute nodes
• Has an entire ecosystem built
around it for various purposes
• Accumulo – cell-based access control NoSQL
• Avro – data serialization system
• Cascading – alternative language APIs on MR
• Cassandra – multi-master NoSQL DB
• Chukwa – data collection system at scale
• Flume – collecting, aggregating, moving logs
• Giraph – iterative graph processing system
• HBase – large table NoSQL DB
• HDFS – distributed file system
• Hive – SQL on MapReduce Data Warehouse
• Mahout – scalable machine learning library
• MapReduce – parallel processing on YARN
• Nutch – web crawler software
• Pig – high-level scripting on MapReduce
• R – statistical computing and graphics
• Spark – general compute engine on YARN
• Sqoop – transferring data to/from RDBMS
• Tez – data-flow programming on YARN
• Thrift – build scalable cross-language services
• ZooKeeper – coordination
Hadoop Ecosystem
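To make "MapReduce – parallel processing" concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases using the classic word-count example. It is an illustration only: the function names are made up for this sketch, and a real Hadoop job would distribute each phase across the cluster nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big logs"]))
```

Because each map call only sees one line and each reduce call only sees one key, both phases can run on many nodes at once, which is where the near-linear scaling comes from.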
Tell me more about Big Data!
Ever Increasing Amount of Data
Volume
Velocity
Variety
Generation
Collection & Storage
Analytics & Computation
Collaboration & Sharing
More devices
Lower cost
Higher throughput
Highly constrained
Amazon Web Services helps remove constraints
Big Data:
• Potentially massive datasets
• Iterative, experimental style of
data manipulation and analysis
• Frequently not a steady-state
workload; peaks and valleys
• Data is a combination of
structured and unstructured
data in many formats
AWS Cloud:
• Virtually unlimited capacity
• Low cost for iterative, experimental
usage through on-demand
infrastructure
• Fully scalable infrastructure for
highly variable workloads
• Tools & Services for managing
structured, unstructured and
stream data
Let’s simplify Big Data with AWS!
Three Types of Data Analytics
Retrospective
analysis and
reporting
Here-and-now
real-time processing
and dashboards
Predictions
to enable smart
apps
Ingest Store Process Visualize
Data Answers
Time
Simplified Big Data Pipeline
Services in the pipeline: Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Mobile Analytics, Amazon EMR, Amazon Redshift, AWS Lambda, Amazon Kinesis Firehose, Amazon Machine Learning, Amazon EC2, Amazon Glacier, Amazon Elasticsearch Service, Amazon Kinesis Analytics, Amazon QuickSight, AWS Import/Export Snowball, Amazon Kinesis
Fluentd: Open Source Log Collection
https://github.com/fluent/fluentd/
• Fluentd is an open source
data collector to unify data
collection and consumption
• Integration into many data
sources (App Logs, Syslogs,
Twitter etc.)
• Direct integration into AWS
such as S3 & Kinesis
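# Tail the Apache access log and parse each line with the apache2 format
<source>
  @type tail
  format apache2
  path /var/log/apache2/access_log
  tag s3.apache.access
</source>
# Ship every event whose tag matches s3.*.* to the S3 bucket
<match s3.*.*>
  @type s3
  s3_bucket myweblogs
  path logs/
</match>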
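The apache2 format in the configuration above turns each raw access-log line into a structured record before it is shipped. The sketch below shows that idea with a simplified regular expression; it is not Fluentd's actual parser, and the pattern, field names and sample line are assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# Simplified pattern for the leading fields of an Apache access-log line.
# Assumption: the request is "METHOD PATH PROTOCOL"; referer and agent are ignored.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<code>\d{3}) (?P<size>\S+)'
)


def parse_access_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = '192.0.2.10 - - [10/Apr/2016:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326'
    print(parse_access_line(sample))
```

Once a line is a record with named fields, downstream stages can filter or aggregate on fields such as the status code without re-parsing text.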
Amazon S3
• Highly available object storage
• Designed for 99.999999999% annual
data durability
• Replicated across 3 facilities
• Virtually unlimited scale
• Pay only for what you use, you don’t
need to pre-provision
• Allows event notifications to trigger
further action
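A short worked calculation can make the eleven-nines durability figure tangible. The sketch below reads the design target as an average annual loss rate per object, which is a simplification, and the object count in the example is an arbitrary assumption.

```python
from fractions import Fraction

# 99.999999999% annual durability, as stated on the slide.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the stated durability."""
    return object_count * (1 - DURABILITY)


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years until one object is expected to be lost."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    objects = 10_000_000  # hypothetical bucket size
    print(expected_annual_losses(objects))   # fraction of an object per year
    print(years_per_expected_loss(objects))  # years per single expected loss
```

Under these assumptions, storing ten million objects gives an expected loss of one object roughly every 10,000 years.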
Amazon DynamoDB
• Schemaless Data Model
• Seamless scalability
• No storage or throughput limits
• Consistent low latency performance
• High durability and availability
• Replicated across 3 facilities
DynamoDB
table
items
attributes
Fully Managed NoSQL Database Service
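The table, items and attributes model can be sketched with plain dictionaries. This is a toy in-memory illustration of what "schemaless" means, not the DynamoDB API: the Table class, its methods and the sample attribute names are all hypothetical.

```python
from typing import Any, Dict, Optional

Item = Dict[str, Any]


class Table:
    """Toy in-memory table: items are keyed by one primary-key attribute."""

    def __init__(self, name: str, key_attribute: str) -> None:
        self.name = name
        self.key_attribute = key_attribute
        self._items: Dict[Any, Item] = {}

    def put_item(self, item: Item) -> None:
        """Store an item; only the key attribute is mandatory."""
        if self.key_attribute not in item:
            raise ValueError(f"item is missing key attribute {self.key_attribute!r}")
        self._items[item[self.key_attribute]] = dict(item)

    def get_item(self, key: Any) -> Optional[Item]:
        """Return a copy of the item with this key, or None if absent."""
        item = self._items.get(key)
        return dict(item) if item is not None else None


if __name__ == "__main__":
    tweets = Table("tweets", key_attribute="tweet_id")
    tweets.put_item({"tweet_id": 1, "text": "hello"})
    tweets.put_item({"tweet_id": 2, "lang": "en", "retweets": 3})  # different attributes
    print(tweets.get_item(2))
```

The two items share nothing but the key attribute, which is the point of the schemaless model: new attributes can appear without a table migration.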
500,000 writes per second to their Amazon
DynamoDB tables
200 additional servers during the Super Bowl
0 additional servers right after
Stream in Real Time: Amazon Kinesis
• Real-Time Data Processing over
large distributed streams
• Elastic capacity that scales to
millions of events per second
• React in real time to incoming
stream events
• Reliable stream storage
replicated across 3 facilities
Amazon Kinesis
Kinesis for Real-Time
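One way to picture how a stream spreads load: Kinesis hashes each record's partition key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below illustrates the mapping. The even split of the hash space and the helper names are assumptions for this example; a real stream reports its own shard ranges.

```python
import hashlib
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def hash_key(partition_key: str) -> int:
    """Map a partition key to an integer in [0, 2**128) via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split the hash space into contiguous inclusive ranges (assumed even split)."""
    bounds = [HASH_SPACE * i // shard_count for i in range(shard_count + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(shard_count)]


def shard_for(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= key <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


if __name__ == "__main__":
    ranges = even_shard_ranges(4)
    for user in ("user-1", "user-2", "user-3"):
        print(user, "-> shard", shard_for(user, ranges))
```

Records with the same partition key always land on the same shard, so per-key ordering is preserved while different keys spread across shards.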
AWS Labs – Open Source Code for AWS
• Code and Connectors used with
Amazon Kinesis and other AWS
services are Open-Source
• Available under Apache License 2.0
https://github.com/awslabs
Amazon Elasticsearch Service
• Powerful, real-time, distributed, open-
source search and analytics engine
built on Apache Lucene
• Full integration into AWS with IAM for
security, CloudTrail for auditing and
CloudWatch for monitoring
• Fully managed cluster that scales for
data size and throughput
Amazon EMR
• Amazon EMR is a fully managed
Hadoop cluster
• Transient and long running clusters
• Direct integration into Amazon S3
and Amazon Kinesis
• Easy to scale and enable burstable
capacity
• Integration with AWS Spot Market
1 instance x 100 hours = 100 instances x 1 hour
(and with Spot Pricing not only faster but also cheaper)
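The equation above can be checked with a few lines of arithmetic. This sketch assumes the job parallelizes perfectly, which is what the slide's equality implies, and the hourly prices are hypothetical placeholders rather than real AWS prices.

```python
def instance_hours(instances: int, hours: float) -> float:
    """Total billed instance-hours for a cluster of this size and duration."""
    return instances * hours


def cluster_cost(instances: int, hours: float, price_per_instance_hour: float) -> float:
    """Cost of a run at a flat hourly price per instance."""
    return instance_hours(instances, hours) * price_per_instance_hour


if __name__ == "__main__":
    on_demand_price = 0.5   # hypothetical on-demand price per instance-hour
    spot_price = 0.125      # hypothetical Spot price per instance-hour

    print(cluster_cost(1, 100, on_demand_price))    # one instance for 100 hours
    print(cluster_cost(100, 1, on_demand_price))    # same cost, result after 1 hour
    print(cluster_cost(100, 1, spot_price))         # cheaper again at the Spot price
```

Both on-demand runs bill 100 instance-hours, so the wide cluster costs the same and finishes 100 times sooner. Any Spot discount then lowers the bill further.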
Process – Amazon EMR
• Amazon EMR supports all common
Hadoop Frameworks such as:
• Spark, Pig, Hive, Hue, Oozie …
• HBase, Presto, Impala …
• Decouples storage from compute
• Allows independent scaling
• Direct Integration with DynamoDB
and S3
Amazon S3
Amazon DynamoDB
Amazon EMR
• FINRA regulates trading practices of
brokerage firms and exchange markets to
protect market integrity
• Market surveillance platform stores
30 billion market events every day
• Leverages Amazon S3 to store events
and allow analysts to interactively query
market dynamics using Amazon EMR
Hive & HBase clusters with increased
agility
Re-Architecting Compliance
Unlimited Storage
Distributed Computing
Interactive Market Queries
Ensure compliance
30 billion market events
-- Hive table backed by an Amazon Kinesis stream; each record is one comma-delimited line
CREATE TABLE call_data_records (
  start_time bigint,
  end_time bigint,
  phone_number STRING,
  carrier STRING,
  recorded_duration bigint,
  calculated_duration bigint,
  lat double,
  long double
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ","
STORED BY
'com.amazon.emr.kinesis.hive.KinesisStorageHandler'
TBLPROPERTIES("kinesis.stream.name"="MyTestStream");
Amazon EMR integration: Hive
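The table definition above tells Hive to treat each stream record as one comma-delimited line and to cast each field to the declared column type. A rough sketch of that per-record step follows. The column list is copied from the CREATE TABLE statement, while the parsing function and the sample record are made up for illustration and are not how the storage handler is implemented.

```python
from typing import Callable, Dict, List, Tuple, Union

Value = Union[int, float, str]

# Column names and types copied from the CREATE TABLE statement.
SCHEMA: List[Tuple[str, Callable[[str], Value]]] = [
    ("start_time", int),
    ("end_time", int),
    ("phone_number", str),
    ("carrier", str),
    ("recorded_duration", int),
    ("calculated_duration", int),
    ("lat", float),
    ("long", float),
]


def parse_record(line: str, delimiter: str = ",") -> Dict[str, Value]:
    """Split one delimited record and cast each field to its column type."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    return {name: cast(raw) for (name, cast), raw in zip(SCHEMA, fields)}


if __name__ == "__main__":
    # Hypothetical record; the values are invented for this example.
    print(parse_record("1460000000,1460000060,5550100,ExampleTel,60,60,22.28,114.16"))
```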
Apache Spark
• Apache Spark is an in-memory cluster
computing engine built on RDDs (Resilient
Distributed Datasets) for fast processing
• Often faster than MapReduce because
intermediate results stay in memory instead
of being written to HDFS between stages
• Apache Spark Streaming can read
directly from DynamoDB, S3 and a
Kinesis stream
Processing Amazon Kinesis streams
Amazon Kinesis
EMR with Spark Streaming
// Simplified pseudocode: the real KinesisUtils.createStream and countByWindow take more arguments
KinesisUtils.createStream("twitter-stream")
  .filter(_.getText.contains("Big Data"))
  .countByWindow(Seconds(5))
Counting tweets on a sliding window
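The filter-then-count logic of the snippet above can be sketched in plain Python, without Spark. This is an illustration of the sliding-window idea only: it emits a count after every event instead of at fixed slide intervals as Spark Streaming does, it assumes events arrive in timestamp order, and the function name and sample events are invented.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def count_by_window(
    events: Iterable[Tuple[float, str]],
    keyword: str,
    window_seconds: float = 5.0,
) -> Iterator[Tuple[float, int]]:
    """For each event, yield (timestamp, matching events within the last window)."""
    window: Deque[float] = deque()  # timestamps of matching events still in the window
    for timestamp, text in events:
        if keyword.lower() in text.lower():
            window.append(timestamp)
        # Evict matches that fell out of the half-open window (t - window, t].
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        yield timestamp, len(window)


if __name__ == "__main__":
    tweets = [
        (0.0, "Big Data rocks"),
        (1.0, "hello"),
        (2.0, "big data again"),
        (6.0, "more Big Data"),
        (8.0, "nothing"),
    ]
    for timestamp, count in count_by_window(tweets, "Big Data"):
        print(timestamp, count)
```

Only the timestamps of matching events are kept, so memory use is bounded by the number of matches inside one window.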
React in Real-Time: AWS Lambda
• Run your code in the cloud, fully
managed and highly-available
• Triggered through API calls or
state changes in your setup (S3,
DynamoDB, SNS, Kinesis)
• Scales automatically to match
the incoming event rate
• Charged per 100ms execution
time
Amazon Kinesis
AWS Lambda
Amazon S3
Amazon DynamoDB
Amazon API Gateway
Amazon SNS
AWS Lambda
• Use AWS Lambda to clean and
massage incoming data
• Write code to load data sources
(S3, DynamoDB) automatically in your
data warehouse (e.g. Amazon Redshift)
• React in real-time to incoming events in
Amazon Kinesis
AWS Lambda
Amazon Redshift
Amazon Kinesis
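As a sketch of "clean and massage incoming data", here is a Lambda-style handler for a batch of Kinesis records. Kinesis events deliver each payload base64-encoded under Records[].kinesis.data. The assumption that payloads are JSON objects, the clean_record step and the return value are hypothetical choices for this example.

```python
import base64
import json
from typing import Any, Dict, List


def clean_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical cleaning step: trim strings and drop empty fields."""
    cleaned: Dict[str, Any] = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = value.strip()
        if value is None or value == "":
            continue
        cleaned[key] = value
    return cleaned


def handler(event: Dict[str, Any], context: Any) -> List[Dict[str, Any]]:
    """Decode each Kinesis record in the batch and return the cleaned payloads."""
    results = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        results.append(clean_record(json.loads(payload)))  # assumes JSON payloads
    return results


if __name__ == "__main__":
    body = json.dumps({"text": "  Big Data  ", "lang": ""}).encode("utf-8")
    fake_event = {"Records": [{"kinesis": {"data": base64.b64encode(body).decode("ascii")}}]}
    print(handler(fake_event, None))
```

In a real pipeline the cleaned records would be forwarded to a store such as Amazon S3 or Amazon Redshift rather than returned. That step is left out here to keep the sketch free of service calls.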
Amazon Redshift
• Fully managed petabyte-scale data
warehouse
• Scalable number of cluster nodes
• ODBC/JDBC connector for BI tools
using SQL
• Supports Amazon DynamoDB and
Amazon S3 to load data
• Less than a tenth of the cost of traditional
solutions
Amazon Redshift
Amazon Redshift – Use Case
• Web Log Analysis at amazon.com
(Online Retail Business)
• Understand customer behavior
• Who’s browsing but not buying?
• Which products are winners?
• What sequence led to higher
customer conversion?
• Metrics
• 2 TB of new data every day
• Largest table: 400TB
Amazon Redshift – Use Case
• Performance
• Scan 2.25 trillion rows of data in
14 minutes
• Load 5 billion rows of data in
10 minutes
• Comparison
• Hadoop (Pig) to Redshift from
2 days to 1 hour
• Oracle DB to Redshift from
90 hours to 8 hours
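The figures above translate into throughput and speedup factors with simple arithmetic. The helper names below are made up for this sketch, and the inputs are the numbers quoted on the slide.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the new runtime is than the old one."""
    return before_hours / after_hours


def rows_per_second(rows: int, minutes: float) -> float:
    """Average row throughput for a job of the given duration."""
    return rows / (minutes * 60)


if __name__ == "__main__":
    print(speedup(2 * 24, 1))                        # Hadoop (Pig): 2 days to 1 hour
    print(speedup(90, 8))                            # Oracle DB: 90 hours to 8 hours
    print(rows_per_second(2_250_000_000_000, 14))    # scan rate, rows per second
    print(rows_per_second(5_000_000_000, 10))        # load rate, rows per second
```

By this arithmetic the Pig job runs 48 times faster and the Oracle job about 11 times faster. The scan averages roughly 2.7 billion rows per second and the load roughly 8.3 million rows per second.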
Amazon QuickSight
• Fast, cloud-powered, BI service for
1/10th the cost of old-guard BI software
• Connectors for files, third-party platforms
and AWS services
• In-memory calculation engine (SPICE)
to accelerate analysis and visualization
• Supports other partner BI tools
• $9 per user per month
Kibana: Open Source Visualization
https://github.com/elastic/kibana
• Kibana is an open-source
project by Elastic to
visualize data in the browser
• Uses Elasticsearch as its
indexing engine (based on
Apache Lucene)
Let’s put it all together: Demo Time!
Demo: Live Twitter Feed Analysis
Components: Twitter Stream, Amazon Kinesis, AWS Lambda, Amazon Elasticsearch Service
Twitter Blog* - On a typical day (in 2013):
• More than 500 million Tweets sent
• Average 5,700 TPS
* https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
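The Twitter figures quoted for the demo can be sanity-checked, and they give a rough lower bound on stream size. The sketch assumes the Kinesis per-shard write limit of 1,000 records per second. It ignores payload size and traffic peaks, so a real stream would need headroom beyond this minimum.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(events_per_day: int) -> float:
    """Average event rate implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


def min_shards(records_per_second: float, records_per_shard_per_second: int = 1000) -> int:
    """Smallest shard count that satisfies the per-shard record-rate limit."""
    return math.ceil(records_per_second / records_per_shard_per_second)


if __name__ == "__main__":
    print(average_per_second(500_000_000))  # close to the quoted average of 5,700 TPS
    print(min_shards(5_700))                # shards needed for the average rate alone
```

500 million Tweets per day works out to about 5,800 per second, in line with the quoted average. At that rate the record-count limit alone calls for at least 6 shards.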
Thank you!
Olivier Klein 奧樂凱
Solutions Architect, Greater China

More Related Content

What's hot

AWS re:Invent 2016: Unlocking the Four Seasons of Migrations and Operations: ...
AWS re:Invent 2016: Unlocking the Four Seasons of Migrations and Operations: ...AWS re:Invent 2016: Unlocking the Four Seasons of Migrations and Operations: ...
AWS re:Invent 2016: Unlocking the Four Seasons of Migrations and Operations: ...Amazon Web Services
 
AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...
AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...
AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...Amazon Web Services
 
Building a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSBuilding a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSAmazon Web Services
 
Industry 4.0: come i servizi IoT e Big Data di AWS rendono Smart il Manufactu...
Industry 4.0: come i servizi IoT e Big Data di AWS rendono Smart il Manufactu...Industry 4.0: come i servizi IoT e Big Data di AWS rendono Smart il Manufactu...
Industry 4.0: come i servizi IoT e Big Data di AWS rendono Smart il Manufactu...Amazon Web Services
 
Creating a Data Driven Culture with Amazon QuickSight - Technical 201
Creating a Data Driven Culture with Amazon QuickSight - Technical 201Creating a Data Driven Culture with Amazon QuickSight - Technical 201
Creating a Data Driven Culture with Amazon QuickSight - Technical 201Amazon Web Services
 
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...Amazon Web Services
 
Partner Solutions: Veritas Technologies - Unique Ways Veritas can Supercharge...
Partner Solutions: Veritas Technologies - Unique Ways Veritas can Supercharge...Partner Solutions: Veritas Technologies - Unique Ways Veritas can Supercharge...
Partner Solutions: Veritas Technologies - Unique Ways Veritas can Supercharge...Amazon Web Services
 
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Amazon Web Services
 
Going Global with AWS: Customer Case Study with Bynder
Going Global with AWS: Customer Case Study with BynderGoing Global with AWS: Customer Case Study with Bynder
Going Global with AWS: Customer Case Study with BynderAmazon Web Services
 
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2017
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2017Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2017
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2017Amazon Web Services
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightAmazon Web Services
 
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Amazon Web Services
 
AWS Customer Presentation - Angelbeat Princeton Seminar
AWS Customer Presentation -  Angelbeat Princeton SeminarAWS Customer Presentation -  Angelbeat Princeton Seminar
AWS Customer Presentation - Angelbeat Princeton SeminarAmazon Web Services
 
Accelerate your Cloud Success with Platform Services
Accelerate your Cloud Success with Platform ServicesAccelerate your Cloud Success with Platform Services
Accelerate your Cloud Success with Platform ServicesAmazon Web Services
 
Introduction to AWS for Android Developers
Introduction to AWS for Android DevelopersIntroduction to AWS for Android Developers
Introduction to AWS for Android DevelopersAmazon Web Services
 
Intro Presentation at AWS AWSome Day London September 2015
Intro Presentation at AWS AWSome Day London September 2015Intro Presentation at AWS AWSome Day London September 2015
Intro Presentation at AWS AWSome Day London September 2015Ian Massingham
 
Introduction to Cloud Computing with Amazon Web Services-ASEAN Workshop Serie...
Introduction to Cloud Computing with Amazon Web Services-ASEAN Workshop Serie...Introduction to Cloud Computing with Amazon Web Services-ASEAN Workshop Serie...
Introduction to Cloud Computing with Amazon Web Services-ASEAN Workshop Serie...Amazon Web Services
 
AWS Innovate Montreal Keynote - by Chris Munns
AWS Innovate Montreal Keynote - by Chris MunnsAWS Innovate Montreal Keynote - by Chris Munns
AWS Innovate Montreal Keynote - by Chris MunnsAmazon Web Services
 
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...Amazon Web Services
 

What's hot (20)

AWS re:Invent 2016: Unlocking the Four Seasons of Migrations and Operations: ...
AWS re:Invent 2016: Unlocking the Four Seasons of Migrations and Operations: ...AWS re:Invent 2016: Unlocking the Four Seasons of Migrations and Operations: ...
AWS re:Invent 2016: Unlocking the Four Seasons of Migrations and Operations: ...
 
AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...
AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...
AWS re:Invent 2016: Delighting Customers Through Device Data with Salesforce ...
 
Building a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSBuilding a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWS
 
Industry 4.0: come i servizi IoT e Big Data di AWS rendono Smart il Manufactu...
Industry 4.0: come i servizi IoT e Big Data di AWS rendono Smart il Manufactu...Industry 4.0: come i servizi IoT e Big Data di AWS rendono Smart il Manufactu...
Industry 4.0: come i servizi IoT e Big Data di AWS rendono Smart il Manufactu...
 
Creating a Data Driven Culture with Amazon QuickSight - Technical 201
Creating a Data Driven Culture with Amazon QuickSight - Technical 201Creating a Data Driven Culture with Amazon QuickSight - Technical 201
Creating a Data Driven Culture with Amazon QuickSight - Technical 201
 
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
 
Partner Solutions: Veritas Technologies - Unique Ways Veritas can Supercharge...
Partner Solutions: Veritas Technologies - Unique Ways Veritas can Supercharge...Partner Solutions: Veritas Technologies - Unique Ways Veritas can Supercharge...
Partner Solutions: Veritas Technologies - Unique Ways Veritas can Supercharge...
 
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
 
Going Global with AWS: Customer Case Study with Bynder
Going Global with AWS: Customer Case Study with BynderGoing Global with AWS: Customer Case Study with Bynder
Going Global with AWS: Customer Case Study with Bynder
 
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2017
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2017Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2017
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2017
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSight
 
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
 
AWS Customer Presentation - Angelbeat Princeton Seminar
AWS Customer Presentation -  Angelbeat Princeton SeminarAWS Customer Presentation -  Angelbeat Princeton Seminar
AWS Customer Presentation - Angelbeat Princeton Seminar
 
Accelerate your Cloud Success with Platform Services
Accelerate your Cloud Success with Platform ServicesAccelerate your Cloud Success with Platform Services
Accelerate your Cloud Success with Platform Services
 
Introduction to AWS for Android Developers
Introduction to AWS for Android DevelopersIntroduction to AWS for Android Developers
Introduction to AWS for Android Developers
 
Intro Presentation at AWS AWSome Day London September 2015
Intro Presentation at AWS AWSome Day London September 2015Intro Presentation at AWS AWSome Day London September 2015
Intro Presentation at AWS AWSome Day London September 2015
 
Introduction to Cloud Computing with Amazon Web Services-ASEAN Workshop Serie...
Introduction to Cloud Computing with Amazon Web Services-ASEAN Workshop Serie...Introduction to Cloud Computing with Amazon Web Services-ASEAN Workshop Serie...
Introduction to Cloud Computing with Amazon Web Services-ASEAN Workshop Serie...
 
AWS Innovate Montreal Keynote - by Chris Munns
AWS Innovate Montreal Keynote - by Chris MunnsAWS Innovate Montreal Keynote - by Chris Munns
AWS Innovate Montreal Keynote - by Chris Munns
 
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
 
Earth Observation in the Cloud
Earth Observation in the CloudEarth Observation in the Cloud
Earth Observation in the Cloud
 

Viewers also liked

Snowplow at Sigfig
Snowplow at SigfigSnowplow at Sigfig
Snowplow at Sigfigyalisassoon
 
Building the Ideal Stack for Real-Time Analytics
Building the Ideal Stack for Real-Time AnalyticsBuilding the Ideal Stack for Real-Time Analytics
Building the Ideal Stack for Real-Time AnalyticsSingleStore
 
TurboCharge Your Continuous Delivery Pipeline with Containers - Pop-up Loft
TurboCharge Your Continuous Delivery Pipeline with Containers - Pop-up LoftTurboCharge Your Continuous Delivery Pipeline with Containers - Pop-up Loft
TurboCharge Your Continuous Delivery Pipeline with Containers - Pop-up LoftAmazon Web Services
 
3 Secrets to Becoming a Cloud Security Superhero
3 Secrets to Becoming a Cloud Security Superhero 3 Secrets to Becoming a Cloud Security Superhero
3 Secrets to Becoming a Cloud Security Superhero Amazon Web Services
 
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAmazon Web Services
 
Lessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluLessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluDataWorks Summit
 
Ann Summers - In Pursuit of Perfect Personalisation - Festival of Marketing 2...
Ann Summers - In Pursuit of Perfect Personalisation - Festival of Marketing 2...Ann Summers - In Pursuit of Perfect Personalisation - Festival of Marketing 2...
Ann Summers - In Pursuit of Perfect Personalisation - Festival of Marketing 2...Qubit
 
AWS Customer Presentation: Coca Cola Turkey migrates SAP ERP to AWS-SAPPHIRE ...
AWS Customer Presentation: Coca Cola Turkey migrates SAP ERP to AWS-SAPPHIRE ...AWS Customer Presentation: Coca Cola Turkey migrates SAP ERP to AWS-SAPPHIRE ...
AWS Customer Presentation: Coca Cola Turkey migrates SAP ERP to AWS-SAPPHIRE ...Amazon Web Services
 
TCS: Leveraging AWS for SAP on Oracle implementations
TCS: Leveraging AWS for SAP on Oracle implementationsTCS: Leveraging AWS for SAP on Oracle implementations
TCS: Leveraging AWS for SAP on Oracle implementationsAmazon Web Services
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightAmazon Web Services
 
AWS May 2016 Webinar Series - AWS Services Overview
AWS May 2016 Webinar Series - AWS Services OverviewAWS May 2016 Webinar Series - AWS Services Overview
AWS May 2016 Webinar Series - AWS Services OverviewAmazon Web Services
 
1. 利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
1.	利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)1.	利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
1. 利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)Amazon Web Services
 
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLNEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLAmazon Web Services
 
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...Amazon Web Services
 

Viewers also liked (16)

Snowplow at Sigfig
Snowplow at SigfigSnowplow at Sigfig
Snowplow at Sigfig
 
Building the Ideal Stack for Real-Time Analytics
Building the Ideal Stack for Real-Time AnalyticsBuilding the Ideal Stack for Real-Time Analytics
Building the Ideal Stack for Real-Time Analytics
 
TurboCharge Your Continuous Delivery Pipeline with Containers - Pop-up Loft
TurboCharge Your Continuous Delivery Pipeline with Containers - Pop-up LoftTurboCharge Your Continuous Delivery Pipeline with Containers - Pop-up Loft
TurboCharge Your Continuous Delivery Pipeline with Containers - Pop-up Loft
 
3 Secrets to Becoming a Cloud Security Superhero
3 Secrets to Becoming a Cloud Security Superhero 3 Secrets to Becoming a Cloud Security Superhero
3 Secrets to Becoming a Cloud Security Superhero
 
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
 
Lessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluLessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at Hulu
 
Ann Summers - In Pursuit of Perfect Personalisation - Festival of Marketing 2...
Ann Summers - In Pursuit of Perfect Personalisation - Festival of Marketing 2...Ann Summers - In Pursuit of Perfect Personalisation - Festival of Marketing 2...
Ann Summers - In Pursuit of Perfect Personalisation - Festival of Marketing 2...
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
AWS Customer Presentation: Coca Cola Turkey migrates SAP ERP to AWS-SAPPHIRE ...
AWS Customer Presentation: Coca Cola Turkey migrates SAP ERP to AWS-SAPPHIRE ...AWS Customer Presentation: Coca Cola Turkey migrates SAP ERP to AWS-SAPPHIRE ...
AWS Customer Presentation: Coca Cola Turkey migrates SAP ERP to AWS-SAPPHIRE ...
 
TCS: Leveraging AWS for SAP on Oracle implementations
TCS: Leveraging AWS for SAP on Oracle implementationsTCS: Leveraging AWS for SAP on Oracle implementations
TCS: Leveraging AWS for SAP on Oracle implementations
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSight
 
AWS May 2016 Webinar Series - AWS Services Overview
AWS May 2016 Webinar Series - AWS Services OverviewAWS May 2016 Webinar Series - AWS Services Overview
AWS May 2016 Webinar Series - AWS Services Overview
 
1. 利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
1.	利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)1.	利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
1. 利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
 
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
 
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLNEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
 
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
 

Real-time Analytics with Open-Source and AWS

  • 1. Real-time Analytics with Open-Source and AWS
      Olivier Klein 奧樂凱, Solutions Architect, Greater China
      April 2016
      © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. “We see our customers as invited guests to a party, and we are the hosts. It’s our job to make every important aspect of the customer experience a little bit better.”
      Jeff Bezos, CEO, Amazon.com
  • 3. Data analysis for a better customer experience
      • Your business creates and stores data and logs all the time
      • Data points and logs allow you to understand individual customer experience and improve it
      • Analysis of logs and trails helps gain insights
  • 4. How does Open-Source fit into Data Analytics?
  • 5. Most Notably: Apache Hadoop
      • Open-Source Project for distributed storage and distributed processing of very large data sets
      • Scales linearly on commodity hardware compute nodes
      • Has an entire ecosystem built around it for various purposes
  • 6. Hadoop Ecosystem
      • Accumulo – cell-based access control NoSQL
      • Avro – data serialization system
      • Cascading – alternative language APIs on MR
      • Cassandra – multi-master NoSQL DB
      • Chukwa – data collection system at scale
      • Flume – collecting, aggregating, moving logs
      • Giraph – iterative graph processing system
      • HBase – large table NoSQL DB
      • HDFS – distributed file system
      • Hive – SQL on MapReduce Data Warehouse
      • Mahout – scalable machine learning library
      • MapReduce – parallel processing on YARN
      • Nutch – web crawler software
      • Pig – high-level scripting on MapReduce
      • R – statistical computing and graphics
      • Spark – general compute engine on YARN
      • Sqoop – transferring data to/from RDBMS
      • Tez – data-flow programming on YARN
      • Thrift – build scalable cross-language services
      • ZooKeeper – coordination
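The MapReduce model named in the ecosystem list can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps only, not Hadoop's API; the function names and the two-field sample log lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (status_code, 1) for one 'path status' line; skip malformed lines."""
    parts = line.split()
    if len(parts) == 2:
        _path, status = parts
        yield status, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input: request path followed by HTTP status code.
SAMPLE_LINES = ["/index.html 200", "/missing 404", "/about.html 200"]
STATUS_COUNTS = run_job(SAMPLE_LINES)
```

On a real cluster the map and reduce functions run in parallel on many nodes and the shuffle moves data across the network; the sketch keeps only the data flow.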
  • 7. Tell me more about Big Data!
  • 8. Ever Increasing Amount of Data: Volume, Velocity, Variety
  • 9. Generation, Collection & Storage, Analytics & Computation, Collaboration & Sharing
  • 10. More devices, lower cost, higher throughput
      • Generation, Collection & Storage, Analytics & Computation, Collaboration & Sharing
  • 11. Highly constrained
      • More devices, lower cost, higher throughput
      • Generation, Collection & Storage, Analytics & Computation, Collaboration & Sharing
  • 12. Amazon Web Services helps remove constraints
  • 13. Big Data:
      • Potentially massive datasets
      • Iterative, experimental style of data manipulation and analysis
      • Frequently not a steady-state workload; peaks and valleys
      • Data is a combination of structured and unstructured data in many formats
    AWS Cloud:
      • Virtually unlimited capacity
      • Iterative, experimental usage cost through on-demand infrastructure
      • Fully scalable infrastructure for highly variable workloads
      • Tools & Services for managing structured, unstructured and stream data
  • 14. Let’s simplify Big Data with AWS!
  • 15. Three Types of Data Analytics
      • Retrospective: analysis and reporting
      • Here-and-now: real-time processing and dashboards
      • Predictions: to enable smart apps
  • 16. Simplified Big Data Pipeline
      • Stages: Ingest, Store, Process, Visualize
      • Diagram labels: Data, Answers, Time
  • 17. Ingest, Store, Process, Visualize
      • AWS services shown: Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Mobile Analytics, Amazon EMR, Amazon Redshift, AWS Lambda, Amazon Kinesis Firehose, Amazon Machine Learning, Amazon EC2, Amazon Glacier, Amazon Elasticsearch Service, Amazon Kinesis Analytics, Amazon QuickSight, AWS Import/Export Snowball, Amazon Kinesis
  • 19. Fluentd: Open Source Log Collection (https://github.com/fluent/fluentd/)
      • Fluentd is an open source data collector to unify data collection and consumption
      • Integrates with many data sources (app logs, syslogs, Twitter etc.)
      • Direct integration with AWS services such as S3 & Kinesis
      • Example configuration:

        <source>
          type tail
          format apache2
          path /var/log/apache2/access_log
          tag s3.apache.access
        </source>

        <match s3.*.*>
          type s3
          s3_bucket myweblogs
          path logs/
        </match>
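To make the configuration above concrete: the source block tails the Apache access log, parses each line, and tags the resulting event `s3.apache.access`; the match block routes any three-part tag starting with `s3` to the S3 output. The Python below is a rough sketch of those two steps, not Fluentd's implementation. The regular expression is a simplified stand-in for the `apache2` format, and the function names and sample line are hypothetical.

```python
import re

# Simplified pattern for an Apache combined-format access log line (assumption:
# the real apache2 parser in Fluentd is more permissive than this).
APACHE_COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_access_line(line):
    """Turn one access-log line into a record dict, or None if it does not parse."""
    match = APACHE_COMBINED.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["code"] = int(record["code"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def tag_matches(pattern, tag):
    """Match a dotted tag against a pattern where '*' stands for exactly one part."""
    pattern_parts = pattern.split(".")
    tag_parts = tag.split(".")
    if len(pattern_parts) != len(tag_parts):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_parts, tag_parts))


# Hypothetical sample line using a documentation-range IP address.
SAMPLE = ('192.0.2.10 - - [10/Apr/2016:13:55:36 +0800] '
          '"GET /index.html HTTP/1.1" 200 2326 "-" "curl/7.43.0"')
RECORD = parse_access_line(SAMPLE)
```

With these helpers, `tag_matches("s3.*.*", "s3.apache.access")` is true, which is why events from the tailed log reach the `myweblogs` bucket in the slide's example.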
  • 21. Amazon S3 • Highly available object storage • Designed for 99.999999999% annual data durability • Replicated across 3 facilities • Virtually unlimited scale • Pay only for what you use; no need to pre-provision • Event notifications can trigger further actions
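To make the durability figure concrete, here is a minimal arithmetic sketch. It assumes the 99.999999999% design target can be read as an independent annual loss probability of 1e-11 per object; the object count is a hypothetical illustration, not a number from the talk.

```python
# Expected annual object loss implied by an 11-nines durability design target.
# Assumption: each object has an independent annual loss probability of 1e-11.
ANNUAL_LOSS_RATE = 1e-11  # 1 - 0.99999999999, taken from the slide


def expected_annual_loss(num_objects: int, loss_rate: float = ANNUAL_LOSS_RATE) -> float:
    """Average number of objects lost per year under the design target."""
    return num_objects * loss_rate


if __name__ == "__main__":
    n = 10_000_000  # hypothetical number of stored objects
    loss = expected_annual_loss(n)
    print(f"{n:,} objects: {loss:.6f} expected losses per year")
    print(f"about one lost object every {1 / loss:,.0f} years")
```

Under these assumptions, ten million objects correspond to roughly one lost object every 10,000 years on average.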
  • 23. Amazon DynamoDB: Fully Managed NoSQL Database Service • Schemaless data model • Seamless scalability • No storage or throughput limits • Consistent low-latency performance • High durability and availability • Replicated across 3 facilities • Data model: table, items, attributes
  • 25. 500,000 writes / second to their Amazon DynamoDB tables • 200 additional servers during the Super Bowl • 0 additional servers right after
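As a rough sizing sketch for a write rate like this one: a DynamoDB write capacity unit covers one write per second for an item of up to 1 KB, and larger items consume one unit per started KB. The item sizes below are hypothetical, since the slide does not state them.

```python
import math


def write_capacity_units(writes_per_second: int, item_size_bytes: int) -> int:
    """Provisioned write capacity needed for a steady write rate.

    One unit covers one write per second of an item up to 1 KB;
    larger items are rounded up to the next whole KB.
    """
    units_per_write = max(1, math.ceil(item_size_bytes / 1024))
    return writes_per_second * units_per_write


if __name__ == "__main__":
    # Item sizes are illustrative assumptions, not figures from the talk.
    for size in (512, 1500):
        print(size, "bytes:", write_capacity_units(500_000, size), "units")
```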
  • 27. Stream in Real Time: Amazon Kinesis • Real-time data processing over large distributed streams • Elastic capacity that scales to millions of events per second • React in real time to incoming stream events • Reliable stream storage replicated across 3 facilities
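A Kinesis stream scales by spreading records over shards: each record's partition key is hashed with MD5 into a 128-bit value, and the record goes to the shard whose hash key range contains that value. The sketch below illustrates the idea in plain Python. The even split of the hash space and the big-endian reading of the digest are assumptions made for illustration, and `shard_ranges` and `shard_for_key` are hypothetical helper names, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # partition keys map to 128-bit integers


def shard_ranges(num_shards: int) -> list:
    """Split the hash key space into contiguous ranges (assumed even split)."""
    size = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        # The last shard absorbs any remainder of the hash space.
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: list) -> int:
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")  # byte order is an assumption
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value outside all shard ranges")


if __name__ == "__main__":
    ranges = shard_ranges(4)
    for key in ("user-1", "user-2", "user-3"):
        print(key, "-> shard", shard_for_key(key, ranges))
```

Records that share a partition key always land on the same shard, which is what preserves per-key ordering.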
  • 29. AWS Labs – Open Source Code for AWS • Code and connectors used with Amazon Kinesis and other AWS services are open source • Available under the Apache License 2.0 https://github.com/awslabs
  • 31. Amazon Elasticsearch Service • Powerful, real-time, distributed, open-source search and analytics engine built on Apache Lucene • Full integration into AWS with IAM for security, CloudTrail for auditing and CloudWatch for monitoring • Fully managed cluster that scales for data size and throughput
  • 33. Amazon EMR • Amazon EMR provides fully managed Hadoop clusters • Supports both transient and long-running clusters • Direct integration with Amazon S3 and Amazon Kinesis • Easy to scale, with burstable capacity • Integration with the Amazon EC2 Spot market
  • 34. Same cost: 1 instance x 100 hours = 100 instances x 1 hour (and with Spot pricing, not only faster but also cheaper)
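The equation above can be checked with a small cost sketch. It assumes a perfectly parallel job and per-instance-hour billing; both prices are hypothetical placeholders, not actual AWS rates.

```python
def cluster_cost(instances: int, hours: float, price_per_instance_hour: float) -> float:
    """Total cost of running a cluster, assuming per-instance-hour billing."""
    return instances * hours * price_per_instance_hour


# Hypothetical prices in USD per instance-hour, for illustration only.
ON_DEMAND_PRICE = 0.10
SPOT_PRICE = 0.03

if __name__ == "__main__":
    print("1 x 100 h, on-demand:", cluster_cost(1, 100, ON_DEMAND_PRICE))
    print("100 x 1 h, on-demand:", cluster_cost(100, 1, ON_DEMAND_PRICE))
    print("100 x 1 h, spot:     ", cluster_cost(100, 1, SPOT_PRICE))
```

The first two figures are identical, so the wide cluster finishes 100 times sooner at the same price; any Spot discount then lowers the cost further.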
  • 35. Process – Amazon EMR • Amazon EMR supports all common Hadoop frameworks such as: • Spark, Pig, Hive, Hue, Oozie … • HBase, Presto, Impala … • Decouples storage from compute • Allows independent scaling • Direct integration with DynamoDB and S3
  • 36. Re-Architecting Compliance • FINRA regulates trading practices of brokerage firms and exchange markets to protect market integrity • Market surveillance platform stores 30 billion market events every day • Leverages Amazon S3 to store events and allows analysts to interactively query market dynamics using Amazon EMR Hive and HBase clusters with increased agility • Highlights: unlimited storage, distributed computing, interactive market queries, ensured compliance, 30 billion market events
  • 37. Amazon EMR integration: Hive
      CREATE TABLE call_data_records (
        start_time bigint,
        end_time bigint,
        phone_number STRING,
        carrier STRING,
        recorded_duration bigint,
        calculated_duration bigint,
        lat double,
        long double
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
      STORED BY 'com.amazon.emr.kinesis.hive.KinesisStorageHandler'
      TBLPROPERTIES("kinesis.stream.name"="MyTestStream");
  • 38. Apache Spark • Apache Spark is an in-memory cluster computing engine that uses RDDs (Resilient Distributed Datasets) for fast processing • Often faster than MapReduce because intermediate results stay in memory instead of being written to HDFS between stages • Spark on Amazon EMR can read from DynamoDB and S3, and Spark Streaming can consume a Kinesis stream directly
  • 39. Processing Amazon Kinesis streams • Amazon Kinesis feeding EMR with Spark Streaming • Counting tweets on a sliding window: KinesisUtils.createStream("twitter-stream").filter(_.getText.contains("Big Data")).countByWindow(Seconds(5))
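The windowed count on this slide can be illustrated without a Spark cluster. The following plain-Python sketch only mimics the idea of counting matching tweets inside a time window; it is not the Spark Streaming API. The class name, the `(timestamp, text)` tuple layout and the sample tweets are assumptions made for illustration.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall in the window (now - window, now]."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp: float) -> None:
        # Events are assumed to arrive in timestamp order.
        self._timestamps.append(timestamp)

    def count(self, now: float) -> int:
        cutoff = now - self.window_seconds
        # Drop events that have slid out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def count_matching(tweets, keyword: str, window_seconds: float, now: float) -> int:
    """Count tweets containing `keyword` within the last `window_seconds`."""
    counter = SlidingWindowCounter(window_seconds)
    for timestamp, text in tweets:
        if keyword in text:
            counter.add(timestamp)
    return counter.count(now)


if __name__ == "__main__":
    sample = [(0.0, "Big Data rocks"), (2.0, "hello"),
              (3.0, "more Big Data"), (6.0, "Big Data again")]
    print(count_matching(sample, "Big Data", window_seconds=5, now=7.0))
```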
  • 41. React in Real Time: AWS Lambda • Run your code in the cloud, fully managed and highly available • Triggered through API calls or state changes in your setup (S3, DynamoDB, SNS, Kinesis) • Scales automatically to match the incoming event rate • Charged per 100 ms of execution time • Event sources: Amazon Kinesis, Amazon S3, Amazon DynamoDB, Amazon API Gateway, Amazon SNS
  • 42. AWS Lambda • Use AWS Lambda to clean and massage incoming data • Write code to load data sources (S3, DynamoDB) automatically into your data warehouse (e.g. Amazon Redshift) • React in real time to incoming events in Amazon Kinesis
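A minimal sketch of the "clean and massage" step might look like the handler below. It relies on the Kinesis event shape that Lambda delivers, where each entry in `Records` carries a base64-encoded payload under `kinesis.data`. The JSON payload layout (`user`, `text`) and the `clean_record` helper are hypothetical, and loading the cleaned rows into a warehouse is deliberately left out.

```python
import base64
import json


def clean_record(raw: dict) -> dict:
    """Hypothetical cleanup step: keep two fields and trim whitespace."""
    return {
        "user": raw.get("user", "").strip(),
        "text": raw.get("text", "").strip(),
    }


def handler(event, context):
    """Lambda entry point for a Kinesis event source.

    Each record's payload arrives base64-encoded in record["kinesis"]["data"].
    The payload is assumed to be a JSON object.
    """
    cleaned = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        cleaned.append(clean_record(json.loads(payload)))
    return cleaned


if __name__ == "__main__":
    data = base64.b64encode(json.dumps({"user": " alice ", "text": " Big Data! "}).encode()).decode()
    print(handler({"Records": [{"kinesis": {"data": data}}]}, None))
```

Because the handler only depends on the event dictionary, it can be exercised locally with a hand-built event before it is deployed.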
  • 44. Amazon Redshift • Fully managed petabyte-scale data warehouse • Scalable number of cluster nodes • ODBC/JDBC connectors for BI tools using SQL • Loads data from Amazon DynamoDB and Amazon S3 • Less than a tenth of the cost of traditional solutions
  • 45. Amazon Redshift – Use Case • Web log analysis at amazon.com (online retail business) • Understand customer behavior • Who’s browsing but not buying? • Which products are winners? • What sequence led to higher customer conversion? • Metrics • 2 TB of new data every day • Largest table: 400 TB
  • 46. Amazon Redshift – Use Case • Performance • Scan 2.25 trillion rows of data in 14 minutes • Load 5 billion rows of data in 10 minutes • Comparison • Hadoop (Pig) to Redshift: from 2 days to 1 hour • Oracle DB to Redshift: from 90 hours to 8 hours
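For a sense of scale, the figures on this slide convert into throughput and speedup numbers with simple arithmetic. The sketch below uses only the values quoted above; the helper names are illustrative.

```python
def rows_per_second(rows: float, minutes: float) -> float:
    """Average throughput for a job that handles `rows` in `minutes`."""
    return rows / (minutes * 60)


def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the job became."""
    return before_hours / after_hours


if __name__ == "__main__":
    print(f"scan: {rows_per_second(2.25e12, 14):,.0f} rows/s")
    print(f"load: {rows_per_second(5e9, 10):,.0f} rows/s")
    print(f"Hadoop (Pig) to Redshift: {speedup(48, 1):.2f}x")
    print(f"Oracle DB to Redshift:    {speedup(90, 8):.2f}x")
```

The scan figure corresponds to roughly 2.7 billion rows per second, and the two migrations to speedups of 48x and 11.25x.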
  • 48. Amazon QuickSight • Fast, cloud-powered BI service for 1/10th the cost of old-guard BI software • Connectors for files, third-party platforms and AWS services • In-memory calculation engine (SPICE) to accelerate analysis and visualization • Supports other partner BI tools • $9 per user per month
  • 51. Kibana: Open Source Visualization https://github.com/elastic/kibana • Kibana is an open-source project from Elastic for visualizing data in the browser • Uses Elasticsearch as its indexing engine (based on Apache Lucene)
  • 52. Let’s put it all together: Demo Time!
  • 53. Demo: Live Twitter Feed Analysis • Components: Twitter Stream, Amazon Kinesis, AWS Lambda, Amazon Elasticsearch Service • Twitter Blog*, on a typical day (in 2013): • More than 500 million Tweets sent • Average 5,700 TPS * https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
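The two Twitter figures can be cross-checked, and the same rate gives a rough idea of how many Kinesis shards such a stream would need. The sketch assumes the per-shard ingest limit of 1,000 records per second and ignores both the 1 MB per second size limit and traffic peaks, so it is a lower bound rather than a sizing recommendation.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_tps(events_per_day: float) -> float:
    """Average events per second for a daily total."""
    return events_per_day / SECONDS_PER_DAY


def shards_for_record_rate(records_per_second: float, per_shard_limit: int = 1000) -> int:
    """Minimum shard count for a record rate, by record count only."""
    return math.ceil(records_per_second / per_shard_limit)


if __name__ == "__main__":
    tps = average_tps(500_000_000)
    print(f"average rate: {tps:,.0f} tweets/s")
    print("minimum shards:", shards_for_record_rate(tps))
```

500 million tweets a day averages out to about 5,787 per second, in line with the roughly 5,700 TPS quoted from the Twitter blog.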
  • 54. Thank you! Olivier Klein 奧樂凱 Solutions Architect, Greater China