2. Timely decisions require new data, fast
Source: Perishable insights, Mike Gualtieri, Forrester
Data loses value quickly over time.
Figure: Information half-life in decision-making. The value of data to decision-making falls as time passes (real time, seconds, minutes, hours, days, months), moving from preventive/predictive and actionable insights for time-critical decisions to reactive and historical insights served by traditional “batch” business intelligence.
3. What is streaming data?
Typical characteristics:
• Low latency
• Continuous
• Ordered, incremental
• High volume
4. Most common uses of streaming
• Industrial automation
• Smart home
• Smart city
• Data lakes
• IoT analytics
• Log analytics
8. CUSTOMER EVENT STORE: WHY
THE PROBLEM: OUR CHALLENGES IN PROCESSING EVENT DATA
Events: interactions (touchpoints) of (potential) customers with ABN AMRO, across devices and channels
• Slow, batch-driven processing
• Complexity in connecting to external sources
• Error-prone process: bad records
• Huge increase in data volumes
• Fast-changing sources
• Limitations in consuming capabilities
• Diversity in data from different sources
9. CUSTOMER EVENT STORE: WHY
PROBLEM STATEMENT
“Not being able to handle important events in the life of the customer that impact
their relation with ABN AMRO in an adequate way.”
Bernard Faber
Solution Architect, ABN AMRO
“Strong increase of the digitalized touching points with our customers (called
events), from a growing number of sources.”
Charles Van Kints
Product Owner, ABN AMRO
“The continuous growth in event data sources and volume, the increasing
demand towards using event data and the current solution within the Marketing
Intelligence data warehouse.”
Peter Kromhout
Engineering Lead, ABN AMRO
10. CUSTOMER EVENT STORE: WHAT
KEY FEATURES of the future-state Customer Event Store:
• Handle changes instantly & metadata driven
• Large volumes
• Consuming capabilities
• Customer interactions in real time
Goal: building insights into customer behaviour, the customer journey, and customer interactions with ABN AMRO in order to act in a personal and relevant way.
11. CUSTOMER EVENT STORE: WHEN
JOURNEY SO FAR…
Milestones: March 2018, April 2018, August 2018, September 2018, December 2018
Approach
• Successful prototype
• Co-creation: Business & IT
• 2 event sources
Prototype
• Develop prototype
• Initiate License to Public
Approval
• Approval: License to Public
• Prepare for Go-Live
Technical Go-Live
• Product stack deployed
• 2 sources live
• Tune product for Business Go-Live
Business Go-Live
• Add new sources
• Consuming capabilities
• Enable data usage
Go!
14. CUSTOMER EVENT STORE: HOW
TECHNICAL ARCHITECTURE: ONE PROCESS – STREAM & BATCH
Figure: architecture diagram with the following components:
• Snowplow Collector and Snowplow Enricher, each running on Fargate in an auto-scaling group
• Kinesis Data Streams: Raw, Good, and Bad
• Kinesis Data Firehose
• Buckets: Nano-Batch, Batch, Schema, and Bad Events
• Enterprise Raw DataStore
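The split of the Raw stream into Good and Bad streams can be sketched as a validate-and-route step. This is a minimal sketch, assuming (as in a typical Snowplow setup) that the enricher validates each raw event against a schema and sends failures to the Bad stream; the `SCHEMAS` dict is a hypothetical stand-in for the Schema Bucket, and plain lists stand in for the Kinesis Good and Bad streams. Snowplow's real validation uses JSON Schema and is far richer.

```python
import json
from typing import Any

# Hypothetical schema registry standing in for the Schema Bucket:
# event type -> required fields and their expected Python types.
SCHEMAS: dict[str, dict[str, type]] = {
    "page_view": {"customer_id": str, "url": str, "timestamp": int},
    "login": {"customer_id": str, "channel": str, "timestamp": int},
}


def validate(event: dict[str, Any]) -> list[str]:
    """Return validation errors for one event (an empty list means valid)."""
    event_type = event.get("event_type")
    schema = SCHEMAS.get(event_type) if isinstance(event_type, str) else None
    if schema is None:
        return [f"unknown event_type: {event_type!r}"]
    errors = []
    for field, expected in schema.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors


def route(raw_records: list[bytes]) -> tuple[list[dict], list[dict]]:
    """Split raw records into good events and bad records with their errors."""
    good: list[dict] = []
    bad: list[dict] = []
    for raw in raw_records:
        text = raw.decode("utf-8", errors="replace")
        try:
            event = json.loads(text)
        except json.JSONDecodeError as exc:
            bad.append({"raw": text, "errors": [f"invalid JSON: {exc.msg}"]})
            continue
        errors = validate(event) if isinstance(event, dict) else ["not a JSON object"]
        if errors:
            # Keep the original payload so bad records can be inspected and replayed.
            bad.append({"raw": text, "errors": errors})
        else:
            good.append(event)
    return good, bad
```

For example, one valid login event, one login event missing `channel` and `timestamp`, and one non-JSON line yield one good event and two bad records, each bad record carrying the reasons it failed.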
15. CUSTOMER EVENT STORE: HOW
TECHNICAL ARCHITECTURE: STANDARDIZE
Figure: architecture diagram with the following components:
• Enterprise Raw DataStore
• Snowplow Collector and Snowplow Enricher, each running on Fargate in an auto-scaling group
• Kinesis Data Streams: Raw, Good, Bad, and Standardized
• Kinesis Data Firehose: ORC and JSON
• Standard Bucket and Schema Bucket
• Glue Crawler and Athena
• DynamoDB
• CloudWatch with two alarms and a rule
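The Standardize step, together with the "metadata driven" key feature, suggests mapping each source's fields onto one standard event layout through metadata rather than per-source code. A minimal sketch, assuming a hypothetical per-source mapping table: the architecture includes DynamoDB, but the deck does not say what it stores, so the `MAPPINGS` dict and all field names here are illustrative only.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical mapping metadata: for each source, which source field
# feeds each field of the standardized event.
MAPPINGS: dict[str, dict[str, str]] = {
    "web": {"customer_id": "uid", "channel": "platform", "event_time": "ts"},
    "mobile": {"customer_id": "userId", "channel": "os", "event_time": "eventTs"},
}

STANDARD_FIELDS = ("customer_id", "channel", "event_time")


def standardize(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename source-specific fields to the standard layout using MAPPINGS.

    Epoch-second timestamps are converted to ISO 8601 UTC strings.
    Raises KeyError if the source or a mapped field is missing.
    """
    mapping = MAPPINGS[source]
    out: dict[str, Any] = {"source": source}
    for std_field in STANDARD_FIELDS:
        out[std_field] = record[mapping[std_field]]
    out["event_time"] = datetime.fromtimestamp(
        out["event_time"], tz=timezone.utc
    ).isoformat()
    return out
```

The design choice is that onboarding a new source means adding one mapping entry (for example `MAPPINGS["branch"] = {...}`) instead of deploying new code, which is one way to "handle changes instantly".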
17. CUSTOMER EVENT STORE: SUMMARY
AWS STEP FUNCTIONS (2018 analysis)
• Complex workflows: complex workflows involving iterations of Lambda functions can be implemented quickly.
• Debug friendly: clear intermediate results.
• Restartability: a new state machine can be created for only the failed states.
• State management: preserves state between subsequent API calls.
• Serverless orchestration: Lambda, Glue, ECS, SageMaker.
• Error handling: retries can be triggered for specific errors; other actions can also be configured.
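The retry and error-handling points map onto the Amazon States Language fields `Retry` and `Catch`. The sketch below builds a one-task state machine definition as a Python dict; the Lambda ARN, state names, and retry parameters are hypothetical, and `retry_delays` only computes the backoff schedule implied by `IntervalSeconds`, `MaxAttempts`, and `BackoffRate`.

```python
import json

# Hypothetical ARN for illustration only.
LAMBDA_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:standardize-events"


def build_definition(lambda_arn: str) -> dict:
    """Amazon States Language definition: one Lambda task with retry and catch rules."""
    return {
        "Comment": "Sketch: targeted retries, then a Fail state",
        "StartAt": "Standardize",
        "States": {
            "Standardize": {
                "Type": "Task",
                "Resource": lambda_arn,
                # Retry only transient Lambda errors, with exponential backoff.
                "Retry": [
                    {
                        "ErrorEquals": [
                            "Lambda.ServiceException",
                            "Lambda.TooManyRequestsException",
                        ],
                        "IntervalSeconds": 2,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }
                ],
                # Errors that are not retried, or still fail after the retries,
                # are routed to the Fail state below.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Failed"}],
                "End": True,
            },
            "Failed": {
                "Type": "Fail",
                "Error": "StandardizeFailed",
                "Cause": "Non-retryable error or retries exhausted",
            },
        },
    }


def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> list[float]:
    """Wait before each retry: interval, interval * rate, interval * rate**2, ..."""
    return [interval_seconds * backoff_rate**i for i in range(max_attempts)]


if __name__ == "__main__":
    print(json.dumps(build_definition(LAMBDA_ARN), indent=2))
    print(retry_delays(2, 3, 2.0))  # [2.0, 4.0, 8.0]
```

The JSON string from `json.dumps(build_definition(...))` is what would be passed as the `definition` argument when creating a state machine, for example via boto3's `stepfunctions` client and `create_state_machine(name=..., definition=..., roleArn=...)`.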
18. CUSTOMER EVENT STORE: SUMMARY
WHEN TO USE GLUE AND/OR EMR
AWS Glue
• Serverless & pre-configured
• Fully managed
• Limited customization
• Public service
• Horizontally scalable
• Spin-up time
• Only Spark
AWS EMR
• Can be placed in a custom VPC
• Define the cluster, choose applications & customize as you wish
• Vertically & horizontally scalable
• More actions than just Spark
19. CUSTOMER EVENT STORE: SUMMARY
KEY TAKEAWAYS
• DevOps
• Security by design
• Architecture by evolution
• One process
• Serverless & native components
• Technical vs. business go-live
21. Streaming with Amazon Kinesis
Easily collect, process, and analyze video and data streams in real time
• Amazon Kinesis Video Streams: capture, process, and store video streams
• Amazon Kinesis Data Streams: capture, process, and store data streams
• Amazon Kinesis Data Firehose: load data streams into data stores
• Amazon Kinesis Data Analytics: analyze data streams with SQL
24. Data ingestion from a variety of sources
Sources such as transactions, ERP systems, web logs/cookies, and connected devices feed Kinesis Data Streams through:
AWS SDKs
• Publish directly from application code via APIs
• AWS Mobile SDK
• Managed AWS sources: CloudWatch Logs, AWS IoT, Kinesis Data Analytics, and more
• RDS Aurora via Lambda
Kinesis Agent
• Monitors log files and forwards lines as messages to Kinesis Data Streams
Kinesis Producer Library (KPL)
• Background process aggregates and batches messages
3rd party and open source
• Log4j appender
• Apache Kafka
• Flume, fluentd, and more …
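Publishing from application code usually means batching records into `PutRecords` calls. A minimal sketch of the batching part only, assuming the per-call limits of 500 records and 5 MiB per request, and 1 MiB per record (data plus partition key); unlike the KPL, it does not aggregate several messages into one Kinesis record, and it makes no AWS calls itself. The function names are hypothetical.

```python
MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024
MAX_BYTES_PER_RECORD = 1024 * 1024


def record_size(entry: dict) -> int:
    """Data blob plus partition key both count toward the size limits."""
    return len(entry["Data"]) + len(entry["PartitionKey"].encode("utf-8"))


def batch_for_put_records(entries: list[dict]) -> list[list[dict]]:
    """Group {'Data': bytes, 'PartitionKey': str} entries into PutRecords-sized batches."""
    batches: list[list[dict]] = []
    current: list[dict] = []
    current_bytes = 0
    for entry in entries:
        size = record_size(entry)
        if size > MAX_BYTES_PER_RECORD:
            raise ValueError("record exceeds 1 MiB including partition key")
        # Start a new batch if adding this entry would break a per-call limit.
        if current and (
            len(current) == MAX_RECORDS_PER_CALL
            or current_bytes + size > MAX_BYTES_PER_CALL
        ):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(entry)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each batch can then be sent with boto3, e.g. `boto3.client("kinesis").put_records(StreamName=..., Records=batch)`. Because `PutRecords` can partially fail, a producer should check `FailedRecordCount` in the response and resend the failed entries.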
25. Data processing from a variety of consumers
Fully managed service for real-time processing of streaming data
• Cost-effective: $0.014 per 1,000,000 PUT payload units
• Millions of sources producing hundreds of terabytes per hour
• Front end handles authentication and authorization
• Durable, highly consistent storage replicates data across three data centers (Availability Zones)
• Ordered stream of events supports multiple readers
Consumers: Amazon Kinesis Client Library on EC2, Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, AWS Lambda
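The PUT price is easier to reason about with a worked calculation. A minimal sketch, assuming a PUT payload unit is each started 25 KB chunk of a record (with KB taken as 1,024 bytes) and using the slide's price of $0.014 per million units; prices vary by region and over time, and the function names are hypothetical.

```python
import math

# Assumption: a PUT payload unit is a 25 KB chunk of a record (KB = 1,024 bytes).
PUT_PAYLOAD_UNIT_BYTES = 25 * 1024
# Price quoted on the slide, in USD per 1,000,000 PUT payload units.
PRICE_PER_MILLION_UNITS = 0.014


def put_payload_units(record_size_bytes: int) -> int:
    """Units for one record: each started 25 KB chunk counts, minimum one."""
    return max(1, math.ceil(record_size_bytes / PUT_PAYLOAD_UNIT_BYTES))


def put_cost_usd(record_count: int, record_size_bytes: int,
                 price_per_million: float = PRICE_PER_MILLION_UNITS) -> float:
    """PUT payload cost for record_count records of the same size."""
    units = record_count * put_payload_units(record_size_bytes)
    return units / 1_000_000 * price_per_million
```

For example, 1,000 records per second of 5 KB each for 30 days is 2,592,000,000 PUT payload units, or about $36.29 at the slide's price; at 30 KB per record each record counts as two units, doubling that to about $72.58. This covers only the PUT payload charge on the slide; shard hours are billed separately.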
26. Amazon Kinesis Data Streams: Standard consumers
A data producer writes to a Kinesis data stream made up of shards 1 to n; consumer application A polls each shard with GetRecords().
• Producer writes: up to 1 MB or 1,000 records per second, per shard
• GetRecords(): five transactions per second, per shard, shared by all standard consumers of the shard
• Data reads: 2 MB per second, per shard, shared by all standard consumers of the shard
• With only one consumer application, records can be retrieved every 200 ms
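The per-shard limits above lend themselves to quick capacity arithmetic. A minimal sketch, assuming load is spread evenly across shards and every standard consumer reads the whole stream; the function names are hypothetical and the limits are the ones on the slide.

```python
import math

# Per-shard limits quoted on the slide.
WRITE_MB_PER_S = 1.0
WRITE_RECORDS_PER_S = 1000
READ_MB_PER_S = 2.0    # shared by all standard consumers of the shard
GET_RECORDS_TPS = 5    # shared by all standard consumers of the shard


def poll_interval_ms(consumers: int) -> float:
    """Fastest polling interval per consumer when consumers share 5 GetRecords calls/s."""
    return consumers * 1000 / GET_RECORDS_TPS


def shards_needed(write_mb_per_s: float, write_records_per_s: float, consumers: int) -> int:
    """Smallest shard count meeting both write limits and the shared read limit.

    Assumes every standard consumer reads the full stream and load is even across shards.
    """
    read_mb_per_s = write_mb_per_s * consumers
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_S),
        math.ceil(write_records_per_s / WRITE_RECORDS_PER_S),
        math.ceil(read_mb_per_s / READ_MB_PER_S),
    )
```

For example, 3.5 MB/s and 2,500 records/s written, read by two standard consumers, needs 4 shards: 4 for write bandwidth, 3 for record rate, and 4 for the 7 MB/s of combined reads. With three consumers polling the same shard, each can call GetRecords only every 600 ms.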
27. Amazon Kinesis Data Streams: Enhanced fan-out consumers
Consumers do not poll. Messages are pushed to the consumer as they arrive.
A data producer writes to shard 1 of a Kinesis data stream; consumer application A calls SubscribeToShard():
• Uses HTTP/2
• Connection lasts up to five minutes
• Data is pushed to the consumer over a persistent connection
28. Amazon Kinesis Data Streams consumers: enhanced fan-out vs. standard
Enhanced fan-out
• Multiple consumer applications for the same Kinesis data stream
• Default limit of five registered consuming applications; more can be supported with a service limit increase request
• Low-latency requirements for data processing
• Messages are typically delivered to a consumer in less than 70 ms
Standard
• Total number of consuming applications is low
• Consumers are not latency-sensitive
• Minimize cost
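The trade-off behind this guidance is largely read throughput per shard: standard consumers share one 2 MB/s read allowance per shard, whereas each enhanced fan-out consumer receives its own 2 MB/s per shard. A minimal sketch of that arithmetic; the mode strings are hypothetical labels, not API values, and the default limit of five registered consumers is the one quoted on the slide.

```python
SHARD_READ_MB_PER_S = 2.0
DEFAULT_EFO_CONSUMER_LIMIT = 5  # default quoted on the slide; can be raised on request


def read_mb_per_s_per_consumer(mode: str, consumers: int,
                               efo_limit: int = DEFAULT_EFO_CONSUMER_LIMIT) -> float:
    """Per-shard read throughput available to each consumer application."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if mode == "standard":
        # All standard consumers share the shard's 2 MB/s.
        return SHARD_READ_MB_PER_S / consumers
    if mode == "enhanced_fan_out":
        if consumers > efo_limit:
            raise ValueError("more registered consumers than the configured limit")
        # Each registered consumer gets its own 2 MB/s per shard.
        return SHARD_READ_MB_PER_S
    raise ValueError(f"unknown mode: {mode!r}")
```

With four consumer applications, standard consumers get 0.5 MB/s each per shard while enhanced fan-out consumers keep the full 2 MB/s each, which is why fan-out pays off as the number of readers grows.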
32. SQL on streaming data?
Aggregations (count, sum, min, …) take granular real-time data and turn it into insights.
Data is continuously processed, so you need to tell the application when you want results: aggregation windows.
33. Window types
Sliding, tumbling, and stagger
Tumbling windows are fixed-size and do not overlap, so each record is grouped into exactly one window.
Figure: records from a source grouped into consecutive tumbling windows along a time axis (t0 to t15).
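A tumbling-window aggregation assigns each record to exactly one fixed-size window. The following minimal Python simulation (not the Kinesis Data Analytics engine) counts events per key per window, assuming each event is a `(timestamp_seconds, key)` pair; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def tumbling_counts(events: Iterable[tuple[float, str]],
                    window_seconds: float) -> dict[tuple[float, str], int]:
    """Count events per key in fixed, non-overlapping windows.

    Each event falls into the single window starting at
    floor(timestamp / window_seconds) * window_seconds.
    """
    counts: Counter = Counter()
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

With 60-second windows, events at 0 s and 5 s for key `a` fall into the window starting at 0, while an event at 60 s starts the next window: boundaries are fixed by the clock, not by the data.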
34. Writing streaming SQL
Pump (continuous query) using stagger window
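-- Destination in-application stream; it must exist before the pump writes to it
CREATE OR REPLACE STREAM calls_per_ip_stream (
    source_ip_address VARCHAR(64),  -- assumed column type
    call_count BIGINT);
CREATE OR REPLACE PUMP calls_per_ip_pump AS
INSERT INTO calls_per_ip_stream
SELECT STREAM source_ip_address,
    COUNT(*) AS call_count
FROM source_sql_stream_001
-- Each source IP gets its own one-minute window, opened by its first record
WINDOWED BY STAGGER(
    PARTITION BY source_ip_address
    RANGE INTERVAL '1' MINUTE);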
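To see how a stagger window differs from a tumbling one, here is a minimal Python simulation of per-key stagger windows. It assumes events arrive already ordered by timestamp and ignores late-arriving data, which the real Kinesis Data Analytics engine handles; it only mirrors the idea that a window opens when the first record for a partition key arrives and closes one `RANGE INTERVAL` later. The function name is hypothetical.

```python
from typing import Iterable


def stagger_counts(events: Iterable[tuple[float, str]],
                   range_seconds: float) -> list[tuple[str, float, int]]:
    """Count events per key in stagger windows.

    A key's window opens at the timestamp of its first event (when no window
    is open for that key) and covers [start, start + range_seconds).
    Assumes events are sorted by timestamp. Returns (key, window_start, count)
    in the order windows were opened.
    """
    current: dict[str, list] = {}        # key -> [window_start, count]
    opened: list[tuple[str, list]] = []  # windows in opening order
    for ts, key in events:
        window = current.get(key)
        if window is None or ts >= window[0] + range_seconds:
            window = [ts, 0]
            current[key] = window
            opened.append((key, window))
        window[1] += 1
    return [(key, start, count) for key, (start, count) in opened]
```

For calls from `10.0.0.2` at 10 s, 69 s, and 75 s, a one-minute stagger window groups the first two (window 10 s to 70 s) and opens a new window at 75 s. A one-minute tumbling window would instead split them into [0, 60): 1 and [60, 120): 2, because its boundaries are fixed rather than tied to each key's first record.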
42. NEW! Amazon Forecast
Accurate time-series forecasting service, based on the same technology used at Amazon.com
• Any historical time series
• Custom forecasts with 3 clicks
• 50% more accurate
• 1/10th the cost
• Integrates with SAP and Oracle Supply Chain
• Integrates with Amazon Timestream
Generate forecasts for: retail demand, travel demand, AWS usage, revenue forecasts, web traffic, advertising demand