Evolving your analytics from batch processing to real-time processing can have a major business impact, but ingesting streaming data into your data warehouse requires building complex streaming data pipelines. Amazon Kinesis Firehose solves this problem by making it easy to transform and load streaming data into Amazon Redshift so that you can use existing analytics and business intelligence tools to extract information in near real-time and respond promptly. In this session, we will dive deep into using Amazon Kinesis Firehose to load streaming data into Amazon Redshift reliably, scalably, and cost-effectively.
4. Most data is produced continuously
Mobile Apps, Web Clickstream, Application Logs, Metering Records, IoT Sensors, Smart Buildings
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
5. The diminishing value of data
Recent data is highly valuable
• If you act on it in time
• Perishable Insights (M. Gualtieri, Forrester)
Old + Recent data is more valuable
• If you have the means to combine them
6. Streaming Data Scenarios Across Verticals
Scenarios/Verticals | Accelerated Ingest-Transform-Load | Continuous Metrics Generation | Responsive Data Analysis / Machine Learning
Digital Ad Tech/Marketing | Publisher, bidder data aggregation | Advertising metrics like coverage, yield, and conversion | User engagement with ads, optimized bid/buy engines
IoT | Sensor, device telemetry data ingestion | Operational metrics and dashboards | Device operational intelligence and alerts
Gaming | Online data aggregation, e.g., top 10 players | Massively multiplayer online game (MMOG) live dashboard | Leader board generation, player-skill match
Consumer Online | Clickstream analytics | Metrics like impressions and page views | Recommendation engines, proactive care
8. Amazon Kinesis Customer Base Diversity
• 1 billion events/wk from connected devices | IoT
• 17 PB of game data per season | Entertainment
• 80 billion ad impressions/day, 30 ms response time | Ad Tech
• 100 GB/day click streams from 250+ sites | Enterprise
• 50 billion ad impressions/day, sub-50 ms responses | Ad Tech
• 10 million events/day | Retail
• Amazon Kinesis as Databus - Migrate from Kafka to Kinesis | Enterprise
• Funnel all production events through Amazon Kinesis
10. Amazon Kinesis: Streaming data made easy
Amazon Kinesis Streams
• For technical developers
• Build your own custom applications that process or analyze streaming data
Amazon Kinesis Firehose
• For all developers, data scientists, operations
• Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift and Amazon Elasticsearch Service
Amazon Kinesis Analytics
• For all developers, data scientists
• Easily analyze data streams using standard SQL queries
11. Amazon Kinesis Streams
• Easy administration
• Build real time applications with framework of choice
• Low cost
12. Amazon Kinesis Analytics
• Interact with streaming data in real-time using SQL
• Build fully managed and elastic stream processing
applications that process data for real-time visualizations
and alarms
13. Amazon Kinesis Firehose
• Zero administration
• Direct-to-data store integration
• Seamless elasticity
14. Amazon Kinesis Firehose: Destinations
Amazon S3
• Durable, scalable object storage
• Web service interface
Amazon Elasticsearch Service
• Managed Elasticsearch service
• Direct access to Elasticsearch open-source API
Amazon Redshift
• Fast, managed data warehouse
• Scales to petabytes
15. Kinesis Firehose Data Transformation
• Streaming ETL
• Firehose buffers up to 3MB of ingested data
• When buffer is full, automatically invokes Lambda function,
passing list of records to be processed
• Lambda function processes and returns list of transformed
records, with status of each record
• Transformed records are saved to configured destination
Records passed to the Lambda function:
[{
"recordId": "1234",
"data": "encoded-data"
},
{
"recordId": "1235",
"data": "encoded-data"
}
]
Records returned by the Lambda function:
[{
"recordId": "1234",
"result": "Ok",
"data": "encoded-data"
},
{
"recordId": "1235",
"result": "Dropped",
"data": "encoded-data"
}
]
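The record lists above map onto a Lambda handler fairly directly. The following is a minimal sketch, not the presenters' code: it assumes the batch arrives under event["records"] with base64-encoded "data", and that the response is wrapped as {"records": [...]}. The transform function (trim and upper-case) is a hypothetical placeholder for whatever per-record logic you need.

```python
import base64
from typing import Optional


def transform(payload: str) -> Optional[str]:
    """Hypothetical transformation: trim and upper-case one record.

    Returns None for blank input to signal that the record should be dropped.
    """
    text = payload.strip()
    if not text:
        return None
    return text.upper() + "\n"


def lambda_handler(event, context):
    """Process one Firehose batch and report a status for every record."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            payload = base64.b64decode(record["data"]).decode("utf-8")
            result = transform(payload)
        except ValueError:
            # Covers bad base64 and invalid UTF-8; return the original data as failed.
            output.append({"recordId": record_id,
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        if result is None:
            output.append({"recordId": record_id,
                           "result": "Dropped",
                           "data": record["data"]})
        else:
            encoded = base64.b64encode(result.encode("utf-8")).decode("ascii")
            output.append({"recordId": record_id,
                           "result": "Ok",
                           "data": encoded})
    return {"records": output}
```

Every incoming recordId appears exactly once in the response, so Firehose can match each result back to its input.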
16. Amazon Kinesis Firehose: Producing Data
Send data from IT infrastructure, mobile devices,
and field sensors
Integrated with AWS SDKs
Kinesis Agent
• Installs on your servers
• Tails log files, forwards to Kinesis Streams or Kinesis
Firehose
AWS IoT integration
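As an illustration of the SDK route, here is a rough sketch of a producer that sends log lines in batches. It assumes a boto3-style Firehose client whose put_record_batch call accepts up to 500 records per request; the client is passed in rather than created here, and the stream name in the comment is hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500  # assumed PutRecordBatch per-request record limit


def chunk(lines: Iterable[str],
          size: int = MAX_RECORDS_PER_CALL) -> Iterator[List[Dict[str, bytes]]]:
    """Group lines into lists of Firehose records, newline-terminated."""
    batch: List[Dict[str, bytes]] = []
    for line in lines:
        batch.append({"Data": (line.rstrip("\n") + "\n").encode("utf-8")})
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_lines(client, stream_name: str, lines: Iterable[str]) -> int:
    """Send all lines in batches; return how many records the service rejected."""
    failed = 0
    for batch in chunk(lines):
        response = client.put_record_batch(DeliveryStreamName=stream_name,
                                           Records=batch)
        failed += response.get("FailedPutCount", 0)
    return failed


# Example call (hypothetical stream name, requires AWS credentials):
#   import boto3
#   send_lines(boto3.client("firehose"), "my-delivery-stream", open("access.log"))
```

A production producer would also resend the individual records reported as failed; that is left out to keep the sketch short.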
17. Amazon Kinesis Firehose: Redshift Delivery
Delivers data to Amazon S3
• Buffer Size: 1 to 128 MB
• Buffer Interval: 60 – 300 seconds
• Flushed to S3 when threshold is met (whichever occurs
first)
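The "whichever occurs first" rule can be pictured with a toy buffer. This is a simplified model under stated assumptions, not Firehose's implementation: it only checks the thresholds when a record arrives (the real service also flushes on a timer), and it treats 1 MB as 1024 * 1024 bytes. The class and method names are made up for illustration.

```python
from typing import List, Optional


class DeliveryBuffer:
    """Toy size-or-time buffer that flushes when either threshold is met."""

    def __init__(self, size_mb: int, interval_s: int) -> None:
        # Ranges taken from the slide: 1 to 128 MB and 60 to 300 seconds.
        if not 1 <= size_mb <= 128:
            raise ValueError("size_mb must be between 1 and 128")
        if not 60 <= interval_s <= 300:
            raise ValueError("interval_s must be between 60 and 300")
        self.max_bytes = size_mb * 1024 * 1024  # assumption: MB treated as MiB
        self.interval_s = interval_s
        self._records: List[bytes] = []
        self._bytes = 0
        self._opened_at: Optional[float] = None

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer one record; return the flushed batch if a threshold was hit."""
        if self._opened_at is None:
            self._opened_at = now  # the interval starts with the first record
        self._records.append(record)
        self._bytes += len(record)
        size_hit = self._bytes >= self.max_bytes
        time_hit = now - self._opened_at >= self.interval_s
        return self._flush() if size_hit or time_hit else None

    def _flush(self) -> List[bytes]:
        batch = self._records
        self._records, self._bytes, self._opened_at = [], 0, None
        return batch
```

With a 1 MB / 60 second configuration, a burst of large records flushes on size, while a trickle of small records flushes once 60 seconds have passed since the first buffered record.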
Executes COPY Command in Redshift
• Redshift will copy data from S3
Executes COPY commands serially
• Subsequent COPY commands run upon completion of
previous COPY
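To make the serial COPY behaviour concrete, the sketch below builds COPY statements and runs them strictly one after another. It is an illustration only: the table name, manifest URIs, and IAM role are hypothetical, the use of a manifest file is an assumption about how the delivered S3 objects are referenced, and execute stands in for whatever blocking database call you use.

```python
from typing import Callable, Iterable, List


def build_copy(table: str, manifest_uri: str, iam_role: str,
               options: str = "CSV") -> str:
    """Build a Redshift COPY statement that loads the files listed in a manifest."""
    return (f"COPY {table} FROM '{manifest_uri}' "
            f"IAM_ROLE '{iam_role}' MANIFEST {options};")


def run_serially(statements: Iterable[str],
                 execute: Callable[[str], None]) -> List[str]:
    """Run each COPY only after the previous one has returned."""
    completed: List[str] = []
    for sql in statements:
        execute(sql)  # assumed to block until the COPY finishes
        completed.append(sql)
    return completed
```

Because each COPY waits for the previous one, a slow load delays every batch queued behind it, which is worth keeping in mind when choosing buffer sizes.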
18. Amazon Kinesis Firehose: Redshift Delivery
Delivery Failure
• Specify retry duration: 0 – 7200 seconds
• Firehose retries for specified time
• After retry duration, failed records skipped and written to
S3
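The retry-duration rule can be sketched as a deadline loop. This is a hedged illustration rather than the service's actual logic: copy_once is a hypothetical callable that returns True when the COPY succeeds, and the fixed wait between attempts (wait_s) is an assumption, since the slide does not describe the retry spacing.

```python
import time
from typing import Callable


def deliver_with_retry(copy_once: Callable[[], bool],
                       retry_duration_s: float,
                       wait_s: float = 5.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try a COPY, then keep retrying until the retry duration has elapsed.

    Returns False when the batch should be skipped and written to S3 instead.
    """
    if not 0 <= retry_duration_s <= 7200:
        raise ValueError("retry duration must be between 0 and 7200 seconds")
    if copy_once():
        return True
    deadline = clock() + retry_duration_s
    while clock() < deadline:
        sleep(wait_s)  # hypothetical fixed pause between attempts
        if copy_once():
            return True
    return False
```

A retry duration of 0 means a single attempt with no retries, after which the caller skips the batch.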
20. Hot Homes
There's an 80% chance this home will sell in the next 11 days – go tour it soon.
21. [Architecture diagram]
Stages: Ingest/Collect, Store, Process/analyze, Consume/visualize
Data: Users, Properties, Agents, Real Time Data, …
Services: Amazon Kinesis, Amazon S3 (Data lake), Amazon EMR, Amazon Redshift, Amazon DynamoDB
Answers & Insights: Hot Homes, User Profile, Recommendation, Similar Homes, Agent Follow-up, Agent Scorecard, Marketing, A/B Testing, BI / Reporting
23. Demo Scenario
Stream raw Apache access log records as
they’re created
Transform to CSV with AWS Lambda
Automatically copy to Redshift
Run analysis
24. Your Big Data Application Architecture
• Kinesis Producer UI: Generate web logs
• Amazon Kinesis Firehose: Transform raw data to structured data; deliver processed web logs to Redshift
• Amazon Redshift: Run SQL queries on processed web logs
• Amazon QuickSight: Visualize web logs to discover insights
25. Demo: Configure Data Transformation
We need to convert each record from the Apache access log format:
75.35.230.210 - - [20/Jul/2009:22:22:42 -0700] "GET /images/pigtrihawk.jpg " 200 29236 "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
to a format that can be imported to Redshift with its COPY command. In this activity, we'll configure Firehose to use a Lambda function to convert each record to CSV.
[Diagram: Kinesis Firehose, Lambda]
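The conversion step for this demo could look roughly like the following. It is a sketch under assumptions, not the function used in the session: the regular expression is fitted to the sample record shown on this slide (which has no referrer field), and the eight-column layout (host, ident, user, time, request, status, bytes, agent) is a hypothetical choice that would need to match your Redshift table.

```python
import csv
import io
import re
from typing import Optional

# Pattern fitted to the sample record on this slide; other log formats will differ.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<agent>[^"]*)"$'
)


def apache_to_csv(line: str) -> Optional[str]:
    """Convert one access log line to a CSV row, or None if it does not parse."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # Strip stray whitespace such as the trailing space inside the request field.
    fields = [value.strip() for value in match.groups()]
    out = io.StringIO()
    # The csv module quotes any field that contains a comma or a double quote.
    csv.writer(out, lineterminator="\n").writerow(fields)
    return out.getvalue()
```

Inside a transformation Lambda, a None result would typically be reported as a failed record rather than silently discarded. The timestamp is passed through unchanged, so the load step has to be told how to parse it.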