Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data Ingestion with Firehose
1. San Francisco Loft - 2017
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data Ingestion with Firehose
Adrian Hornsby (@adhorn)
Technical Evangelist with AWS
2. • Technical Evangelist, Developer Advocate, … Software Engineer
• My @home is in Finland
• Previously:
• Solutions Architect @AWS
• Lead Cloud Architect @Dreambroker
• Director of Engineering, Software Engineer, DevOps, Manager, ... @Hdm
• Researcher @Nokia Research Center
• and a bunch of other stuff.
• Love climbing and ginger shots.
3. What to Expect from the Session
• Streaming data overview
• Firehose patterns overview
• Firehose usage patterns
• Streaming data end-to-end example and walk-through
6. Most data is produced continuously
Mobile Apps Web Clickstream Application Logs
Metering Records IoT Sensors Smart Buildings
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
7. The diminishing value of data
• Recent data is highly valuable
• Old + Recent data is more valuable
8. Processing real-time, streaming data
• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable
What are the key requirements?
Ingest Transform Analyze React Persist
10. Real-time streaming data made easy
Amazon Kinesis Streams
• For technical developers
• Collect and stream data for ordered, replayable, real-time processing
Amazon Kinesis Firehose
• For all developers and data scientists
• Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch
Amazon Kinesis Analytics
• For all developers and data scientists
• Easily analyze data streams using standard SQL queries
11. Amazon Kinesis Streams
• Reliably ingest and durably store streaming data at low cost
• Build custom real-time applications to process streaming data
12. Amazon Kinesis Analytics
• Interact with streaming data in real-time using SQL
• Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms
13. Amazon Kinesis Firehose
• Reliably ingest and deliver batched, compressed, and encrypted data to S3, Redshift, and Elasticsearch
• Point-and-click setup with zero administration and seamless elasticity
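To make the ingestion path concrete, here is a minimal sketch of how records could be packaged for a Firehose `PutRecordBatch` call. The helper function, the sample events, and the stream name `my-stream` are illustrative assumptions, not part of the deck; the actual AWS call is shown commented out since it needs credentials and a live delivery stream.

```python
import json

def build_firehose_batch(events):
    """Package events as records for a Firehose PutRecordBatch call.
    Firehose expects each record's Data as bytes; a trailing newline
    keeps records separable after Firehose batches them into an S3 object."""
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]

events = [{"user": "alice", "action": "click"},
          {"user": "bob", "action": "view"}]
records = build_firehose_batch(events)

# With boto3 (assumed setup), the delivery call would look like:
# import boto3
# firehose = boto3.client("firehose")
# firehose.put_record_batch(DeliveryStreamName="my-stream", Records=records)
```

The newline delimiter matters in practice: Firehose concatenates records when writing an S3 object, so without it, consecutive JSON documents run together.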
14. Amazon Kinesis makes it easy to work with real-time streaming data
Amazon Kinesis Firehose
• For all developers and data scientists
• Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch
20. Amazon Kinesis Firehose vs. Amazon Kinesis Streams
Amazon Kinesis Streams is for use cases that require custom processing of each incoming record, with sub-second processing latency, and a choice of stream processing frameworks.
Amazon Kinesis Firehose is for use cases that require zero administration, the ability to use existing analytics tools based on Amazon S3, Amazon Redshift, and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
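The API-level difference mirrors this split: a Streams record carries a partition key, which determines the shard and so preserves per-key ordering for custom consumers, while a Firehose record does not, because Firehose handles batching and delivery itself. A rough sketch (the function names and the modulo-based shard mapping are simplifications for illustration; real shards own contiguous ranges of the MD5 hash-key space):

```python
import hashlib

def streams_record(data: bytes, partition_key: str):
    """A Kinesis Streams record: the partition key selects the shard,
    which preserves per-key ordering for custom consumers."""
    return {"Data": data, "PartitionKey": partition_key}

def firehose_record(data: bytes):
    """A Firehose record: no partition key -- Firehose batches and
    delivers to the destination (S3/Redshift/Elasticsearch) for you."""
    return {"Data": data}

def shard_for(partition_key: str, num_shards: int) -> int:
    """Simplified view of how Streams maps a partition key to a shard:
    an MD5 hash of the key, reduced onto the shard count."""
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h % num_shards
```

Because the mapping is deterministic, all records with the same partition key land on the same shard and are read back in order.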
Narrative: The reality is that most data is produced continuously and is coming at us at lightning speed due to the explosive growth of real-time data sources.
TP: Machine data will make up 40% of our digital universe by 2020
Narrative: Whether it is log data coming from mobile and web applications, purchase data from ecommerce sites, or sensor data from IoT devices, it all delivers information that can help companies learn about what their customers, organization, and business are doing right now.
TP: Customer Benefits
Improve operational efficiencies, improve customer experiences, new business models
Smart building: reduce energy costs, cut maintenance, increase safety and security
Smart textiles: monitor skin temperature, monitor stress
Narrative: So how much is this data worth? Well, it depends…
Recent data is highly valuable
If you act on it in time
Perishable Insights (M. Gualtieri, Forrester)
Old + Recent data is more valuable
If you have the means to combine them
Narrative: Processing real-time data as it arrives can let you make decisions much faster and get the most value from your data. But building your own custom applications to process streaming data is complicated and resource intensive. You need to train or hire developers with the right skill sets, then wait months for the applications to be built and fine-tuned, and then operate and scale the application as the business grows.
All of this takes lots of time and money, and, at the end of the day, lots of companies just never get there, settle for the status-quo, and live with information that is hours or days old.
Narrative: You need a different set of analytical tools to collect and analyze real-time streaming data than what you have traditionally used for data at rest. With traditional analytics, you gather the information, store it in a database, and analyze it hours, days, or weeks later. Analyzing real-time data requires a different approach. Instead of running database queries on stored data, streaming analytics platforms have to process the data continuously and before the data lands in a database. And streaming data comes in at an incredible rate that can vary up and down all the time. Streaming analytics platforms have to be able to process this data when it arrives, often at speeds of millions and even tens of millions of events per hour.
Key requirements of stream processing
Durable: durable ingest so that processing can be repeated
Continuous: always processing the latest data
Fast: frequency (micro-batches, batch size, true streaming) and speed (sub-second, minute, hour)
Correct: at-most-once, at-least-once, and exactly-once processing; event time, ingest time, processing time
Reactive: ability to process and respond in near real-time; feedback mechanisms to send processed data to live applications
Reliable: highly available, with fast failover
Since the launch of Amazon Kinesis in 2013, the ecosystem has evolved, and we have introduced Kinesis Firehose and Kinesis Analytics.
Streams was launched in GA at re:Invent 2014, Firehose at re:Invent 2015, and Analytics in August 2016.
We have continuously iterated to make it easier for customers to use streaming data, as well as to expand the functionality of real-time processing.
Together, these three products make up the Amazon Kinesis streaming data platform.
Easy administration: Simply create a new stream, and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.
Build real-time applications: Perform continual processing on streaming data using Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
Low cost: Cost-efficient for workloads of any scale.
Apply SQL on streams: Easily connect to a Kinesis Stream or Firehose Delivery Stream and apply SQL skills.
Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies
Easy scalability: elastically scales to match data throughput for most workloads
Easy and interactive experience: Complete most stream processing use cases in minutes, and easily progress toward sophisticated scenarios
Zero admin: Capture and deliver streaming data into S3, Redshift, Elasticsearch, and other AWS destinations without writing an application or managing infrastructure
Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into S3 and other destinations in as little as 60 seconds, set up in minutes
Seamless elasticity: Seamlessly scales to match data throughput
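One common way to "build real-time applications" as described above is an AWS Lambda function subscribed to a stream; Kinesis delivers record payloads base64-encoded inside the invocation event. A minimal sketch (the handler logic and the synthetic event are assumptions for illustration; only the event shape follows the real Kinesis-to-Lambda integration):

```python
import base64
import json

def handler(event, context=None):
    """Minimal Lambda-style consumer for Kinesis records (sketch).
    Each record's payload arrives base64-encoded under kinesis.data."""
    out = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        out.append(json.loads(payload))
    return out

# A synthetic event shaped like a Kinesis -> Lambda invocation:
fake_event = {"Records": [
    {"kinesis": {"data": base64.b64encode(
        json.dumps({"temp": 21.5}).encode()).decode()}}
]}
```

Feeding `fake_event` to `handler` decodes the single record back into the original dictionary, which is a handy way to unit-test consumer logic without a live stream.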
A shard is a group of data records in a stream. When you create a stream, you specify the number of shards for the stream.
Each shard supports up to 5 read transactions per second, up to a maximum total data read rate of 2 MB per second, and up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). The total capacity of a stream is the sum of the capacities of its shards. You can increase or decrease the number of shards in a stream as needed; however, note that you are charged on a per-shard basis.
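The per-shard limits above translate directly into a sizing calculation: the shard count must cover the write bandwidth, the write record rate, and the read bandwidth. A small sketch (the function is an illustrative helper, not an AWS API; real sizing should also leave headroom for traffic spikes):

```python
import math

def shards_needed(write_bytes_per_sec, write_records_per_sec,
                  read_bytes_per_sec):
    """Estimate shard count from the per-shard limits quoted above:
    1 MB/s and 1,000 records/s for writes, 2 MB/s for reads."""
    return max(
        math.ceil(write_bytes_per_sec / 1_000_000),
        math.ceil(write_records_per_sec / 1_000),
        math.ceil(read_bytes_per_sec / 2_000_000),
    )

# e.g. 5 MB/s ingest, 3,000 records/s, 8 MB/s aggregate read -> 5 shards
n = shards_needed(5_000_000, 3_000, 8_000_000)
```

In this example the write bandwidth (5 MB/s against a 1 MB/s per-shard limit) is the binding constraint, so five shards are needed even though the record rate alone would need only three.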
Sonos runs near real-time streaming analytics on device data logs from their connected hi-fi audio equipment.
Hearst: analyzing 30 TB+ of clickstream data, enabling real-time insights for publishers.
Nordstrom's recommendation team built an online stylist using Amazon Kinesis Streams and AWS Lambda.