Let’s introduce Amazon Kinesis
Inaugural meetup of the
Amazon Kinesis - London User Group
This evening
•

Introducing Amazon Kinesis, Ian Meyers, AWS

•

Pizza and drinks break

•

Kinesis and Snowplow, Alex Dean...
Introducing Amazon Kinesis
Snowplow and Kinesis

1.

Snowplow – who we are

2.

Why are we excited about Kinesis?

3.

Adding Kinesis support to Snow...
Snowplow – who we are
Today, Snowplow is primarily an open source web analytics
platform
Snowplow: data pipeline
Website / webapp
Amazon S3

Col...
Snowplow was born out of our frustration with traditional web
analytics tools…
• Limited set of reports that don’t answer ...
…and out of the opportunities to tame big data new
technologies presented

These tools make it possible to capture, transf...
Snowplow is composed of a set of loosely coupled subsystems,
architected to be robust and scalable
1. Trackers

A

2. Coll...
Why are we excited about
Kinesis?
A quick history lesson: the three eras of business data processing

1.

The classic era, 1996+

2.

The hybrid era, 2005+
...
The classic era, 1996+
OWN DATA CENTER
NARROW DATA SILOES

LOW LATENCY LOCAL LOOPS

Point-to-point
connections

CMS

E-com...
The hybrid era, 2005+
CLOUD VENDOR / OWN DATA CENTER
NARROW DATA SILOES

Search

LOW LATENCY LOCAL LOOPS

CMS

Local loop
...
The unified era, 2013+
CLOUD VENDOR / OWN DATA CENTER
NARROW DATA SILOES

SOME LOW LATENCY LOCAL LOOPS

Search

CMS
Silo

...
The unified log is Kinesis (or Kafka)
CLOUD VENDOR / OWN DATA CENTER
NARROW DATA SILOES

Search

SAAS VENDOR #1

SOME LOW ...
Can we implement Snowplow on top of Kinesis?
CLOUD VENDOR / OWN DATA CENTER
NARROW DATA SILOES

Search

SAAS VENDOR #1

SO...
Adding Kinesis support to
Snowplow
Where we are heading with our Kinesis architecture
Snowplow
Trackers

Scala Stream
Collector

Raw event
stream

S3 sink
Ki...
We took an important first step in our last release…

0.8.12

pre-0.8.12

hadoop-etl

scala-hadoopenrich

scala-kinesis-en...
… and the next release should get us much closer
Snowplow
Trackers

Scala Stream
Collector

Raw event
stream

S3 sink Kine...
Live demo!
Questions?

http://snowplowanalytics.com
https://github.com/snowplow/snowplow
@snowplowdata
And finally…

Huge thanks to our hosts!
Upcoming SlideShare
Loading in...5
×

Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London User Group

904

Published on

This is my presentation to the inaugural meetup of the Amazon Kinesis London User Group.

In it I briefly introduced Snowplow, explained why we were excited about Kinesis (drawing on my "three eras" blog post) and then set out how we are updating Snowplow to run on Kinesis. I concluded with a live demo of what we have running on Kinesis so far.

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
904
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
20
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Transcript of "Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London User Group"

  1. 1. Let’s introduce Amazon Kinesis Inaugural meetup of the Amazon Kinesis - London User Group
  2. 2. This evening • Introducing Amazon Kinesis, Ian Meyers, AWS • Pizza and drinks break • Kinesis and Snowplow, Alex Dean, Snowplow Analytics • Drinks • All courtesy of our hosts:
  3. 3. Introducing Amazon Kinesis
  4. 4. Snowplow and Kinesis 1. Snowplow – who we are 2. Why are we excited about Kinesis? 3. Adding Kinesis support to Snowplow 4. Live demo! 5. Questions
  5. 5. Snowplow – who we are
  6. 6. Today, Snowplow is primarily an open source web analytics platform Snowplow: data pipeline Website / webapp Amazon S3 Collect Transform and enrich Amazon Redshift / PostgreSQL • Your granular, event-level and customer-level data, in your own data warehouse • Connect any analytics tool to your data • Join your web analytics data with any other data set
  7. 7. Snowplow was born out of our frustration with traditional web analytics tools… • Limited set of reports that don’t answer business questions • • • • Traffic levels by source Conversion levels Bounce rates Pages / visit • Web analytics tools don’t understand the entities that matter to business • Customers, intentions, behaviours, articles, videos, authors, subjects, services… • …vs pages, conversions, goals, clicks, transactions • Web analytics tools are siloed • Hard to integrate with other data sets incl. digital (marketing spend, ad server data), customer data (CRM), financial data (cost of goods, customer lifetime value)
  8. 8. …and out of the opportunities to tame big data new technologies presented These tools make it possible to capture, transform, store and analyse all your granular, event-level data, to you can perform any analysis
  9. 9. Snowplow is composed of a set of loosely coupled subsystems, architected to be robust and scalable 1. Trackers A 2. Collectors B 3. Enrich C 4. Storage D 5. Analytics Generate event data Receive data from trackers and log it to S3 Clean and enrich raw data Store data ready for analysis Examples: • Javascript tracker • Python / Lua / No-JS / Arduino tracker Examples: • Cloudfront collector • Clojure collector for Amazon EB Built on Scalding / Cascading / Hadoop and powered by Amazon EMR Examples: • Amazon Redshift • PostgreSQL • Amazon S3 • Batch-based A D Standardised data protocols • Normally run overnight; sometimes every 4-6 hours
  10. 10. Why are we excited about Kinesis?
  11. 11. A quick history lesson: the three eras of business data processing 1. The classic era, 1996+ 2. The hybrid era, 2005+ 3. The unified era, 2013+ For more see http://snowplowanalytics.com/blog/2014/01/20/the-three-eras-of-business-data-processing/
  12. 12. The classic era, 1996+ OWN DATA CENTER NARROW DATA SILOES LOW LATENCY LOCAL LOOPS Point-to-point connections CMS E-comm Local loop ERP Local loop Silo CRM Local loop Silo Local loop Silo Nightly batch ETL process HIGH LATENCY WIDE DATA COVERAGE Management reporting Data warehouse FULL DATA HISTORY Silo
  13. 13. The hybrid era, 2005+ CLOUD VENDOR / OWN DATA CENTER NARROW DATA SILOES Search LOW LATENCY LOCAL LOOPS CMS Local loop SAAS VENDOR #1 E-comm Local loop Silo Local loop Silo APIs ERP Local loop Silo CRM Local loop Silo Bulk exports SAAS VENDOR #2 Stream processing Micro-batch processing Batch processing Batch processing Email marketing Local loop Product rec’s Local loop LOW LATENCY Systems monitoring Data warehouse Hadoop SAAS VENDOR #3 Local loop LOW LATENCY Management reporting HIGH LATENCY Ad hoc analytics HIGH LATENCY Web analytics Local loop
  14. 14. The unified era, 2013+ CLOUD VENDOR / OWN DATA CENTER NARROW DATA SILOES SOME LOW LATENCY LOCAL LOOPS Search CMS Silo E-comm Silo APIs ERP Silo LOW LATENCY Streaming APIs / web hooks WIDE DATA SAAS VENDOR #2 COVERAGE Unified log Email marketing FEW DAYS’ DATA HISTORY Hadoop HIGH LATENCY < WIDE DATA COVERAGE > < FULL DATA HISTORY > CRM Silo Eventstream Archiving SAAS VENDOR #1 Ad hoc analytics Product rec’s Systems monitoring Management reporting Fraud detection Churn prevention LOW LATENCY
  15. 15. The unified log is Kinesis (or Kafka) CLOUD VENDOR / OWN DATA CENTER NARROW DATA SILOES Search SAAS VENDOR #1 SOME LOW LATENCY LOCAL LOOPS CMS Silo E-comm Silo APIs ERP Silo CRM Silo Streaming APIs / web hooks Eventstream SAAS VENDOR #2 Unified log Archiving Hadoop HIGH LATENCY < WIDE DATA COVERAGE > < FULL DATA HISTORY > Email marketing Ad hoc analytics Product rec’s Systems monitoring Management reporting Fraud detection Churn prevention LOW LATENCY
  16. 16. Can we implement Snowplow on top of Kinesis? CLOUD VENDOR / OWN DATA CENTER NARROW DATA SILOES Search SAAS VENDOR #1 SOME LOW LATENCY LOCAL LOOPS CMS Silo E-comm Silo APIs ERP Silo CRM Silo Streaming APIs / web hooks Eventstream SAAS VENDOR #2 Unified log Archiving Hadoop HIGH LATENCY < WIDE DATA COVERAGE > < FULL DATA HISTORY > Email marketing Ad hoc analytics Product rec’s Systems monitoring Management reporting Fraud detection Churn prevention LOW LATENCY
  17. 17. Adding Kinesis support to Snowplow
  18. 18. Where we are heading with our Kinesis architecture Snowplow Trackers Scala Stream Collector Raw event stream S3 sink Kinesis app S3 Enrich Kinesis app Enriched event stream Redshift sink Kinesis app Redshift Bad raw events stream
  19. 19. We took an important first step in our last release… 0.8.12 pre-0.8.12 hadoop-etl scala-hadoopenrich scala-kinesis-enrich Record-level enrichment functionality scala-common-enrich
  20. 20. … and the next release should get us much closer Snowplow Trackers Scala Stream Collector Raw event stream S3 sink Kinesis app S3 Enrich Kinesis app Enriched event stream Redshift sink Kinesis app Redshift Bad raw events stream
  21. 21. Live demo!
  22. 22. Questions? http://snowplowanalytics.com https://github.com/snowplow/snowplow @snowplowdata
  23. 23. And finally… Huge thanks to our hosts!
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×