
Introducing Tupilak, Snowplow's unified log fabric


In this talk at Snowplow London Meetup #3 I introduced Tupilak, Snowplow’s unified log fabric. Putting a real-time event pipeline into production has many challenges: we need the pipeline to scale automatically based on event volumes, we need constant monitoring to prevent data loss and minimise end-to-end lag, and we need the ability to upgrade and extend the pipeline with zero downtime. We call software which does all this a “unified log fabric”, to distinguish it from the unified logs (e.g. Kafka and Kinesis) and stream processing frameworks (e.g. Spark Streaming and Kafka Streams) which such a fabric monitors and orchestrates.

As part of incorporating Snowplow’s Kinesis-based event pipeline into our Managed Service, we developed our own unified log fabric, called Tupilak. In this talk, I introduced Tupilak, explaining the core monitoring and scaling functions of Tupilak and showing live real-time pipelines visualised in the Tupilak UI. I dived into the architecture of Tupilak, shared its basic scaling algorithm and also took a look at how Tupilak itself is built on a Snowplow event stream. I also talked about the roadmap for Tupilak, including our plans for introducing lag-based auto-scaling and porting Tupilak to Kubernetes.

Published in: Software

Introducing Tupilak, Snowplow's unified log fabric

  1. Introducing Tupilak, Snowplow’s unified log fabric (Snowplow London Meetup #3, 21 Sep 2016)
  2. Quick show of hands
     • Batch pipeline: how many here run the Snowplow batch pipeline?
     • Real-time pipeline: how many here run the Snowplow RT pipeline?
     • Orchestration: how are you running, scaling and monitoring the real-time pipeline?
     • Anything else: who here is evaluating Snowplow, or just curious?
  3. From the beginning, Snowplow RT was designed around small, composable workers… (diagram from our Feb 2014 Snowplow v0.9.0 release post)
  4. … based on the insight that RT pipelines can be composed a little like Unix pipes
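The composition idea can be sketched in a few lines of toy Python (this is an illustration of the analogy, not Snowplow code): each worker consumes a stream of events and emits a new stream, so stages chain together much like `cat | grep | sort` in a shell.

```python
# Toy sketch of Unix-pipe-style composition with generators.
# The stage names (collect, enrich, sink) are hypothetical stand-ins
# for Snowplow's real micro-services.

def collect(raw_payloads):
    """Stand-in for the Stream Collector: parse raw payloads into events."""
    for payload in raw_payloads:
        yield {"event": payload.strip()}

def enrich(events):
    """Stand-in for Stream Enrich: attach derived fields to each event."""
    for event in events:
        event["enriched"] = True
        yield event

def sink(events):
    """Terminal stage, standing in for a Kinesis S3 or Elasticsearch sink."""
    return list(events)

# The calls nest right-to-left, but data flows collect -> enrich -> sink,
# just as data flows left-to-right through a shell pipeline.
result = sink(enrich(collect(["page_view\n", "link_click\n"])))
```

Because each stage only agrees on the shape of the events flowing between them, stages can be added, removed or swapped independently, which is exactly the property the slide is pointing at.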
  5. Today, we see a growing number of async micro-services making up Snowplow RT:
     • Stream Collector
     • Stream Enrich
     • Kinesis S3
     • Kinesis Elasticsearch
     • Kinesis Tee (coming soon)
     • Redshift dripfeeder (design stage)
     • User’s AWS Lambda function
     • User’s KCL worker app
     • User’s Spark Streaming job
  6. But managing this kind of complexity has some major challenges:
     • “How do we monitor this topology, and alert if something (data loss; event lag) is going wrong?”
     • “How do we scale our streams and micro-services to handle event peaks and troughs smoothly?”
     • “How do we re-configure or upgrade our micro-services without breaking things?”
  7. Snowplow Batch has evolved a deep technical stack to handle these challenges
  8. We asked, what should the equivalent underlying fabric be for Snowplow RT?
  9. Enter Tupilak!
     “A tupilak was an avenging monster fabricated by a shaman by using animal parts (bone, skin, hair, sinew, etc). The creature was given life by ritualistic chants. It was then placed into the sea to seek and destroy a specific enemy.”
  10. Today Tupilak serves 3 key functions for the Snowplow RT pipeline (Managed Service):
      • Monitoring: visualizing the complex stream + worker topology in one place; indicating micro-services which are failing or falling behind (“lagging”)
      • Auto-scaling: auto-scaling the number of shards in each Kinesis stream, and the number of EC2 instances running each micro-service
      • Alerting: notifying our ops team via PagerDuty in the case of a failing or lagging micro-service
  11. Let’s look at auto-scaling in particular
      • We scale the number of shards in each Kinesis stream up or down based on the read/write throughput we are seeing
      • We scale the number of EC2 instances up or down based on some fixed assumptions about the ratio between shards and workers
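The two rules on this slide can be sketched as a small Python policy. Everything below is an illustrative assumption rather than Tupilak’s actual implementation: the threshold values, the doubling/halving policy, and the names (`SHARD_WRITE_LIMIT`, `WORKERS_PER_SHARD`, `target_shards`, `target_instances`) are all hypothetical.

```python
# Hypothetical sketch of the two scaling rules described on the slide.

SHARD_WRITE_LIMIT = 1000   # approx. Kinesis write limit, records/s per shard
SCALE_UP_AT = 0.8          # assumed: add shards above 80% utilisation
SCALE_DOWN_AT = 0.3        # assumed: remove shards below 30% utilisation
WORKERS_PER_SHARD = 0.5    # fixed assumption: one EC2 instance per two shards

def target_shards(current_shards, records_per_sec):
    """Rule 1: scale the shard count from observed read/write throughput."""
    utilisation = records_per_sec / (current_shards * SHARD_WRITE_LIMIT)
    if utilisation > SCALE_UP_AT:
        return current_shards * 2
    if utilisation < SCALE_DOWN_AT and current_shards > 1:
        return max(1, current_shards // 2)
    return current_shards

def target_instances(shards):
    """Rule 2: derive the EC2 instance count from a fixed shard:worker ratio."""
    return max(1, round(shards * WORKERS_PER_SHARD))
```

For example, a 2-shard stream seeing 1,800 records/s is at 90% utilisation, so this policy would double it to 4 shards and run the micro-service on 2 instances.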
  12. A demo of the Tupilak UI
  13. Under the hood, Tupilak is built on Snowplow!
  14. What’s next for Tupilak? 1. Better auto-scaling
      • We will scale the number of shards in each stream based on the read/write throughput we are seeing, and on the lag of any services consuming this stream or downstream of it
      • Inputs: read/write throughput, micro-service lag, performance metrics relative to the stream
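The planned lag-aware rule could look something like the sketch below. This is an assumed shape inferred from the roadmap slide, not Tupilak’s real algorithm: the 60-second lag threshold, the doubling policy, and the function and parameter names are all illustrative.

```python
# Sketch of a lag-aware scaling rule: shard count now reacts both to
# raw throughput AND to how far behind the stream's consumers are.

SHARD_WRITE_LIMIT = 1000   # approx. Kinesis write limit, records/s per shard
SCALE_UP_AT = 0.8          # assumed utilisation threshold
MAX_LAG_SECONDS = 60       # assumed: any consumer lagging > 60s forces scale-up

def target_shards(current_shards, records_per_sec, consumer_lag_seconds):
    """consumer_lag_seconds holds the lag, in seconds, of every service
    consuming this stream or sitting downstream of it."""
    utilisation = records_per_sec / (current_shards * SHARD_WRITE_LIMIT)
    worst_lag = max(consumer_lag_seconds, default=0)
    # Scale up on raw throughput OR because consumers are falling behind,
    # even when the stream itself is under-utilised.
    if utilisation > SCALE_UP_AT or worst_lag > MAX_LAG_SECONDS:
        return current_shards * 2
    return current_shards
```

The point of the change is visible in the second branch of the condition: a stream at only 25% write utilisation would still be scaled up if a downstream KCL worker or Lambda function is minutes behind.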
  15. 2. Replacing our use of EC2 Auto-Scaling Groups with Docker + Kubernetes
  16. Questions?
