Since its inception, the Snowplow open source event analytics platform (https://github.com/snowplow/snowplow) has been tightly coupled to the batch-based Hadoop ecosystem, and Elastic MapReduce in particular.
With the release of Amazon Kinesis in late 2013, we set ourselves the challenge of porting Snowplow to Kinesis, to give our users access to their Snowplow event stream in near-real-time.
With this porting process nearing completion, Alex Dean, Snowplow Analytics co-founder and technical lead, will share Snowplow’s experiences in adopting stream processing as a complementary architecture to Hadoop and batch-based processing.
In particular, Alex will explore:
- “Hero” use cases for event streaming which drove our adoption of Kinesis
- Why we waited for Kinesis, and thoughts on how Kinesis fits into the wider streaming ecosystem
- How Snowplow achieved a lambda architecture with minimal code duplication, allowing Snowplow users to run either processing platform, or both
- Key considerations when moving from a batch mindset to a streaming mindset – including aggregate windows, recomputation, backpressure
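To make the first of those considerations concrete, a fixed "tumbling" aggregate window can be sketched as below. This is a minimal illustration only, with a hypothetical event shape and window size, not Snowplow's actual code:

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical fixed window size

def tumbling_window_counts(events):
    """Count events per fixed (tumbling) window, keyed by event type.

    `events` is an iterable of (timestamp_seconds, event_type) pairs --
    a hypothetical shape chosen for illustration.
    """
    counts = defaultdict(int)
    for ts, event_type in events:
        # Align each event to the start of its window
        window_start = ts - (ts % WINDOW_SECONDS)
        counts[(window_start, event_type)] += 1
    return dict(counts)

events = [(5, "page_view"), (42, "page_view"), (61, "click"), (119, "click")]
print(tumbling_window_counts(events))
# {(0, 'page_view'): 2, (60, 'click'): 2}
```

In a batch job the full input is available up front, so window boundaries are trivial; in a stream processor the same aggregation must cope with late-arriving events and decide when a window can safely be emitted, which is one reason the batch-to-streaming mindset shift is non-trivial.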