AWS provides multiple ways to ingest and process real-time data generated from sources such as edge devices, logs, websites, mobile apps, IoT devices, and more.
In this session we will compare the different tools and technologies and share best practices for when to use what.
The session will cover: Apache Kafka, Kinesis Data Streams/Firehose, MSK (Managed Kafka), Kinesis Data Analytics for SQL and Java (Flink), Apache Spark and more.
43. Unified Marketing Data
Automatically connect and combine campaign data from thousands of sources
with attribution data for a single source of truth.
What does Singular do?
Automation
Streamline workflows and focus on strategy by automating tedious and
manual tasks.
Audiences
Tap into attribution data to create high value audience segments and
automatically distribute across ad networks.
44. Mobile App Events
Purchases, installs, sessions, custom events etc.
Audiences at a Glance
Segmentation Criteria
For example, “Users who spend more than $10 a month”
Audience Distribution
Resulting devices used for retargeting, etc.
45. Handle “Lots of Data”
Currently at 13TB daily and around 30,000 events per second. And growing...
Requirements
Near Real-time
E.g., “Users who have not signed up yet” is useless if it’s not an up-to-date list.
Not Just Volume...
How to get the data there? How do we deal with failures?
Easy to Scale and Add Capacity
At the mercy of customers’ data volume...
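To put the figures on this slide in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that the full 13 TB per day (read as decimal terabytes) flows through a single Kinesis data stream at a flat average rate, and it uses the commonly documented per-shard write limits of about 1 MB/s and 1,000 records/s. The function name and constants are illustrative, not from the talk, and real sizing would need headroom for peaks.

```python
import math

# Figures quoted on the slide.
DAILY_BYTES = 13 * 10**12          # 13 TB per day (decimal TB is an assumption)
EVENTS_PER_SECOND = 30_000

# Per-shard write limits (assumed here as ~1 MB/s and 1,000 records/s;
# check the current Kinesis Data Streams quotas before relying on them).
SHARD_WRITE_BYTES_PER_SEC = 1_000_000
SHARD_WRITE_RECORDS_PER_SEC = 1_000


def estimate_shards(bytes_per_sec, records_per_sec):
    """Return the minimum shard count that satisfies both write limits."""
    by_bytes = math.ceil(bytes_per_sec / SHARD_WRITE_BYTES_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    return max(by_bytes, by_records)


# Average rates implied by the slide, ignoring traffic peaks.
avg_bytes_per_sec = DAILY_BYTES / 86_400
avg_event_bytes = avg_bytes_per_sec / EVENTS_PER_SECOND
min_shards = estimate_shards(avg_bytes_per_sec, EVENTS_PER_SECOND)
```

Under these assumptions the byte limit, not the record limit, is the binding constraint: events average roughly 5 KB each, so the data volume calls for far more shards than the event count alone would.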
46. Option: Build our own
NSQ was our existing stack but required maintenance. Durability concerns.
Decisions...
Option: Apache Kafka
Solid product but also requires additional maintenance. Initial setup cost.
Option: Kinesis
Already in AWS ecosystem. Easily scalable out of the box.
47. Throughput
Predictable and tunable knobs for throughput.
Why Amazon Kinesis?
Real-time Streaming
With the added benefit of up to 7 days of lookback -- just in case.
Easy to Scale
Tune capacity through shards. Monitoring built-in.
48. No Official KPL Package
Used the API directly and followed KPL guidelines, especially regarding batching.
Learnings: Golang
No Official KCL Package
Forked the Python KCL package to run a Go binary.
MultiLangDaemon
Keep in mind: one Golang process per Kinesis shard.
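The batching guideline mentioned on this slide can be illustrated with a minimal sketch. The talk's producer was written in Go and its code is not shown, so this is a Python stand-in, not the speaker's implementation. It groups records so that each group would fit in one PutRecords call, assuming the limits of 500 records and 5 MB per request (partition keys included). The name `batch_records` is hypothetical. It covers only request-level batching, not the KPL's aggregation of many user records into one Kinesis record, and it does not reject a single oversized record.

```python
# Assumed PutRecords request limits; verify against current AWS documentation.
MAX_RECORDS_PER_REQUEST = 500
MAX_BYTES_PER_REQUEST = 5 * 1024 * 1024


def batch_records(records):
    """Group (partition_key, data) pairs into PutRecords-sized batches.

    records: iterable of (str, bytes) tuples.
    Yields lists of tuples, each list respecting both request limits.
    """
    batch, batch_bytes = [], 0
    for key, data in records:
        # The partition key counts toward the request size.
        size = len(key.encode("utf-8")) + len(data)
        full = len(batch) >= MAX_RECORDS_PER_REQUEST
        too_big = batch_bytes + size > MAX_BYTES_PER_REQUEST
        if batch and (full or too_big):
            yield batch
            batch, batch_bytes = [], 0
        batch.append((key, data))
        batch_bytes += size
    if batch:
        yield batch  # flush the final partial batch
```

Each yielded batch could then be sent with a single PutRecords request. A real producer would also retry the individual records that the response reports as failed.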
49. Multiple Applications
Without enhanced fan-out (EFO) and KCL 2.0, applications share each shard's read throughput.
Learnings: Enhanced fan-out & KCL 2.0
Enhanced fan-out & KCL 2.0
Each application gets its own dedicated read throughput on every shard. EFO is
enabled automatically when using a KCL 2.0 client.
No Golang
Official Java and Python support. Required our forked MultiLangDaemon (MLD).
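The difference between shared and dedicated read throughput can be made concrete with a small calculation. This sketch assumes the commonly documented figure of about 2 MB/s of read throughput per shard, shared by all standard consumers or granted separately to each enhanced fan-out consumer. The function name and the example numbers are illustrative, not from the talk.

```python
# Assumed per-shard read throughput (~2 MB/s); check current Kinesis quotas.
SHARD_READ_BYTES_PER_SEC = 2_000_000


def per_consumer_read_rate(shards, consumers, enhanced_fan_out):
    """Return the best-case read rate (bytes/s) available to one consumer app."""
    if shards < 1 or consumers < 1:
        raise ValueError("shards and consumers must be at least 1")
    total = shards * SHARD_READ_BYTES_PER_SEC
    if enhanced_fan_out:
        # Each registered consumer gets the full per-shard rate to itself.
        return total
    # Standard consumers split the per-shard rate between them.
    return total / consumers


shared = per_consumer_read_rate(shards=10, consumers=4, enhanced_fan_out=False)
dedicated = per_consumer_read_rate(shards=10, consumers=4, enhanced_fan_out=True)
```

With four applications on a ten-shard stream, each standard consumer can count on only a quarter of the stream's read capacity, while each enhanced fan-out consumer gets all of it.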