This document describes Epiphany, Rocket Fuel's real-time attribution platform. It connects millions of events to 50 billion data points to attribute conversions across devices and algorithms. It uses HBase to look up impressions in milliseconds. Data flows from actions keyed by user, impression day, and conversion day into HBase and Hive tables. It enables idempotent attribution across advertisers and algorithms at scale in real time.
Epiphany: Connecting Millions of Events to 50 Billion Data Points in Real-Time
1. Epiphany: Connecting Millions Of Events To 50 Billion Data Points In Real-time
Anirban Banerjee
abanerjee@rocketfuel.com
Shahansad Kp
skp@rocketfuel.com
5. How Was This "Conversion" Achieved?
- Identify the effect of every single impression across every medium during the customer’s journey
- Needed by modeling, reporting, analysts, customers.
11. Data Flow & Data Democracy
[Diagram labels: Rocket Fuel Impression History Data; Rocket Fuel Conversion Data; Advertiser Data; Previous Day; Current Day; Rocket Fuel Attribution; Reattribution; Action by Impression day; Action by Conversion day; Analysts; Downstream ETL]
13. O(|Conversions| * |Impressions| * |Algorithms|)
- Impressions: tens of billions
- Conversions: hundreds of millions
- Algorithms: hundreds
- Advertiser reports: thousands
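To make the scale concrete, here is a rough back-of-envelope calculation. It assumes illustrative values at the low end of each range on the slide (10 billion impressions, 100 million conversions, 100 algorithms). The per-user figure of 1,000 candidate impressions is a hypothetical number chosen only for comparison; it does not come from the deck.

```python
# Illustrative orders of magnitude only; the slide gives ranges, not exact counts.
IMPRESSIONS = 10 * 10**9      # "tens of billions"
CONVERSIONS = 100 * 10**6     # "hundreds of millions"
ALGORITHMS = 100              # "hundreds"

# Naive attribution: every conversion is compared against every impression,
# once per algorithm.
naive_ops = CONVERSIONS * IMPRESSIONS * ALGORITHMS

# Hypothetical alternative: look up only the impressions of the converting user.
IMPRESSIONS_PER_USER = 1_000  # assumed value, not from the source
keyed_ops = CONVERSIONS * IMPRESSIONS_PER_USER * ALGORITHMS

print(f"naive: {naive_ops:.1e} comparisons")
print(f"keyed: {keyed_ops:.1e} comparisons")
print(f"reduction factor: {naive_ops // keyed_ops:,}")
```

Under these assumptions the all-pairs product is many orders of magnitude larger than a keyed lookup, which is one way to read the motivation for point-readable, keyed tables.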
20. Point updates to Hive
Point updates to an intermediate HBase table
Periodically pulled to Hive
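The pattern on this slide can be sketched as follows. This is a minimal in-memory illustration, not the authors' implementation: `IntermediateTable` is a hypothetical stand-in for the intermediate HBase table, a plain dict stands in for the Hive table, and the names `put`, `changed_since`, and `pull_to_batch` are invented for the example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


class IntermediateTable:
    """In-memory stand-in for a key-value table that supports point updates."""

    def __init__(self) -> None:
        # row key -> (last-modified timestamp, row contents)
        self._rows: Dict[str, Tuple[int, Row]] = {}

    def put(self, key: str, row: Row, ts: int) -> None:
        # Point update: overwrite a single row and remember when it changed.
        self._rows[key] = (ts, row)

    def changed_since(self, ts: int) -> List[Tuple[str, Row]]:
        # Return rows modified strictly after the given timestamp.
        return [(k, r) for k, (t, r) in sorted(self._rows.items()) if t > ts]


def pull_to_batch(table: IntermediateTable, batch_store: Dict[str, Row],
                  last_pull_ts: int, now: int) -> int:
    """Copy rows changed since the last pull into the batch store.

    Returns the new watermark to pass to the next pull.
    """
    for key, row in table.changed_since(last_pull_ts):
        batch_store[key] = dict(row)
    return now


# Usage: two point updates to the same action, pulled in two periodic runs.
table = IntermediateTable()
batch: Dict[str, Row] = {}
table.put("action-1", {"impression": "imp-7"}, ts=10)
watermark = pull_to_batch(table, batch, last_pull_ts=0, now=15)
table.put("action-1", {"impression": "imp-9"}, ts=20)
watermark = pull_to_batch(table, batch, watermark, now=25)
```

The design choice illustrated here is that individual updates land in a store built for single-row writes, while the batch store only sees periodic bulk copies of whatever changed.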
21. Epiphany Tables
[Diagram labels: Action keyed by Impression day; Action keyed by Conversion day; Action keyed by User Id; INDEX; Action by Conversion day; Action by Impression day. Legend: HBase Table, Hive Table]
22. Intermediate Table Data Flow
[Diagram labels: Action keyed by User Id; Records with deltaTimestamp based scan; Action keyed by Impression day; Action keyed by Conversion day; Point reads; Old state of records with delta; Computed “changes”]
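One possible reading of this flow is sketched below: scan the user-keyed table for records whose delta timestamp is newer than the last run, point-read the old state of each record from a day-keyed table, and emit only the differences. This is an assumption about how the pieces fit together, not a description confirmed by the deck. The `Action` type, its fields, and the functions `scan_delta` and `compute_changes` are hypothetical, and plain dicts stand in for the HBase tables.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Action(NamedTuple):
    action_id: str
    impression_id: Optional[str]  # attributed impression, if any
    delta_ts: int                 # when this record last changed


Change = Tuple[str, Optional[str], Optional[str]]  # (action, old imp, new imp)


def scan_delta(user_table: Dict[str, Action], since: int) -> List[Action]:
    """Timestamp-based scan: records whose delta_ts is newer than `since`."""
    return [a for a in user_table.values() if a.delta_ts > since]


def compute_changes(user_table: Dict[str, Action],
                    day_table: Dict[str, Action], since: int) -> List[Change]:
    """Compare fresh records with their old state and emit the changes."""
    changes: List[Change] = []
    fresh = sorted(scan_delta(user_table, since), key=lambda a: a.action_id)
    for new in fresh:
        old = day_table.get(new.action_id)  # point read of the old state
        old_imp = old.impression_id if old else None
        if old is None or old_imp != new.impression_id:
            changes.append((new.action_id, old_imp, new.impression_id))
            day_table[new.action_id] = new  # apply the change
    return changes


# Usage: only "a1" changed after timestamp 10; a second run finds nothing new.
user_table = {"a1": Action("a1", "imp-2", 20), "a2": Action("a2", "imp-5", 5)}
day_table = {"a1": Action("a1", "imp-1", 10), "a2": Action("a2", "imp-5", 5)}
first = compute_changes(user_table, day_table, since=10)
second = compute_changes(user_table, day_table, since=10)
```

Because a change is emitted only when the new state differs from the stored one, rerunning the same scan produces no further changes, which is one simple way to keep such a job idempotent.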
27. Test releases with HBase snapshots
Monitor health of HBase instance
Use WAL (write-ahead log)
28. Generic solution at scale
One ring to rule them all
- Multiple attribution algorithms
- Cross-device scenario
- Advertiser attribution data
Faster availability, faster experiments
More accessible data
- e.g. point-readable actions