Building Sessionization Pipeline at Scale with Databricks Delta
April 24th, 2019
Comcast
- Xfinity X1
- Xfinity Internet and xFi
- Xfinity Home
- Xfinity Mobile
How do we improve our products?
Data captures our customers' feedback at scale…
We decipher and extract insights…
to enhance customer experience
to empower data-informed decisions…
We collect, store, and use all data in accordance with our privacy disclosures to users and applicable laws.
Data Scale
- Billions of Events
- Petabytes of Stored Data
- Millions TPS
What is sessionization?
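The slide poses the question without recording an answer. A common reading is gap-based sessionization: each device's events are grouped into sessions, and a new session starts when the gap since the previous event exceeds an inactivity timeout. The sketch below assumes that definition; the 30-minute timeout, the `(device_id, timestamp)` event shape, and the function name are hypothetical and not taken from the deck.

```python
from collections import defaultdict

# Assumption: a session ends after 30 minutes of inactivity (hypothetical value).
SESSION_GAP_SECONDS = 30 * 60


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (device_id, timestamp) events into per-device sessions.

    Returns {device_id: [[t1, t2, ...], ...]}; each inner list is one session.
    """
    by_device = defaultdict(list)
    for device_id, ts in events:
        by_device[device_id].append(ts)

    sessions = {}
    for device_id, stamps in by_device.items():
        stamps.sort()
        device_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - device_sessions[-1][-1] > gap:
                device_sessions.append([ts])  # gap exceeded: start a new session
            else:
                device_sessions[-1].append(ts)
        sessions[device_id] = device_sessions
    return sessions


# Timestamps are seconds; device "a" goes quiet for more than 30 minutes.
events = [("a", 0), ("a", 600), ("a", 4000), ("b", 100)]
result = sessionize(events)
```

Here device "a" ends up with two sessions and device "b" with one.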
Challenges/Goals
1. Scalability
2. Reliability / Robustness
3. Performance
Value Gains
Before                                  After
Batch process                           Stream process
84 jobs                                 3 jobs
~14 hours data delay                    ~7 hours data delay
Minimal late-data and failure support   Checkpointing
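The "Checkpointing" entry refers to a streaming job recording how far it has progressed, so that a restart resumes from that point instead of reprocessing everything. The sketch below illustrates the idea with a hypothetical JSON checkpoint file holding a single offset; it is not the checkpoint format used by Spark or Databricks, and all names are assumptions.

```python
import json
import os
import tempfile


def load_offset(path):
    """Return the last committed offset, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["offset"]


def commit_offset(path, offset):
    """Write the checkpoint atomically: write a temp file, then rename it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, path)


def run_micro_batches(source, path, batch_size, max_batches, sink):
    """Process up to max_batches micro-batches, committing after each one."""
    offset = load_offset(path)
    for _ in range(max_batches):
        batch = source[offset:offset + batch_size]
        if not batch:
            break
        sink.extend(batch)
        offset += len(batch)
        commit_offset(path, offset)
    return offset


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
source = list(range(10))
sink = []
# First run stops after one batch, standing in for a failure.
run_micro_batches(source, checkpoint, batch_size=4, max_batches=1, sink=sink)
# The restart picks up from the committed offset rather than from zero.
final_offset = run_micro_batches(source, checkpoint, batch_size=4, max_batches=10, sink=sink)
```

Because the sink is written before the offset is committed, a crash between the two steps would replay one batch; this sketch is at-least-once unless the sink is idempotent.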
Initial Design
Pipeline stages: Data Parse & Assign, Sessionize, Enrich
Goals: Scalability, Reliability, Performance
Manually Partition Key to Enable Scaling
Pipeline stages: Data Parse & Assign, Sessionize, Enrich
Goals: Scalability, Reliability, Performance
Partition keys: Key 1, Key 2, ..., Key 32
s3://mybucket/key=<key>/type=<type>/date=<yyyy-mm-dd>/hour=<hh>/…
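A manual partition key of this kind can be sketched as a stable hash of some identifier into one of 32 buckets, which then becomes the leading component of the storage path. The hash function (`zlib.crc32`), the identifier `device-123`, and the event type `tune` are assumptions for illustration; the deck does not say how keys were assigned. The path is built only as far as the `hour=` component shown on the slide.

```python
import zlib
from datetime import datetime, timezone

NUM_KEYS = 32  # the slide shows Key 1 through Key 32


def assign_key(device_id, num_keys=NUM_KEYS):
    """Map an identifier to a stable key in 1..num_keys (hash choice is an assumption)."""
    return zlib.crc32(device_id.encode("utf-8")) % num_keys + 1


def partition_prefix(device_id, event_type, ts):
    """Build the partition prefix following the layout shown on the slide."""
    return (
        f"s3://mybucket/key={assign_key(device_id)}"
        f"/type={event_type}/date={ts:%Y-%m-%d}/hour={ts:%H}/"
    )


# Hypothetical identifier and event type.
ts = datetime(2019, 4, 24, 9, 30, tzinfo=timezone.utc)
prefix = partition_prefix("device-123", "tune", ts)
```

Because the key depends only on the identifier, all events for one device land under the same `key=` partition, so each of the 32 partitions can be sessionized independently.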
From Batch to Streaming
Pipeline stages: Data Parse & Assign, Sessionize, Enrich
Goals: Scalability, Reliability, Performance
Diagram labels: Data
Delta
Pipeline stages: Data Parse & Assign, Sessionize, Enrich
Goals: Scalability, Reliability, Performance
Diagram labels: Delta (x3), Data, Optimize (x2)
Random Prefixes
Pipeline stages: Data Parse & Assign, Sessionize, Enrich
Goals: Scalability, Reliability, Performance
Diagram labels: Delta (x3), Data, Optimize (x2), random prefix (x3)
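Random prefixes are typically used because S3 request rates are limited per key prefix, so files written under one date/hour path compete for the same budget. The sketch below shows the general idea with a hypothetical helper that prepends a short random hex string to each object key; the prefix length, the file names, and the helper itself are assumptions and do not reproduce Delta's internal naming scheme.

```python
import random
from collections import Counter


def with_random_prefix(object_key, rng, length=2):
    """Prepend a random hex prefix of `length` characters to an object key."""
    prefix = "".join(rng.choice("0123456789abcdef") for _ in range(length))
    return f"{prefix}/{object_key}"


rng = random.Random(0)  # seeded only to make the sketch repeatable
keys = [
    with_random_prefix(f"date=2019-04-24/hour=09/part-{i:05d}.parquet", rng)
    for i in range(1000)
]

# Without the prefix, all 1000 keys share the leading "date=2019-04-24/hour=09/".
# With it, they fan out across up to 16**2 = 256 leading prefixes.
spread = Counter(key.split("/", 1)[0] for key in keys)
```

The trade-off is that the physical layout no longer mirrors the logical partitioning, so readers need the table's metadata rather than path listing to find files.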
Auto Optimize and More
Pipeline stages: Data Parse & Assign, Sessionize, Enrich
Goals: Scalability, Reliability, Performance
Diagram labels: Delta (x3), Data (x2), Upsert, Delta, random prefix (x3)
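The "Upsert" label suggests that records are merged into a Delta table rather than only appended: a row is updated if its key already exists and inserted otherwise, which in Delta is expressed as a MERGE operation. The sketch below shows only those semantics using a plain dictionary; the `session_id`, `end_ts`, and `events` fields are hypothetical, and the deck does not say which table or key the upsert applied to.

```python
def upsert(table, updates, key="session_id"):
    """Merge `updates` into `table` (a dict keyed by session id), in place.

    Matched rows are updated field by field; unmatched rows are inserted.
    """
    for row in updates:
        existing = table.get(row[key])
        if existing is None:
            table[row[key]] = dict(row)  # not matched: insert
        else:
            existing.update(row)  # matched: update
    return table


# Hypothetical session rows: late events extend session "s1" and add "s2".
table = {"s1": {"session_id": "s1", "end_ts": 600, "events": 2}}
updates = [
    {"session_id": "s1", "end_ts": 900, "events": 3},
    {"session_id": "s2", "end_ts": 4000, "events": 1},
]
upsert(table, updates)
```

An upsert of this shape lets late-arriving events revise an already-written session instead of producing a duplicate row.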
Result
Reduced an 84-job process to 3 jobs
Delivered enriched data 2x faster
Increased operational friendliness
Diagram labels: Delta (x3), random prefix (x3)
Outcome
A scalable data pipeline that reliably provides consumable insights to our teams in near real time.
Pipeline stages: Data Parse & Assign, Sessionize, Enrich
Diagram labels: Delta (x3), OKRs, Experience Enhancements, Product Research, Feature Developments
