Charles Sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix (or even if you are)

How to be data-driven when you aren't Netflix
(or even if you are)
Charles Sonigo, Head of Product, Streamroot
DEMUXED - October 18, 2018

Who are we...
© 2018 Streamroot - All rights reserved
The leading company for peer-accelerated OTT video delivery and device-side
performance optimization
WHAT
✓ WebRTC-based peer-accelerated delivery
✓ Mid-stream client-side CDN switching
✓ eCDN for global businesses
WHY
✓ Greater scale
✓ Better QoS
✓ Lower spend

Who are we...
The largest and most trusted distributed delivery network for OTT video
+20 Million unique video sessions daily
+5 Billion unique video sessions in 2018

1 The Challenge
Improving software when performance can only be measured in
production...

The Challenge: Axioms - Requirements
● KPIs highly dependent on environmental factors
● Controlled testing environments never capture real-world complexity
● Intuition will be wrong
● Need to maintain team velocity and feature releases
● Need to be able to randomize, ideally at user level
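One common way to randomize at user level is to hash a stable user identifier, so the same user always lands in the same group without any stored state. The sketch below is an illustration only: the function name, the experiment salt and the 50/50 default split are assumptions, not something described in the talk.

```python
import hashlib


def assign_group(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to group 'A' or 'B' for one experiment.

    All names and defaults here are hypothetical.
    """
    # Salting with the experiment name keeps assignments independent
    # from one experiment to the next.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    position = int.from_bytes(digest[:8], "big") / 2**64
    return "B" if position < treatment_share else "A"


if __name__ == "__main__":
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, assign_group(uid, "udp-like-protocol"))
```

Because the assignment depends only on the identifier and the experiment name, it can be recomputed anywhere (client, proxy or analytics job) and still agree.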

The Challenge: use cases
Backend logic
● Content caching
● Encoding experiments
● CDN load balancing
● Peer matching
Client-side logic
● UI
● ABR improvements
● Ads workflows

2 How it All Began
How to build a data pipeline with only a few good men
(and women).

How it Started: Our Basic Data Lab
CLIENT METRICS
Performance & QoS metric payloads sent to an HTTP endpoint
MESSAGE BROKER & DATA PROCESSORS
Receives and distributes messages
Preprocessing for specific analysis
ANALYTICS DB
Stores and pre-aggregates data.
Offers cross-dimension drills, filters and more.
ANALYTICS PROCESSOR
Basic graphs
Custom dashboards for everyday work
In-house advanced statistics services

Potential Upgrades Based on Your Needs
Message Broker and Data Processors
● Scalability and resilience:
○ Data collection services with Kubernetes autoscaling
○ Partitioned Kafka topics
○ Flink for distributed, resilient, stateful data processors

Potential Upgrades Based on Your Needs
Analytics DB and Processors
Hadoop with Avro:
● Hadoop ecosystem for industrialised data processing
● Storage gain with Avro compression
● Robust schema evolution with backward compatibility
Zeppelin (or Jupyter) + PySpark:
● Limitless analysis capabilities
● Quick exploration/iterations on complex analysis
● Easy to share implementations

The tools are out there
Many options are out there depending on your requirements: scale, features, open-source,
proprietary, Cloud, etc.

Message broker   Data Processor   Database        BI/viz tools   Custom analytics
Apache Kafka     Flink            Druid           Superset       Python
RabbitMQ         Storm            InfluxDB        Grafana        R
ActiveMQ         Hive             Hadoop          Tableau
ZeroMQ           Kafka Streams    PostgreSQL      QlikView
Amazon Kinesis   Spark            Elasticsearch   Kibana

3 The Scientific Method for AB
Testing
(without compromising on team velocity and feature releases)

AB Testing: Deployment
RELEASE TESTING WITH REVERSE PROXY
● Randomizing version of the code used by your users
● Used to roll out new releases progressively

DYNAMIC CONFIG INJECTION
● Server returns different configuration files
● Inject parameters to fine-tune / toggle new features on and off
AB Testing: Configuration Injection
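A minimal sketch of what configuration injection could look like on the server side, assuming a JSON configuration with boolean feature toggles. The feature names, the `label` field and the 5% default exposure are hypothetical placeholders; the talk does not show the actual configuration format.

```python
import copy
import hashlib
import json

# Hypothetical base configuration: every new feature ships toggled off.
BASE_CONFIG = {
    "version": "v2",
    "features": {"new_abr": False, "udp_like_protocol": False},
}


def bucket(user_id: str, salt: str) -> float:
    """Stable position in [0, 1) derived from the user id and a salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def config_for(user_id: str, feature: str, exposure: float = 0.05) -> dict:
    """Return the config for one user, turning `feature` on for a share of users."""
    if feature not in BASE_CONFIG["features"]:
        raise KeyError(f"unknown feature: {feature}")
    config = copy.deepcopy(BASE_CONFIG)  # never mutate the shared base config
    enabled = bucket(user_id, feature) < exposure
    config["features"][feature] = enabled
    # The label travels with the client's metrics so groups can be compared later.
    config["label"] = f"{feature}:{'on' if enabled else 'off'}"
    return config


if __name__ == "__main__":
    print(json.dumps(config_for("user-42", "new_abr"), indent=2))
```

Returning a label alongside the toggles is a design choice made here so that each metric payload can be attributed to its test group.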

AB Testing: New version roll-out
V1: 100%
V2 Canary: 1% of traffic. New features: toggled off
V2 AA' Test: 50% V1 - 50% V2
V2 ??: 100%
V2 Conf AB Test: 100% w/ 5% new features. New features: turn on one-by-one
V2: 100%. New features: all ON
RELEASE W/ REVERSE PROXY
Canary release
● Only need to detect major regressions and errors.
● If any, rollback and fix.

RELEASE W/ REVERSE PROXY
V2 AA' Test (50% V1 - 50% V2)
● Want to make sure V2 with features off behaves exactly like V1
● Critical step: any regression that slips through is very costly and hard to find later on
● If any, rollback and fix.

V2 Release (100% of traffic)
FULL DEPLOYMENT

V2 Conf AB Test (100% w/ 5% new features)
CONFIGURATION INJECTION
● Activate features one by one and analyse their impact
● Keep the good ones, rethink or tweak the bad ones

4 Beating the averages
Enhance, Enhance, Enhance!

101: Outliers
[Graph showing outliers]
Some can be filtered out.
Some are symptoms of a problem.
Stay vigilant!
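The talk does not say how outliers are detected. As one possible illustration, the sketch below uses the classic interquartile-range rule (a swapped-in technique, not necessarily Streamroot's) and returns the flagged samples instead of silently dropping them, since some of them may be symptoms of a real problem. The sample values are made up.

```python
import statistics


def split_outliers(samples, k=1.5):
    """Split samples into (kept, flagged) using the interquartile-range rule."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [x for x in samples if low <= x <= high]
    # Flagged values are returned, not discarded, so they can be inspected.
    flagged = [x for x in samples if x < low or x > high]
    return kept, flagged


if __name__ == "__main__":
    # Hypothetical startup times in seconds; 30.0 is the suspicious one.
    startup_times = [1.0, 1.1, 0.9, 1.2, 1.0, 0.95, 1.05, 30.0]
    kept, flagged = split_outliers(startup_times)
    print("kept:", kept)
    print("flagged for review:", flagged)
```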

102: Simpson & Confounding Factors...
… Or how aggregated results can give you a completely wrong idea.

P2P Data Exchange Protocol   Samples      Traffic offload %
TCP-like                     15,772,543   65.54%
UDP-like                     16,152,624   67.04%

UDP-like protocol is obviously better!

102: Simpson & Confounding Factors

P2P Data Exchange Protocol   Samples      Traffic offload %
TCP-like                     15,772,543   65.54%
UDP-like                     16,152,624   67.04%

P2P Data Exchange Protocol   Type   Samples      Traffic offload %
TCP-like                     Live   13,004,140   62.10%
UDP-like                     Live   13,340,504   65.02%
TCP-like                     VOD    2,768,403    81.72%
UDP-like                     VOD    2,812,120    76.61%

Live: +2.92 percentage points for UDP-like
VOD: -5.11 percentage points for UDP-like
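The figures above can be cross-checked with a few lines of Python. The sketch assumes the aggregated offload is the sample-weighted mean of the per-type offloads (an assumption, though it reproduces the aggregated numbers on the slide). It shows how the same data supports "UDP-like is better" overall while UDP-like loses on VOD.

```python
from collections import defaultdict

# (protocol, content type, samples, traffic offload %) as shown on the slide.
ROWS = [
    ("TCP-like", "Live", 13_004_140, 62.10),
    ("UDP-like", "Live", 13_340_504, 65.02),
    ("TCP-like", "VOD", 2_768_403, 81.72),
    ("UDP-like", "VOD", 2_812_120, 76.61),
]


def aggregate(rows, key):
    """Sample-weighted mean offload per group, where key(row) names the group."""
    sums = defaultdict(lambda: [0.0, 0])
    for row in rows:
        _, _, samples, offload = row
        group = key(row)
        sums[group][0] += samples * offload
        sums[group][1] += samples
    return {group: weighted / n for group, (weighted, n) in sums.items()}


overall = aggregate(ROWS, key=lambda r: r[0])
by_type = aggregate(ROWS, key=lambda r: (r[0], r[1]))

if __name__ == "__main__":
    for protocol, value in overall.items():
        print(f"{protocol:9s} overall {value:.2f}%")
    for content_type in ("Live", "VOD"):
        delta = by_type[("UDP-like", content_type)] - by_type[("TCP-like", content_type)]
        print(f"{content_type}: {delta:+.2f} points for UDP-like")
```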

102: Simpson & Confounding Factors
What can you do to prevent it?
● You can’t... Stay vigilant!
● Always ask yourself if you have missed a lurking confounding factor.
● Know your data and cross-check your results for each important dimension: live / VOD, low bitrate vs. high bitrate, country / ISP, devices, browsers...
● Regularly check for over-represented groups in your audience and monitor them closely.

103: Histograms
[Figure: histogram with the avg. marked]
Splitting along dimensions is critical.
It’s only natural to do the same along your metrics.
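A small illustration of why the average alone can mislead: the two made-up sample sets below have the same mean but very different histograms. The bucket width and the values are arbitrary assumptions.

```python
from collections import Counter
from statistics import mean


def histogram(values, bucket_width):
    """Count values per bucket; each key is the lower bound of its bucket."""
    counts = Counter(int(v // bucket_width) * bucket_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical metric values for two groups with identical averages.
steady = [2.0] * 10
bimodal = [0.5] * 5 + [3.5] * 5

if __name__ == "__main__":
    for name, values in (("steady", steady), ("bimodal", bimodal)):
        print(name, "mean:", mean(values), "histogram:", histogram(values, 1.0))
```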

201: Tackling noise
2 identical configurations →
The law of large numbers says aggregated values will converge... but when?
What is noise and what is an actual effect of a change you introduced?
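To get a feel for the question, one can simulate two groups running identical configurations and watch the gap between their aggregated values shrink as samples accumulate. This is a toy model: the per-sample success probability of 0.65 and the sample sizes are arbitrary assumptions.

```python
import random


def aa_gap(n: int, p: float = 0.65, seed: int = 0) -> float:
    """Absolute gap between two identical groups after n samples each.

    Each sample is a success with probability p in both groups, so any
    gap between the two observed rates is pure noise.
    """
    rng = random.Random(seed)
    rate_a = sum(rng.random() < p for _ in range(n)) / n
    rate_b = sum(rng.random() < p for _ in range(n)) / n
    return abs(rate_a - rate_b)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        print(f"{n:>7} samples per group -> gap {aa_gap(n):.4f}")
```

Any measured effect smaller than the gap seen between identical groups at the same sample size is indistinguishable from noise.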

201: Statistical Significance
Many possibilities based on the way you design your experiments
We use the Chi-Squared Test.
+ Low CPU cost
+ Can apply the test on any subgroups after preprocessing
+ Easy tweak to balance memory usage and precision
- Memory intensive
- Cardinality hurts fast

201: Chi-Squared Test
Test dependency of two variables: Label vs. Metric X
● If X and Label are truly independent (Null Hypothesis), the distribution of
the Chi-Squared statistic is known.
● Comparing the observed Chi-Squared statistic against that theoretical
distribution gives the p-value: the probability of seeing a difference at
least this large if it were just noise.
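A minimal, standard-library-only sketch of a chi-squared test of independence between the label (two groups) and a bucketed metric. It is not Streamroot's implementation: the bucket counts are made up, and the p-value uses the textbook recurrence on degrees of freedom, which is only suitable for a modest number of buckets.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """P(X >= x) for a chi-squared variable with `dof` degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2
    # Closed forms for dof = 1 and dof = 2, then step dof up by 2 at a time.
    if dof % 2:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = math.exp(-half), 2
    while nu < dof:
        q += half ** (nu / 2) * math.exp(-half) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def chi2_independence(counts_a, counts_b):
    """Return (statistic, dof, p_value) for two per-bucket count lists."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both groups must use the same buckets")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one sample")
    total = total_a + total_b
    stat, used = 0.0, 0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # a bucket empty in both groups carries no information
        used += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column / total
            stat += (observed - expected) ** 2 / expected
    dof = used - 1
    return stat, dof, chi2_sf(stat, dof)


if __name__ == "__main__":
    # Hypothetical bucket counts of metric X for labels A and B.
    stat, dof, p = chi2_independence([120, 300, 80], [100, 290, 110])
    print(f"chi2={stat:.2f} dof={dof} p={p:.4f}")
```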

201: Chi-Squared Visualized
Inputs:
● A metric with a
range of values
● 2 groups
● Some filters

201: Population movements or practical significance
1. Test Metadata:
● P Value
● Model Validity
● #Samples
● Total population displacement

201: Population movements: practical significance
2. Axes
X: Bucket intervals and distribution of all samples
Y: Population movements between the two groups
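The slides do not define "population movement" or "total population displacement". One plausible reading, used in the sketch below purely as an assumption, is the per-bucket difference in the share of samples between the two groups, with the total displacement taken as half the sum of the absolute differences (the share of the population that would have to change bucket). The counts are made up.

```python
def population_movements(counts_a, counts_b):
    """Per-bucket share difference (B minus A) and the total displacement.

    Assumed definitions, not taken from the talk.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both groups must use the same buckets")
    total_a, total_b = sum(counts_a), sum(counts_b)
    moves = [b / total_b - a / total_a for a, b in zip(counts_a, counts_b)]
    # Every sample that leaves one bucket enters another, hence the division by 2.
    displacement = sum(abs(m) for m in moves) / 2
    return moves, displacement


if __name__ == "__main__":
    moves, displacement = population_movements([50, 30, 20], [30, 30, 40])
    for index, move in enumerate(moves):
        print(f"bucket {index}: {move:+.1%}")
    print(f"total population displacement: {displacement:.1%}")
```

Reading it next to the p-value helps separate statistical significance from practical significance: a tiny displacement can be significant with enough samples and still not matter.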

201: Impact on statistical significance

5 What’s Next for us?
● More statistical significance visuals
● User clusters based on how they interact with our tech
● ML for client configuration optimization

Thank you!
charles.sonigo@streamroot.io
Check out our engineering blog at
medium.com/streamroot-developers-blog
