Realtime Classroom Analytics Powered by Apache Druid

Realtime Classroom Analytics
Powered By Apache Druid
Karthik Deivasigamani, Chief Architect, Noon - The Social Learning Platform

Agenda
● Who We Are
● Live Online Classroom
● Quality Of Experience
● Why Apache Druid
● Realtime Classroom Monitoring
● Key Lessons
● Q & A

Who Are We?
Noon evolved into a ‘Social Learning’
platform three years ago to craft the most
engaging learning experience.
● Our mission is to radically change the
way people learn.
● Make learning more social and fun.
● 10M+ users from over 5 countries
● 1M+ MAU with 50+ minutes per student
per active day

Live Online Classroom
Students spend a significant amount of their
time on Noon learning from their teacher
within the online classrooms.
Classroom Features
● Video, Audio, Chat and Whiteboard
● Breakouts, Raise Hand
● Peak 10K students / session

Live Classroom - Challenges
Audio
Voice is broken
● Teacher’s uplink quality
● Issues with microphone
● Student’s downlink quality
● ISP policies
Whiteboard
Lag in whiteboard
● Loss of drawing events due to unstable network
● Heavy CPU usage on the mobile device
● Software bug

Quality Of Experience
“Quality of experience is a measure
of the delight or annoyance of a
customer's experiences with a
service.” - Wikipedia

Monitoring The Classroom
Metrics
● Uplink/Downlink Network Quality
● Packet Loss
● Remote/Local Audio Quality
● Mic Status
● Jitter Buffer Delay
● frameFrozenRate
● Uplink/Downlink BitRate
Dimensions
● Country
● Region
● City
● Session
● User
● ISP
● Network Type
Aggregations
● Percentile
● Count
● Average
● Distinct Count
● Standard Deviation
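
The aggregations listed above can be illustrated with a small in-memory sketch. The sample events and field names (`country`, `user`, `packet_loss`) are hypothetical and chosen only for illustration; in the deck these aggregations are served by Druid, not by application code like this.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical sample events; field names are illustrative only.
events = [
    {"country": "SA", "user": "u1", "packet_loss": 0.5},
    {"country": "SA", "user": "u2", "packet_loss": 2.0},
    {"country": "SA", "user": "u1", "packet_loss": 1.5},
    {"country": "EG", "user": "u3", "packet_loss": 4.0},
    {"country": "EG", "user": "u4", "packet_loss": 6.0},
]


def percentile(values, fraction):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def summarize(rows, dimension, metric):
    """Group rows by one dimension and compute the listed aggregations."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row)
    summary = {}
    for key, members in groups.items():
        values = [m[metric] for m in members]
        summary[key] = {
            "count": len(values),
            "average": statistics.fmean(values),
            "distinct_users": len({m["user"] for m in members}),
            "stddev": statistics.pstdev(values),  # population std deviation
            "p90": percentile(values, 0.9),
        }
    return summary


summary = summarize(events, "country", "packet_loss")
```

Swapping the `dimension` argument (for example to a session or ISP field) gives the same aggregations at a different level of detail.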

System Characteristics
● Real Time Ingestion
● Scale Horizontally
● High Cardinality Data
● Subsecond Query Latency
● Fast Aggregation
● Zoom In & Zoom Out
● Highly Available

Why Apache Druid
● Real Time Ingestion From Kafka Through Spec Files
● Data & Query Nodes Allow For Horizontal Scaling
● Sketches For High Cardinality Columns
● Low-Latency Querying
● Rich Built-In Capabilities For Exact & Approx Aggregation
● Data Rollups
● Fault Tolerance At Multiple Levels
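
To give a feel for why sketches help with high-cardinality columns, here is a minimal K-minimum-values (KMV) distinct-count sketch. This is a swapped-in teaching example, not Druid's or DataSketches' implementation; the class name, `k=1024`, and the `user-N` identifiers are assumptions made for illustration. It keeps only the `k` smallest hash values, so memory stays bounded and two sketches can be merged without revisiting raw data.

```python
import hashlib
import heapq


class KMVSketch:
    """K-minimum-values distinct-count sketch (illustrative only)."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap (negated) of the k smallest hashes
        self._members = set()  # same hashes, for duplicate checks

    @staticmethod
    def _hash(value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)

    def _offer(self, h):
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the current largest retained hash with the smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def add(self, value):
        self._offer(self._hash(value))

    def merge(self, other):
        """Return a new sketch covering the union of both inputs."""
        merged = KMVSketch(self.k)
        for h in self._members | other._members:
            merged._offer(h)
        return merged

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # exact while under capacity
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


# Two overlapping streams of hypothetical user ids; the true union is 100,000.
left, right = KMVSketch(), KMVSketch()
for i in range(60_000):
    left.add(f"user-{i}")
for i in range(40_000, 100_000):
    right.add(f"user-{i}")
estimate = left.merge(right).estimate()
```

With `k = 1024` the relative standard error is roughly 1/sqrt(k), about 3%, which is the usual trade of a little accuracy for bounded memory and mergeability.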

Data Collection - Network & Audio
WebRTC Stats
Sent BitRate
Received BitRate
Audio Packet Loss
Audio Level
Bytes Sent/Received
Audio Frame Freeze Rate
Network Quality
Audio Quality

Data Collection - Whiteboard
Whiteboard Stats
Stroke Difference
Drift Percentage

Ingestion
● All ingestion happens via Kafka in real
time
● Flink Topology
● Split & Format to conform with
ingestion spec
● Rollup Enabled At Ingestion Time
● Conditional transformation
● Looking forward to using Lag Based
AutoScaler.
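
As a rough sketch of what a Kafka ingestion spec with rollup and an ingestion-time transform might look like, the helper below builds a Druid Kafka supervisor spec as a Python dictionary. The datasource, topic, broker address, column names, and chosen granularities are all assumptions for illustration; the deck does not show Noon's actual spec.

```python
import json


def build_supervisor_spec(datasource, topic, bootstrap_servers):
    """Build a Kafka supervisor spec; all column names are hypothetical."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {
                    "dimensions": [
                        "country", "region", "city",
                        "session_id", "isp", "network_type_or_unknown",
                    ]
                },
                "metricsSpec": [
                    {"type": "count", "name": "event_count"},
                    {"type": "doubleSum", "name": "packet_loss_sum",
                     "fieldName": "packet_loss"},
                ],
                # Ingestion-time transform: replace a missing network type.
                "transformSpec": {
                    "transforms": [
                        {"type": "expression",
                         "name": "network_type_or_unknown",
                         "expression": "nvl(network_type, 'unknown')"}
                    ]
                },
                # Rollup pre-aggregates rows that share a minute and dimensions.
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "MINUTE",
                    "rollup": True,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": False,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


spec = build_supervisor_spec("classroom_audio", "audio-stats", "kafka:9092")
spec_json = json.dumps(spec, indent=2)
```

A spec like this is typically submitted to the Overlord's `/druid/indexer/v1/supervisor` endpoint with an HTTP POST.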

Making Ingestion Easy
● Well defined event (ProtoBuf) schema
serialized as JSON.
● JSONPath-based DSL defining
transformers & ingestion spec.
● Parsing & transformation based on
the configuration file in a Flink
topology.
● Ingestion Spec Auto Generated from
JSON configuration file.
● Automated Deployments Via Jenkins

Schema Design
● Always start from your use-cases.
● Identify Dimensions & Metrics
● Aggregations & Approximation (HyperLogLog,
quantiles sketches)
● Query Granularity
● Partitions
● Deep Storage
● Data Retention
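
How query granularity and the choice of dimensions interact with rollup can be shown with a small simulation. This is only a conceptual sketch of the idea, assuming minute granularity and hypothetical fields (`country`, `network_type`, `packet_loss`); it is not how Druid implements rollup internally.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Truncate timestamps to the minute and pre-aggregate per key."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for event in events:
        minute = datetime.fromisoformat(event["timestamp"]).replace(
            second=0, microsecond=0
        )
        key = (minute.isoformat(),) + tuple(event[d] for d in dimensions)
        buckets[key]["count"] += 1
        buckets[key]["sum"] += event[metric]
    return dict(buckets)


# Hypothetical raw events: four rows in, three rolled-up rows out.
raw = [
    {"timestamp": "2021-06-01T10:00:05", "country": "SA",
     "network_type": "wifi", "packet_loss": 1.0},
    {"timestamp": "2021-06-01T10:00:40", "country": "SA",
     "network_type": "wifi", "packet_loss": 3.0},
    {"timestamp": "2021-06-01T10:00:50", "country": "SA",
     "network_type": "4g", "packet_loss": 2.0},
    {"timestamp": "2021-06-01T10:01:10", "country": "SA",
     "network_type": "wifi", "packet_loss": 5.0},
]
rolled = rollup(raw, ["country", "network_type"], "packet_loss")
```

Adding a high-cardinality dimension such as a user id to `dimensions` would leave almost every row in its own bucket, which is why the set of dimensions and the query granularity largely determine how much rollup saves.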

Self Serve Dashboard - Zoom Out & Zoom In
Country Level View
Sessions Inside A Country
Session Level View
Students Inside A Session View
Student Session Level View
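
The zoom-out and zoom-in views above map naturally onto a family of queries that group by one level and filter on the levels already chosen. The sketch below builds Druid SQL request payloads for that pattern; the level names, the rolled-up columns `packet_loss_sum` and `event_count`, the one-hour window, and the datasource name are assumptions for illustration, not the dashboard's actual queries.

```python
# Hypothetical drill-down hierarchy, from zoomed out to zoomed in.
LEVELS = ["country", "session_id", "user_id"]


def drilldown_payload(datasource, filters):
    """Build a Druid SQL payload grouping by the level below `filters`.

    `filters` maps already-chosen levels to values, in hierarchy order.
    `datasource` is interpolated into the SQL, so it must be a trusted name.
    """
    chosen = list(filters)
    if chosen != LEVELS[:len(chosen)] or len(chosen) >= len(LEVELS):
        raise ValueError("filters must fix a proper prefix of LEVELS, in order")
    group_col = LEVELS[len(chosen)]
    where = ["__time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR"]
    where += [f'"{col}" = ?' for col in chosen]
    query = (
        f'SELECT "{group_col}", '
        'SUM("packet_loss_sum") / SUM("event_count") AS avg_packet_loss, '
        'SUM("event_count") AS events '
        f'FROM "{datasource}" '
        f'WHERE {" AND ".join(where)} '
        f'GROUP BY "{group_col}"'
    )
    # Values travel as query parameters rather than being spliced into SQL.
    parameters = [{"type": "VARCHAR", "value": filters[col]} for col in chosen]
    return {"query": query, "parameters": parameters}


country_view = drilldown_payload("classroom_audio", {})
sessions_in_country = drilldown_payload("classroom_audio", {"country": "SA"})
students_in_session = drilldown_payload(
    "classroom_audio", {"country": "SA", "session_id": "s-1"}
)
```

Each payload could be sent to the Druid SQL endpoint (`/druid/v2/sql`) with an HTTP POST; because the averages are computed from rolled-up sums and counts, the same datasource serves every zoom level.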

Our Druid Cluster
Topology
● Master (m5.2xl)
● Data Node (i3.2xl)
○ Tiered
○ 24 slots
● Query Node (m5.2xl)
● External ZK, MySQL, S3 Deep Storage
Monitoring
● Datadog-Druid
● System Resources
● Ingestion Lag
● Number of Segments
● Query Time
● JVM Memory Usage
Numbers
● 15+ dims, 50+ metrics
● 105M events per day
● 2B rows @ avg row size 1K
● 4k-5k segments
● p90 latency ~850 ms

Business Impact
● Quickly Identify Problems
● Validation of fixes put in to improve quality
● Self Serve Tool, reducing burden on
developers
● Improved transparency & trust between
OPS and developers
● Student NPS improved

Challenges & Key Lessons
● Rollups are your best friend
● Ingestion Time Transformation > Query Time
Transformation
● Approximation - HyperLogLog, DataSketches
● Late Arrival Of Messages & Compaction
● Query Performance depends on your data model
● Setup takes time to stabilize.
● druid-user group is super helpful!

Thank you
Contact: karthik@noonacademy.com
