Presenter Bios
Robert Hodges - Altinity CEO
30+ years working on DBMSs, plus
virtualization and security.
ClickHouse is DBMS #20
James Hartig - Admiral CTO
Co-founder of Admiral, currently
working on distributed systems
in Golang to build the Admiral
platform
Company Intros
www.altinity.com
Leading software and services
provider for ClickHouse
Major committer and community
sponsor in US and Western Europe
www.getadmiral.com
The Visitor Relationship Management
Company
A single platform to help publishers
grow visitor relationships and revenue
Admiral Overview
● Sustainable publishing through relationships
● Subscriptions
● Engagement
○ Email newsletter
○ Adblocking
● Privacy (GDPR + CCPA)
● Simple one-tag installation
Custom Experiences
● Custom design
● Elaborate frequencies
● Targeting on:
○ Referrers
○ Subscription State
○ Geo
○ Key-Value pairs
● Targeting performed in real-time
○ Without any code changes for publisher
Targeting In Action
1. User visits publisher’s site
2. JS collects data points about visit
3. Request to Front-End Node
4. Collect recent months of user events
5. Send everything to targeting
(Diagram: the Front-End Node (FEN) sits between the publisher's site and the targeting service, forwarding the URL, environment, key-value pairs, and event history to targeting.)
User Event Storage
● Pageview, Engage, Subscribe, Consent, etc
● Generating over 2,500 events a second
● Requires fast lookups for targeting
● Long-term storage for case studies and product development
● Aggregate events to build a session
● Chose ClickHouse
○ Long-term storage on HDD
○ Fast lookups with SSD + in-memory cache
○ Materialized views for storing a queue of events
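As a sketch, the event storage described above might look like the following in ClickHouse. Column names, types, and the sort key are assumptions for illustration, not Admiral's actual schema:

```sql
-- Hypothetical event table; columns follow the fields shown in this deck.
CREATE TABLE events (
    time DateTime,
    user UUID,
    site LowCardinality(String),
    type UInt8,              -- Pageview, Engage, Subscribe, Consent, ...
    url String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (user, site, time);
```

Sorting by (user, site, time) is one way to support the fast per-user lookups targeting needs.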
Tech Stack
● GCP (Compute + PubSub + Memorystore)
● Go backend
● Microservice architecture
● 5 regions across 3 continents
● Over 10,000 HTTP requests per second
● Less than 250 VMs
ClickHouse
features that
enable Admiral
MergeTree is the workhorse ClickHouse table
-- Create table
CREATE TABLE mt (
`key` UInt32,
`value` Int32
) ENGINE = MergeTree()
PARTITION BY tuple() ORDER BY key
-- Add data.
INSERT INTO mt VALUES (1, 1);
INSERT INTO mt VALUES (1, -1);
SummingMergeTree is a useful variant
-- Create table with same schema
CREATE TABLE smt AS mt
ENGINE = SummingMergeTree()
ORDER BY key
-- Add data and select
INSERT INTO smt SELECT * FROM mt
-- When you select with FINAL, “zero” rows disappear!
SELECT key, sum(value) FROM smt FINAL GROUP BY key
0 rows in set. Elapsed: 0.001 sec.
Compression and codecs are configurable
CREATE TABLE test_codecs (
a_lz4 String CODEC(LZ4),
a_zstd String DEFAULT a_lz4 CODEC(ZSTD),
a_lc_lz4 LowCardinality(String) DEFAULT a_lz4 CODEC(LZ4),
a_lc_zstd LowCardinality(String) DEFAULT a_lz4 CODEC(ZSTD)
)
Engine = MergeTree
PARTITION BY tuple() ORDER BY tuple();
Effect on storage size is dramatic
(Chart: compressed size as a percentage of uncompressed data for the columns above: 20.84%, 12.28%, 10.61%, 10.65%, and 7.89%.)
Materialized views reorganize data for speed
ClickHouse materialized views are synchronous
post-insert triggers
Common uses:
● Aggregation
● Automatic reads from Kafka
● Build pipelines using chained views
● Pre-computing last-point queries
● Changing sorting or primary key
○ (Similar to Vertica projections)
(Diagram: an INSERT into the cpu table (MergeTree) fires the cpu_last_point_mv materialized view, whose SELECT acts as a trigger that populates cpu_last_point_agg (SummingMergeTree). The last-point table is ~0.0009% of the source size compressed, ~0.002% uncompressed.)
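The last-point pattern above can be sketched as follows. Table names follow the diagram; the measurement columns are assumed for illustration:

```sql
-- Source table of raw CPU measurements (columns assumed).
CREATE TABLE cpu (
    created_at DateTime,
    tags_id String,
    usage_user Float64
) ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (tags_id, created_at);

-- Target table holds aggregate states, one row per tags_id after merges.
-- SummingMergeTree combines AggregateFunction columns like AggregatingMergeTree.
CREATE TABLE cpu_last_point_agg (
    tags_id String,
    max_created_at AggregateFunction(max, DateTime),
    usage_user_state AggregateFunction(argMax, Float64, DateTime)
) ENGINE = SummingMergeTree
ORDER BY tags_id;

-- The materialized view fires on each INSERT into cpu.
CREATE MATERIALIZED VIEW cpu_last_point_mv TO cpu_last_point_agg AS
SELECT
    tags_id,
    maxState(created_at) AS max_created_at,
    argMaxState(usage_user, created_at) AS usage_user_state
FROM cpu
GROUP BY tags_id;

-- Read the last point per series by merging the states.
SELECT
    tags_id,
    maxMerge(max_created_at) AS last_time,
    argMaxMerge(usage_user_state) AS last_usage
FROM cpu_last_point_agg
GROUP BY tags_id;
```

The read touches only one small row per series instead of scanning the full history, which is where the huge size reduction comes from.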
Clusters enable horizontal scaling
(Diagram: hosts arranged in a grid of shards and replicas. Replicas help with concurrency; shards add IOPS.)
More table engines to enable clustering
● Distributed: “umbrella” table that knows the location of shards and replicas
● ReplicatedMergeTree: table that automatically propagates changes to the other replicas in its shard
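A sketch of the two engines together; the cluster name, database, ZooKeeper path, and macros are placeholders, not Admiral's actual configuration:

```sql
-- Local replicated table, created on every node of the cluster.
CREATE TABLE events_local ON CLUSTER my_cluster (
    time DateTime,
    user UUID,
    value Int32
) ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/events_local', '{replica}')
PARTITION BY toYYYYMM(time)
ORDER BY (user, time);

-- Umbrella table that routes reads and writes across the shards.
CREATE TABLE events_all ON CLUSTER my_cluster AS events_local
ENGINE = Distributed(my_cluster, default, events_local, rand());
```

Writes to events_all are sharded by the last argument (here rand()); reads fan out to every shard and merge the results.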
ClickHouse distributes queries over shards
(Diagram: the application queries the Distributed table ontime, which forwards the innermost subselect to each shard's ontime_local table. Each shard computes AggregateStates locally; the aggregates are merged on the initiator node.)
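For example, assuming ontime is the Distributed table over ontime_local from the diagram, a simple aggregation runs in parallel on every shard:

```sql
-- Each shard scans its own ontime_local and computes partial
-- aggregate states; the initiator node merges them and sorts.
SELECT Carrier, count() AS flights
FROM ontime
GROUP BY Carrier
ORDER BY flights DESC
LIMIT 5;
```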
Read performance using distributed tables
● Best case performance is linear
with number of nodes
● For fast queries network latency
may dominate parallelization
Tiered storage matches storage to access
(Diagram: for time-series data, 95% of queries touch the last day, 4% the last month, and 1% the last year. Recent data belongs on high-IOPS NVMe/SSD; older data on high-density HDD.)
Storage configurations enable tiering
(Diagram: disks default, data1, and data2, backed by /data1, /data2, and /data3, are grouped by the 'tiered' policy into 'fast' and 'slow' volumes.)
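A server-side storage configuration matching the diagram might look roughly like this; the mount paths are assumptions, and older releases (such as the 20.1.x mentioned below) use `<yandex>` as the root tag instead of `<clickhouse>`:

```xml
<!-- Sketch of a config.d/storage.xml fragment; paths are illustrative. -->
<clickhouse>
  <storage_configuration>
    <disks>
      <data1><path>/data2/clickhouse/</path></data1>
      <data2><path>/data3/clickhouse/</path></data2>
    </disks>
    <policies>
      <tiered>
        <volumes>
          <fast>
            <disk>default</disk>
          </fast>
          <slow>
            <disk>data1</disk>
            <disk>data2</disk>
          </slow>
        </volumes>
      </tiered>
    </policies>
  </storage_configuration>
</clickhouse>
```

The default disk comes from the server's main data path, so only the extra disks need entries; tables opt in with SETTINGS storage_policy = 'tiered', as in the next slide.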
CREATE TABLE fast_readings (
sensor_id Int32 Codec(DoubleDelta, LZ4),
time DateTime Codec(DoubleDelta, LZ4),
date ALIAS toDate(time),
temperature Decimal(5,2) Codec(T64, LZ4)
) Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (sensor_id, time)
TTL time + INTERVAL 1 DAY TO VOLUME 'slow',
time + INTERVAL 1 YEAR DELETE
SETTINGS storage_policy = 'tiered'
TTLs control flow in storage
Available in
version 20.1.x
Admiral’s Path
to ClickHouse
First Attempt: Sharded Mongo
● Familiar database
● Sharded Mongo by region
● Flexible data structures
● Large documents
○ Hard to prune old visits
○ Huge indexes (long rebuilds during scaling)
● Primary/secondary/mongos
○ Complicated deployment/updates
○ Vertical Scaling
● Bugs encountered with sharding
○ Shard boundaries
○ Cleanup after shard split
https://jira.mongodb.org/browse/SERVER-38971, https://jira.mongodb.org/browse/SERVER-38969
{
"_id": "alex",
"dc": "gce-us-east1",
"site": "games",
"events": [
{
"time": "10:01",
"type": "visit",
"url": "...",
...
},
{
"time": "10:02",
"type": "engage",
"url": "...",
...
},
...
],
"lastEvent": "10:02"
}
Current: ClickHouse + Redis
● MVs and time-based parts
● Horizontal scaling
○ Rolling updates without downtime
○ Manual intervention needed to add a new replica
● High compression ratio
● 50% of RAM dedicated to uncompressed cache
● 3 ClickHouse servers per region
● Memorystore (Redis) cluster per region
○ Synchronously add into Redis
○ Asynchronously send to Pub/Sub
Per Region
ClickHouse Storage
● Inserts are batched into “events”
○ Spinning HDD for cost
● Materialized Views create 2 other rows
○ Pending user count
○ Smaller SSD events table
● Fast reads from SSD for targeting
● Future: TTL-based tiered storage
Time User Site Type URL ...
10:01 Alex Games Visit ... ...
10:02 Alex Games Engage ... ...
10:25 Marie News Visit ... ...
Hour User Site Pending
10:00 Joe News 1
10:00 Alex Games 5
10:00 Marie News 4
Performance
● CH: 95th percentile <20ms, 50th percentile 7ms
○ Goal was 100ms for 95th percentile
○ Decreased 95th percentile compared to MongoDB
● Redis: 95th percentile <12ms, 50th percentile 3ms
● Over 1,000 CH queries/sec globally
○ >400 queries/sec in busiest region
● ~50% of queries hit ClickHouse
○ Tail of events kept in Redis to know if full history is cached
○ 85%-90% uncompressed cache hit rate
Compression
● Global ZSTD Level 1
○ Optimized for speed
○ Future: Per-column compression levels
● LowCardinality type
○ Dictionary with stored positions
○ Country
○ Site
○ Engagement ID
○ 99%+ compression
SELECT
name, type,
1 - (data_compressed_bytes /
data_uncompressed_bytes)
FROM system.columns
WHERE table = ?
ORDER BY data_uncompressed_bytes DESC
user_agent String 0.95936
url String 0.65275
user UUID 0.75514
type UInt8 0.98113
User Sessions
● Session ends after 30 minutes of inactivity
○ Or midnight
● Materialized view into SummingMergeTree
○ Sums value for same primary key
○ Deletes rows with 0 value
● Every hour fetch events for the user
○ Decide if session ended
○ Insert negative value
○ After merge row is removed
10:00 Alex Games -5
Hour User Site Pending
10:00 Joe News 1
10:00 Alex Games 5
10:00 Marie News 4
Hour User Site Pending
10:00 Joe News 1
10:00 Marie News 4
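The session-ending pattern above can be sketched as follows. Table and column names follow the slides; the types are assumed:

```sql
-- Pending counts per session key; rows with the same key are summed on merge.
CREATE TABLE pending_users (
    hour DateTime,
    user String,
    site LowCardinality(String),
    pending Int32
) ENGINE = SummingMergeTree
ORDER BY (hour, user, site);

-- The hourly job ends Alex's session by inserting a negative row.
-- Once parts merge, 5 + (-5) = 0 and the row is deleted entirely.
INSERT INTO pending_users VALUES ('2020-04-01 10:00:00', 'Alex', 'Games', -5);
```

Until a merge happens, reads must GROUP BY the key and filter with HAVING pending > 0, exactly as in the queue query on the previous slide.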
Queue in ClickHouse
SELECT groupArray(partition)
FROM system.parts
WHERE active AND database = ? AND table = ? AND rows > 0
SELECT
hour, user, site, sum(pending) as pending
FROM pending_users
PREWHERE hour = ?
GROUP BY hour, user, site
HAVING pending > 0
ORDER BY (user, site);
Queue in ClickHouse
Each hour the number of table parts decreases as rows are removed and merged. At
midnight all sessions expire and the number of parts drops dramatically.
Future ClickHouse Usage
● Public/Internal Alerts
○ Signed up, enabled feature, etc. sent to Slack
○ Popular article (realtime optimizations)
● Publisher Analytics
○ Currently storing >100TB in Bigtable
○ Remove hourly aggregation into Mongo
○ SQL instead of custom query language
● Audit Logging
Wrap-up
Takeaways
● Multiple benefits with switching to ClickHouse:
○ Expanded storage capacity
○ Increased scaling and performance
○ Reduced complexity in deployments
● First non-MongoDB datastore
● Shoutout to Altinity
○ Customized Training
○ POC assistance (design and schema optimization)
● Expanding to new projects
Thank you!
Admiral:
https://www.getadmiral.com
Altinity:
https://www.altinity.com

Big Data in Real-Time: How ClickHouse powers Admiral's visitor relationships for publishers
