Aerospike Modeling
User Segmentation with Maps and Bitfields
2 Proprietary & Confidential | All rights reserved. © 2019 Aerospike Inc.
▪ Cassandra databases, including derivatives such as ScyllaDB, have a
needle-in-a-haystack problem
▪ In C* (Cassandra), each user ID – segment ID pair is in its own row
▪ This affects performance when you need low latency key-value operations
▪ In Aerospike we keep all the segments together in a single record
tl;dr
▪ In digital advertising, user profile stores assist with audience segmentation
▪ The goal is to pull user segments for a specific user as fast as possible
▪ Modeling this use case is generally applicable to other forms of online
personalization
User Profile Stores
CREATE TABLE userspace.user_segments (
    user_id uuid,
    segment_id int,
    attr smallint,
    attr2 smallint,
    -- partition by user_id, cluster by segment_id: one row per (user, segment) pair
    PRIMARY KEY ((user_id), segment_id)
);
▪ On average, 1000 segments per profile
▪ 50 billion cookies means 50 trillion rows
▪ High latency when finding a user’s 1000 segments among a huge number of rows
Modeling in Cassandra
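The row count above is plain arithmetic, and the difference between the two layouts can be sketched in Python. The dictionaries below are toy stand-ins for the two data models, not client code for either database; the user IDs and attribute values are hypothetical.

```python
# Back-of-the-envelope sizing using the numbers from the slide.
COOKIES = 50_000_000_000        # 50 billion user profiles
SEGMENTS_PER_PROFILE = 1_000    # average segments per profile

rows_row_per_pair = COOKIES * SEGMENTS_PER_PROFILE   # one row per (user, segment)
records_per_user = COOKIES                           # one record per user

# Toy data (hypothetical) in the row-per-pair layout.
row_per_pair = {
    ("user-a", 8457): {"attr": 1, "attr2": 2},
    ("user-a", 12845): {"attr": 1, "attr2": 2},
    ("user-b", 8457): {"attr": 3, "attr2": 4},
}

# The same data in the record-per-user layout.
record_per_user = {
    "user-a": {8457: {"attr": 1, "attr2": 2}, 12845: {"attr": 1, "attr2": 2}},
    "user-b": {8457: {"attr": 3, "attr2": 4}},
}


def segments_row_per_pair(user_id):
    # One row has to be collected per segment; this toy version scans every key.
    return sorted(seg for (uid, seg) in row_per_pair if uid == user_id)


def segments_record_per_user(user_id):
    # A single key lookup returns all of the user's segments at once.
    return sorted(record_per_user[user_id])


print(f"{rows_row_per_pair:,} rows vs {records_per_user:,} records")
```

Both functions return the same segment list; the point of the sketch is only how many entries each layout has to touch to build it.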
{segmentID: [segment-TTL, {attr1, attr2}]}
{ 8457: [8889*, {}],
12845: [8889, {}],
42199: [8889, {}],
43696: [8889, {}],
}
▪ * The segment TTL is expressed in hours since a local epoch
▪ The map ordering options are UNORDERED, K-ORDERED and KV-ORDERED
▪ Choosing K-ORDERED gives the best performance for data on SSD
Modeling in Aerospike
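A minimal sketch of how a map like the one above could be built. The deck does not say which date the local epoch is; 2019-01-01 UTC is an assumption chosen here so that the example reproduces the value 8889, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

# Assumption: the "local epoch" is an application-chosen start date.
LOCAL_EPOCH = datetime(2019, 1, 1, tzinfo=timezone.utc)


def hours_since_local_epoch(moment):
    """Whole hours elapsed between LOCAL_EPOCH and moment."""
    return int((moment - LOCAL_EPOCH).total_seconds() // 3600)


def make_entry(expires_at, attrs=None):
    """Build the map value [segment-TTL, {attributes}] for one segment."""
    return [hours_since_local_epoch(expires_at), dict(attrs or {})]


# Hypothetical expiry time shared by all four segments.
expiry = datetime(2020, 1, 6, 9, tzinfo=timezone.utc)
segments = {seg_id: make_entry(expiry) for seg_id in (8457, 12845, 42199, 43696)}
print(segments)
```

Counting hours from a recent epoch keeps each TTL a small integer, which is presumably why the deck does not use seconds since 1970.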
▪ We can easily upsert new user segments into the map as they are
processed (https://github.com/aerospike-examples/modeling-user-segmentation)
Advantages
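The upsert semantics can be modeled with a plain Python dict. This is a sketch of the behavior only, not Aerospike client code (the linked repository has that); the function name and the TTL values are hypothetical.

```python
def upsert_segments(profile, updates):
    """Insert new segments or overwrite existing ones, keyed by segment ID.

    Both arguments map segment ID -> [segment_ttl, attrs].
    """
    for segment_id, entry in updates.items():
        profile[segment_id] = entry
    # Return the entries sorted by key to mimic a K-ORDERED map.
    return dict(sorted(profile.items()))


profile = {8457: [8889, {}], 42199: [8889, {}]}
# 12845 is new; 8457 already exists and gets a later TTL.
profile = upsert_segments(profile, {12845: [8900, {}], 8457: [8913, {}]})
print(profile)
```

An existing segment is simply overwritten with its new TTL, so refreshing a segment and adding one are the same operation.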
▪ We can use the map get_by_value_interval operation to filter segments
that have a specific ‘freshness’
Advantages
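Because each map value starts with the segment TTL, a value-interval filter is effectively a TTL filter. The sketch below models that in plain Python by comparing only the first element of each value. The half-open interval (start inclusive, end exclusive) is an assumption of the sketch, and the function name and data are hypothetical.

```python
def get_by_ttl_interval(profile, ttl_start, ttl_end):
    """Return the segments whose TTL falls in [ttl_start, ttl_end)."""
    return {
        segment_id: entry
        for segment_id, entry in profile.items()
        if ttl_start <= entry[0] < ttl_end
    }


profile = {
    8457: [8889, {}],
    12845: [8900, {}],
    42199: [8913, {}],
    43696: [8950, {}],
}
# Segments expiring from hour 8900 up to, but not including, hour 8950.
fresh = get_by_ttl_interval(profile, 8900, 8950)
print(sorted(fresh))
```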
▪ We can use the map remove_by_value_interval operation to trim
expired segments
▪ Mainly, this allows for orders of magnitude faster retrieval of a user’s
segments from the user profile store. Just get the record.
Advantages
▪ We can use the map remove_by_value_interval operation to trim
expired segments, run as a background scan operation (server >= 4.7)
▪ Mainly, this allows for orders of magnitude faster retrieval of a user’s
segments from the user profile store
Advantages
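Trimming can be modeled the same way: drop every entry whose TTL is earlier than the current hour, and apply that to every profile. The loop over `profiles` stands in for the background scan; the function name and data are hypothetical, and this is a sketch of the semantics rather than client code.

```python
def trim_expired(profile, now_hours):
    """Remove segments whose TTL is before now_hours; return the number removed."""
    expired = [seg for seg, entry in profile.items() if entry[0] < now_hours]
    for seg in expired:
        del profile[seg]
    return len(expired)


profiles = {
    "user-a": {8457: [8889, {}], 12845: [8900, {}]},
    "user-b": {42199: [8850, {}], 43696: [8950, {}]},
}
now_hours = 8890  # hypothetical current time, in hours since the local epoch

# Stand-in for the background scan: apply the same trim to every record.
removed = sum(trim_expired(profile, now_hours) for profile in profiles.values())
print(removed, profiles)
```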
Bitwise operations supported by the server. Method names in the clients might be different.
• General Write Flags: (create_only, update_only, no_fail, partial)
• resize()
• insert(), remove(), set()
• or(), and(), xor(), not()
• lshift(), rshift()
• add(), subtract(), set-integer()
• get(), count()
• lscan(), rscan()
• get-integer()
Bitwise Operations
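To make a few of these operations concrete, here is a plain-Python model of `or()`, `and()` and `count()` over byte strings. It is a simplification: the functions below work on whole, equal-length byte strings, whereas the server operations take a bit offset and size within a bin. The function names are hypothetical and do not match any client API.

```python
def bit_or(a, b):
    """Bitwise OR of two equal-length byte strings."""
    return bytes(x | y for x, y in zip(a, b))


def bit_and(a, b):
    """Bitwise AND of two equal-length byte strings."""
    return bytes(x & y for x, y in zip(a, b))


def bit_count(a):
    """Number of bits set to 1."""
    return sum(bin(byte).count("1") for byte in a)


a = bytes([0b10100000, 0b00000001])
b = bytes([0b00100000, 0b00000011])

print(bit_count(bit_or(a, b)))   # bits set in either value
print(bit_count(bit_and(a, b)))  # bits set in both values
```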
▪ Represent the segments as a contiguous bitfield
▪ Each segment ID is a bit position; set the bit for every segment the user is in
▪ Bitwise operations to check server-side if user is in multiple segments
▪ Compresses extremely well in Enterprise Edition
▪ Caveat: can't apply a TTL to the segments
Modeling with Bitfields
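A minimal sketch of the bitfield model, using a Python `bytearray` in place of the record's bin. The maximum segment ID, the bit numbering (bit 0 is the most significant bit of the first byte) and the function names are assumptions of this sketch.

```python
MAX_SEGMENT_ID = 65_535  # hypothetical upper bound on segment IDs


def new_bitfield():
    """One bit per possible segment ID, all cleared."""
    return bytearray((MAX_SEGMENT_ID + 1) // 8)


def set_segment(bitfield, segment_id):
    """Set the bit at position segment_id."""
    byte_index, bit_index = divmod(segment_id, 8)
    bitfield[byte_index] |= 0x80 >> bit_index


def in_all_segments(bitfield, segment_ids):
    """True if every listed segment's bit is set (an AND against a mask)."""
    mask = new_bitfield()
    for segment_id in segment_ids:
        set_segment(mask, segment_id)
    return all((b & m) == m for b, m in zip(bitfield, mask))


user_bits = new_bitfield()
for segment_id in (8457, 12845, 42199, 43696):
    set_segment(user_bits, segment_id)

print(len(user_bits))                            # size of the bitfield in bytes
print(in_all_segments(user_bits, [8457, 42199]))
print(in_all_segments(user_bits, [8457, 9000]))
```

The bitfield stores membership only, which is the caveat above: there is no room for a per-segment TTL or attributes.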
List & Map API
▪ https://www.aerospike.com/docs/guide/cdt-list.html
▪ https://www.aerospike.com/docs/guide/cdt-map.html
▪ https://www.aerospike.com/docs/guide/cdt-context.html
▪ https://www.aerospike.com/docs/guide/cdt-ordering.html
▪ https://aerospike-python-client.readthedocs.io/en/latest/aerospike_helpers.operations.html
▪ https://www.aerospike.com/apidocs/java/com/aerospike/client/cdt/ListOperation.html
Code Samples
▪ https://github.com/aerospike-examples/modeling-user-segmentation
Aerospike Training
▪ https://www.aerospike.com/training/
More material you can explore:
Thank You!
Any questions?
ronen@aerospike.com

Aerospike Data Modeling - Meetup Dec 2019
