Aerospike Modeling
User Segmentation with Maps and Bitfields
2 Proprietary & Confidential | All rights reserved. © 2019 Aerospike Inc.
tl;dr
▪ Cassandra databases, including derivatives such as ScyllaDB, have a needle-in-a-haystack problem
▪ In C* each user ID – segment ID pair is in its own row
▪ This hurts performance when you need low-latency key-value operations
▪ In Aerospike we keep all of a user's segments together in a single record
User Profile Stores
▪ In digital advertising, user profile stores assist with audience segmentation
▪ The goal is to pull the segments for a specific user as fast as possible
▪ The modeling for this use case is generally applicable to other forms of online personalization
Modeling in Cassandra
CREATE TABLE userspace.user_segments (
  user_id uuid,
  segment_id int,
  attr smallint,
  attr2 smallint,
  PRIMARY KEY ((user_id, segment_id))
);
▪ On average 1000 segments per profile
▪ 50 billion cookies means 50 trillion rows
▪ High latency when finding a user's 1000 segments among such a huge number of rows
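The row count on this slide follows directly from the two figures it gives. A quick check of the arithmetic, with variable names chosen only for illustration:

```python
# Figures taken from the slide; variable names are illustrative.
cookies = 50 * 10**9          # 50 billion user profiles (cookies)
segments_per_profile = 1000   # average number of segments per profile

# The Cassandra model stores one row per (user_id, segment_id) pair.
cassandra_rows = cookies * segments_per_profile

# The Aerospike model keeps all of a user's segments in one record.
aerospike_records = cookies

print(f"{cassandra_rows:,} rows vs {aerospike_records:,} records")
```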
Modeling in Aerospike
{segmentID: [segment-TTL, {attr1, attr2}]}
{ 8457: [8889*, {}],
  12845: [8889, {}],
  42199: [8889, {}],
  43696: [8889, {}],
}
▪ * Segment TTL is expressed in hours since a local epoch
▪ The map ordering options are UNORDERED, K-ORDERED and KV-ORDERED
▪ Choosing K-ORDERED gives the best performance for data on SSD
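A minimal in-memory sketch of this record shape, using a plain Python dict as a stand-in for the Aerospike map bin (it does not call the Aerospike client). The `LOCAL_EPOCH` date and the helper names are assumptions made for illustration; the slides do not say which epoch is used, and the date here was picked so that the example reproduces the 8889 value shown above.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical local epoch; the deck does not state the actual value.
LOCAL_EPOCH = datetime(2019, 1, 1, tzinfo=timezone.utc)

def hours_since_local_epoch(moment: datetime) -> int:
    """Whole hours elapsed between LOCAL_EPOCH and `moment`."""
    return int((moment - LOCAL_EPOCH).total_seconds() // 3600)

def make_segment_value(expires_at: datetime, attrs: Optional[dict] = None) -> list:
    """Build the [segment-TTL, {attrs}] value stored under a segment ID."""
    return [hours_since_local_epoch(expires_at), attrs or {}]

# One user's segments: {segmentID: [segment-TTL, {attr1, attr2}]}
expiry = datetime(2020, 1, 6, 9, 0, tzinfo=timezone.utc)
user_segments = {
    12845: make_segment_value(expiry),
    8457: make_segment_value(expiry),
}

# A K-ORDERED map keeps its entries sorted by key; sorting emulates that here.
k_ordered = dict(sorted(user_segments.items()))
```

Storing the TTL as a small integer at the front of each value is what lets the value-interval operations on the following slides select segments by freshness.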
Advantages
▪ We can easily upsert new user segments into the map as they are processed (https://github.com/aerospike-examples/modeling-user-segmentation)
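The upsert semantics can be sketched on a plain dict standing in for the map bin. This is not the Aerospike client API; with a real client the same effect would come from a map put operation sent to the server, and the function name `upsert_segment` is hypothetical.

```python
def upsert_segment(segments: dict, segment_id: int, ttl_hours: int, attrs=None) -> None:
    """Insert the segment, or overwrite its TTL and attributes if it already exists."""
    segments[segment_id] = [ttl_hours, attrs if attrs is not None else {}]

profile = {8457: [8889, {}]}
upsert_segment(profile, 12845, 8889)   # new segment: inserted
upsert_segment(profile, 8457, 8913)    # existing segment: TTL refreshed
```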
Advantages
▪ We can use get_by_value_interval to filter segments that have a specific ‘freshness’
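A rough illustration of filtering by freshness, again on a plain dict rather than through the Aerospike client. It simplifies the server's value comparison by looking only at the first list element (the TTL), and it assumes the interval is half-open, with the begin value included and the end value excluded. The function name and sample numbers are made up for the example.

```python
def get_by_ttl_interval(segments: dict, ttl_begin: int, ttl_end: int) -> dict:
    """Return the segments whose TTL lies in the half-open interval [ttl_begin, ttl_end)."""
    return {
        seg_id: value
        for seg_id, value in segments.items()
        if ttl_begin <= value[0] < ttl_end
    }

profile = {8457: [8889, {}], 12845: [8700, {}], 42199: [8913, {}]}
now_hours = 8880

# Segments that expire within the next 24 hours.
fresh = get_by_ttl_interval(profile, now_hours, now_hours + 24)
```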
Advantages
▪ We can use the map remove_by_value_interval operation to trim expired segments
▪ Mainly, this allows for orders-of-magnitude faster retrieval of a user’s segments from the user profile store. Just get the record.
Advantages
▪ We can use the map remove_by_value_interval operation to trim expired segments, called as a background scan operation (server version >= 4.7)
▪ Mainly, this allows for orders-of-magnitude faster retrieval of a user’s segments from the user profile store
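Trimming can be sketched the same way. The code below is an in-memory stand-in, not the Aerospike API: `trim_expired` mimics removing every entry whose TTL falls below the current hour, and `trim_all_profiles` mimics a background scan by applying that trim to every record. All names and sample values are hypothetical.

```python
def trim_expired(segments: dict, now_hours: int) -> int:
    """Remove segments whose TTL is earlier than now_hours; return how many were removed."""
    expired = [seg_id for seg_id, value in segments.items() if value[0] < now_hours]
    for seg_id in expired:
        del segments[seg_id]
    return len(expired)

def trim_all_profiles(profiles: dict, now_hours: int) -> int:
    """Stand-in for a background scan: apply the trim to every record."""
    return sum(trim_expired(segments, now_hours) for segments in profiles.values())

profiles = {
    "user-a": {8457: [8889, {}], 12845: [8700, {}]},
    "user-b": {42199: [8600, {}], 43696: [8913, {}]},
}
removed = trim_all_profiles(profiles, now_hours=8880)
```

Running the trim server-side means the expired entries never have to travel to the client just to be discarded.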
Bitwise Operations
Bitwise operations supported by the server. Method names in the clients might be different.
• General Write Flags: (create_only, update_only, no_fail, partial)
• resize()
• insert(), remove(), set()
• or(), and(), xor(), not()
• lshift(), rshift()
• add(), subtract(), set-integer()
• get(), count()
• lscan(), rscan()
• get-integer()
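To give a feel for what a few of these operations do, here is a simplified pure-Python sketch on a `bytearray`. The function names and signatures are illustrative and are not the server or client API (for instance, the OR here takes a byte offset for brevity). It assumes bit offset 0 is the leftmost (most significant) bit of the first byte.

```python
def sketch_or(blob: bytearray, byte_offset: int, mask: bytes) -> None:
    """OR `mask` into `blob` in place, starting at byte_offset."""
    for i, m in enumerate(mask):
        blob[byte_offset + i] |= m

def bit_at(blob: bytes, pos: int) -> int:
    """Value of the bit at position `pos`, counting from the leftmost bit."""
    return (blob[pos // 8] >> (7 - pos % 8)) & 1

def sketch_count(blob: bytes, bit_offset: int, bit_size: int) -> int:
    """Number of bits set to 1 in [bit_offset, bit_offset + bit_size)."""
    return sum(bit_at(blob, pos) for pos in range(bit_offset, bit_offset + bit_size))

def sketch_lscan(blob: bytes, bit_offset: int, bit_size: int, value: bool) -> int:
    """Index, relative to bit_offset, of the first bit equal to `value`; -1 if none."""
    for i in range(bit_size):
        if bool(bit_at(blob, bit_offset + i)) == value:
            return i
    return -1

blob = bytearray(2)                                   # 16 bits, all zero
sketch_or(blob, 0, bytes([0b00010000, 0b00000001]))   # sets bits 3 and 15
```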
Modeling with Bitfields
▪ Represent the segments as a contiguous bitfield
▪ Each segment ID is a bit position. Set the bit for each segment the user is in
▪ Use bitwise operations to check server-side whether the user is in multiple segments
▪ Compresses extremely well in Enterprise Edition
▪ Caveat: can't apply a TTL to individual segments
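A small sketch of the bitfield model, done locally on a `bytearray` rather than server-side. The helper names are hypothetical, and the bit ordering (segment 0 is the leftmost bit of the first byte) is an assumption. Membership in several segments is checked by ANDing the user's bits with a mask built from the segment IDs.

```python
def set_segment(bits: bytearray, segment_id: int) -> None:
    """Set the bit for segment_id, growing the bitfield with zero bytes if needed."""
    byte_index, bit_index = divmod(segment_id, 8)
    if byte_index >= len(bits):
        bits.extend(bytes(byte_index + 1 - len(bits)))
    bits[byte_index] |= 0x80 >> bit_index

def in_all_segments(bits: bytes, segment_ids) -> bool:
    """True if the bit for every listed segment is set (AND against a mask)."""
    mask = bytearray(len(bits))
    for seg in segment_ids:
        if seg // 8 >= len(bits):
            return False              # bit lies beyond the stored bitfield
        mask[seg // 8] |= 0x80 >> (seg % 8)
    return all((b & m) == m for b, m in zip(bits, mask))

user_bits = bytearray()
for seg in (3, 10, 17):
    set_segment(user_bits, seg)
```

Each segment costs a single bit, which is why there is no room for a per-segment TTL in this model.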
More material you can explore:
List & Map API
▪ https://www.aerospike.com/docs/guide/cdt-list.html
▪ https://www.aerospike.com/docs/guide/cdt-map.html
▪ https://www.aerospike.com/docs/guide/cdt-context.html
▪ https://www.aerospike.com/docs/guide/cdt-ordering.html
▪ https://aerospike-python-client.readthedocs.io/en/latest/aerospike_helpers.operations.html
▪ https://www.aerospike.com/apidocs/java/com/aerospike/client/cdt/ListOperation.html
Code Samples
▪ https://github.com/aerospike-examples/modeling-user-segmentation
Aerospike Training
▪ https://www.aerospike.com/training/
Thank You!
Any questions?
ronen@aerospike.com

Aerospike Data Modeling - Meetup Dec 2019
