A presentation by Zohar Elkayam, Solutions Architect at Aerospike Israel, given at the IronSource meetup in Israel (December 2019).
The topic of this talk was Nested CDT (list and map) improvements in Aerospike versions 4.6 and 4.7.
It included an explanation of data modeling and code samples of such an implementation.
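As a flavor of what such a nested CDT operation looks like in code, here is a minimal sketch using the Aerospike Python client; it is not taken from the slides, and the helper names (aerospike_helpers.cdt_ctx, map_operations.map_put / map_increment), the bin layout, and the key are assumptions based on the client's documented API for server 4.6+.

```python
# A hedged sketch (not from the talk): operate on a map nested inside the "segments" bin.
import aerospike
from aerospike_helpers import cdt_ctx
from aerospike_helpers.operations import map_operations

config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

key = ("test", "users", "user-1234")  # hypothetical namespace/set/user key

# The bin "segments" is assumed to hold a map of campaign-id -> {counter, last_seen}.
# The ctx list navigates into the nested map so the server mutates it in place.
ctx = [cdt_ctx.cdt_ctx_map_key("campaign-42")]
ops = [
    map_operations.map_increment("segments", "counter", 1, ctx=ctx),
    map_operations.map_put("segments", "last_seen", 1575936000, ctx=ctx),
]
_, _, bins = client.operate(key, ops)
client.close()
```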
Query optimizers and people have one thing in common: the better they understand their data, the better they can do their jobs. Optimizing queries is hard if you don't have good estimates for the sizes of the intermediate join and aggregate results. Data profiling is a technique that scans data, looking for patterns within the data such as keys, functional dependencies, and correlated columns. These richer statistics can be used in Apache Calcite's query optimizer, and the projects that use it, such as Apache Hive, Phoenix and Drill. We describe how we built a data profiler as a table function in Apache Calcite, review the recent research and algorithms that made it possible, and show how you can use the profiler to improve the quality of your data.
A talk given by Julian Hyde at Apache: Big Data, Miami, on May 16th 2017.
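To make the data-profiling abstract above a little more concrete, here is a toy-scale illustration of one pattern it mentions: checking whether one column functionally determines another. The column names and rows are invented, and real profilers (including Calcite's) use far more sophisticated algorithms at scale.

```python
# Toy functional-dependency check: does column a determine column b (a -> b)?
from collections import defaultdict

rows = [
    {"zip": "94105", "city": "San Francisco", "state": "CA"},
    {"zip": "94110", "city": "San Francisco", "state": "CA"},
    {"zip": "10001", "city": "New York", "state": "NY"},
]

def determines(rows, a, b):
    # a -> b holds if no value of a maps to two different values of b.
    seen = defaultdict(set)
    for r in rows:
        seen[r[a]].add(r[b])
    return all(len(vals) == 1 for vals in seen.values())

print(determines(rows, "zip", "city"))   # True on this sample
print(determines(rows, "city", "zip"))   # False: one city has two zips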
A talk given by Julian Hyde at ApacheCon NA 2018 in Montreal on September 26th, 2018.
Spatial and GIS applications have traditionally required specialized databases, or at least specialized data structures like r-trees. Unfortunately this means that hybrid applications such as spatial analytics are not well served, and many people are unaware of the power of spatial queries because their favorite database does not support them.
In this talk, we describe how Apache Calcite enables efficient spatial queries using generic data structures such as HBase’s key-sorted tables, using techniques like Hilbert space-filling curves and materialized views. Calcite implements much of the OpenGIS function set and recognizes query patterns that can be rewritten to use particular spatial indexes. Calcite is bringing spatial query to the masses!
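For readers unfamiliar with space-filling curves, the sketch below (not Calcite code) uses a Z-order (Morton) key, a simpler relative of the Hilbert curve mentioned above, to show how two-dimensional coordinates can be folded into one sortable key so that a key-sorted store such as HBase can answer spatial queries with ordinary range scans.

```python
# Z-order key: interleave the bits of x and y into a single integer.
def morton_key(x: int, y: int, bits: int = 16) -> int:
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

# Points that are close in 2-D space tend to be close in key order,
# so a bounding-box query becomes a small number of key-range scans.
points = [(3, 5), (3, 6), (200, 7)]
for px, py in sorted(points, key=lambda p: morton_key(*p)):
    print(px, py, format(morton_key(px, py), "b"))
```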
This document discusses using MapReduce to find the top K records in a distributed dataset based on specific criteria. It begins by explaining MapReduce and its limitations. It then describes finding the top K records on a single machine by sorting the data and selecting the top K. For MapReduce, each mapper finds the top K records within its split and sends them to the reducer. The reducer finds the global top K by sorting all records and selecting the top K overall. An example algorithm and sample data are provided to demonstrate how to implement a MapReduce job to solve this problem.
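The top-K pattern described above can be sketched without any framework at all; the value of K, the scoring field, and the sample records below are invented for illustration.

```python
# Framework-free sketch of the MapReduce top-K pattern.
import heapq

K = 3
splits = [
    [("a", 10), ("b", 42), ("c", 7)],
    [("d", 99), ("e", 1)],
    [("f", 55), ("g", 41), ("h", 60)],
]

def mapper(split):
    # Each mapper keeps only its local top K, so little data crosses the network.
    return heapq.nlargest(K, split, key=lambda rec: rec[1])

def reducer(partial_results):
    # The single reducer merges the per-split candidates into the global top K.
    merged = [rec for part in partial_results for rec in part]
    return heapq.nlargest(K, merged, key=lambda rec: rec[1])

print(reducer([mapper(s) for s in splits]))  # [('d', 99), ('h', 60), ('f', 55)]
```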
Processing big data quickly and efficiently with Apache Nemo
- Won Wook Song, Youngseok Yang (Software Platform Lab, Department of Computer Science and Engineering, Seoul National University)
Overview
Apache Nemo is a system that optimizes how big data applications are executed in a distributed manner, adapting to diverse resource environments and data characteristics. For geo-distributed resources, transient resources, large data shuffles, and skewed data, Apache Nemo shows significantly higher performance than Apache Spark.
Table of Contents
Optimization case studies with Apache Nemo
Apache Nemo's distributed execution process
Future research directions
The document discusses providing easy access to HDF data via NCL, IDL, and MATLAB. It presents examples and code snippets for reading HDF data from various NASA data centers like GES DISC, MODAPS, NSIDC, and LP-DAAC into the three software packages. Common issues when working with HDF files like HDF-EOS2 swaths with dimension maps and different ways metadata is stored are also addressed. The overall goal is to help lower the learning curve for users who want to analyze HDF data in their favorite analysis packages.
This document provides a summary of MapReduce algorithms. It begins with background on the author's experience blogging about MapReduce algorithms in academic papers. It then provides an overview of MapReduce concepts including the mapper and reducer functions. Several examples of recently published MapReduce algorithms are described for tasks like machine learning, finance, and software engineering. One algorithm is examined in depth for building a low-latency key-value store. Finally, recommendations are provided for designing MapReduce algorithms including patterns, performance, and cost/maintainability considerations. An appendix lists additional MapReduce algorithms from academic papers in areas such as AI, biology, machine learning, and mathematics.
A talk given by Julian Hyde at DataEngConf SF on April 17th 2018.
Did you know that databases often “cheat”? Even with a scalable query engine and smart optimizer, many real-world queries would be too slow if the engine read all the data, so the engine re-writes your query to use a pre-materialized result. B-tree indexes made the first relational databases possible, and there are now many flavors of materialization, from explicit materialized views to OLAP-style caching and spatial indexes. Materialization is more relevant than ever in today’s heterogeneous, distributed systems.
If you are evaluating data engines, we describe what materialization features to look for in your next engine. If you are implementing an engine, we describe the features provided by Apache Calcite to design, maintain and use materializations.
Schema Design by Chad Tindel, Solution Architect, 10gen (MongoDB)
MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the tradeoffs of various data modeling strategies in MongoDB using a library as a sample application. You will learn how to work with documents, evolve your schema, and apply common schema design patterns.
This document discusses heaps and their use in implementing priority queues. It describes how a max-heap or min-heap is a complete binary tree that satisfies the heap property: each internal node is greater than or equal to its children in a max-heap, or less than or equal to them in a min-heap. It explains how a heap can be represented using a simple array and how to build a heap from an unsorted array in O(n) time by sifting nodes down. Deleting the root element and maintaining the heap property takes O(log n) time. Heap sort uses a heap to sort an array in O(n log n) time. Priority queues can be efficiently implemented using max-heaps.
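A compact sketch of the operations summarized above (array representation, O(n) build via sift-down, O(log n) removal of the maximum); it is an illustrative implementation, not taken from the document itself.

```python
# Max-heap stored in a plain list: children of index i live at 2i+1 and 2i+2.
def sift_down(a, i, n):
    """Push a[i] down until the subtree rooted at i satisfies the max-heap property."""
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def build_heap(a):
    # Sifting down from the last internal node costs O(n) overall.
    for i in range(len(a) // 2 - 1, -1, -1):
        sift_down(a, i, len(a))

def pop_max(a):
    # Swap the root with the last element, shrink, and restore the heap in O(log n).
    a[0], a[-1] = a[-1], a[0]
    top = a.pop()
    sift_down(a, 0, len(a))
    return top

data = [4, 1, 9, 7, 3, 8]
build_heap(data)
print(pop_max(data), pop_max(data))  # 9 8
```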
EDF2012 Kostas Tzoumas - Linking and analyzing big data - Stratosphere (European Data Forum)
Stratosphere is a collaborative research project between universities to build an open-source platform for big data analytics. It bridges relational databases and MapReduce using a functional programming language called Meteor. The platform includes data pools, tools for data linkage and analysis, and a scalable execution engine called Nephele. Stratosphere is optimized for parallelism using its PACT programming model and optimizer. Ongoing work focuses on UDFs, caching, and advancing the MapReduce paradigm.
Taming the Tiger: Tips and Tricks for Using Telegraf (InfluxData)
As part of InfluxDays North America 2020 Virtual Experience, the Technical Services team will be offering a free live InfluxDB training to the first 100 registered attendees. This will be hosted over Zoom and Slack with two main trainers, and there will be assistants to help participants with the course work. The training will be recorded and made available on the InfluxDays website and the InfluxData YouTube channel.
The course provides an introduction to using Telegraf within a hands-on lab setting. Attendees will be presented a series of lab exercises and get the chance to work through them with the assistance of our remote proctors. After taking this class, attendees will be able to:
Articulate the purposes and value of Telegraf
Understand the basics of configuring and running Telegraf
Understand how to manipulate incoming data to optimize InfluxDB schema
Visualize the insertion results using InfluxDB Cloud UI
Processing Geospatial at Scale at LocationTech (Rob Emanuele)
This document discusses processing large geospatial data at scale. It provides background on geospatial concepts like raster and vector data. It then discusses big data frameworks like Hadoop, Spark, and Accumulo that can be used to process geospatial data in parallel across large clusters. Finally, it presents several LocationTech projects like GeoTrellis, GeoJinni, and GeoWave that build geospatial capabilities on top of these frameworks to allow distributed processing and querying of large raster and vector maps.
The MathWorks introduced MATLAB support for HDF5 in 2002 via three high-level functions: HDF5INFO, HDF5READ, and HDF5WRITE. These functions worked well for their purpose, providing simple interfaces to a complicated file format, but MATLAB users requested finer control over their HDF5 files and the HDF5 library. MATLAB 7.3 (R2006b) adds this precise level of support for version 1.6.5 of the HDF5 library via a close mapping of the HDF5 C API to MATLAB function calls.
This presentation will briefly introduce the earlier, high-level HDF5 interface (and its limitations) before showing in detail the low-level HDF5 functions. It will show how to interact with the HDF5 library and files using the thirteen classes of functions in MATLAB, which encapsulate groupings of functionality found in the HDF5 C API. But because MATLAB is itself a higher-level language than C, we will also present MATLAB's extensions and modifications of the HDF5 C API that make it more MATLAB-like, work with defined values, and perform ID and memory management.
Wrapping a library like HDF5 requires a great deal of effort and design, and we will briefly present a general-purpose mechanism for creating close mappings between library interfaces and an application like MATLAB. One of our goals in this presentation is to facilitate communication with The HDF Group about how The MathWorks builds our HDF5 interfaces in order to ease adoption of future versions of the HDF5 library in large, general-purpose applications.
This document discusses Hadoop and MapReduce. It describes how Hadoop uses MapReduce and how it was inspired by Google's implementation. It provides details on the key components of Hadoop including HDFS, JobTracker, TaskTracker, NameNode and DataNode. It also provides examples of using Hadoop with different programming languages like Java, Python and C/C++ and discusses tuning Hadoop performance.
Processing Geospatial Data At Scale @locationtech (Rob Emanuele)
This document discusses processing large geospatial data at scale. It provides background on big data frameworks like Apache Hadoop, Apache Spark, and geospatial projects like GeoTrellis, GeoWave, and SpatialHadoop that enable processing geospatial data using these frameworks. The document outlines how these tools allow geospatial data from sources like satellite imagery, OpenStreetMap, and geotagged social media to be analyzed using distributed computing platforms and algorithms.
This is a slide deck that I have been using to present on GeoTrellis for various meetings and workshops. The information reflects GeoTrellis pre-1.0, as of Q4 2016.
Well-designed tables, using techniques like partitioning and bucketing, can improve query speed and reduce costs. Partitioning involves horizontally slicing data, such as by date or location. Bucketing imposes structure that allows more efficient queries, sampling, and map-side joins. Parallel query execution allows subqueries to run simultaneously to improve performance. The explain command helps analyze queries and identify optimizations.
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...InfluxData
This document discusses top hurdles for Flux beginners and provides solutions. It covers 10 hurdles: 1) Overlooking UI tools for writing Flux, 2) Misunderstanding annotated CSVs, 3) Data layout design leading to high cardinality, 4) Storing wrong data in InfluxDB, 5) Confusion about projecting multiple aggregations, 6) Unfamiliarity with common Flux packages and functions, 7) Unawareness of performance optimizations, 8) Improper use of schema mutations, 9) Knowing when to write custom tasks, and 10) Not understanding when and why to downsample data. For each hurdle, it provides explanations and code examples to illustrate solutions.
Enabling Access to Big Geospatial Data with LocationTech and Apache projects (Rob Emanuele)
LocationPowers OGC BigGeoData 2016
This presentation will discuss tools in the open source landscape that are used to handle big geospatial data. In particular, we will focus on how Apache frameworks such as Spark and Accumulo are "geospatially enabled" by four projects: GeoTrellis, GeoWave, GeoMesa, and GeoJinni. These four projects all participate in LocationTech, a working group under the Eclipse Foundation. Specifically, we will discuss how each of these LocationTech technologies implements spatial indexing (e.g. by using space filling curves) in order to provide quick access to data, and other common themes among the four projects. Attendees should walk away from this presentation understanding important parts of the Apache big data ecosystem, a set of LocationTech projects that belong to the cutting edge of enabling those Apache projects' handling of geospatial data, as well as some solutions to common problems when dealing with large geospatial data.
ComputeFest 2012: Intro To R for Physical Sciences (alexstorer)
This document provides an introduction to the R programming language presented by Alex Storer at ComputeFest 2012. It discusses why R should be used over other languages like MATLAB and Python, provides examples of basic R syntax and functions, and walks through an example of loading climate data and creating plots to visualize rainfall anomalies over time. The goal is to provide attendees with a foundation of R basics while working through a real data analysis problem.
1. Hadoop is a framework for distributed processing of large datasets across clusters of computers.
2. Hadoop can be used to perform tasks like large-scale sorting and data analysis faster than with traditional databases like MySQL.
3. Example applications of Hadoop include processing web server logs, managing user profiles for a large website, and performing machine learning on massive datasets.
Latent Semantic Analysis of Wikipedia with Spark (Sandy Ryza)
This document describes how to perform latent semantic analysis (LSA) on Wikipedia data using Apache Spark. It discusses parsing Wikipedia XML data, creating a term-document matrix, applying singular value decomposition to reduce the matrix's rank, and interpreting the results to find concepts and related documents. Key steps include parsing Wikipedia pages into terms and documents, cleaning the data through lemmatization and removing stop words, creating a tf-idf weighted term-document matrix, applying SVD to get U, S, and V matrices, and using these to find terms and documents most strongly related to given queries.
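As a hedged, single-machine illustration of the same pipeline (tf-idf weighting followed by a truncated SVD), the snippet below uses scikit-learn and NumPy on three toy documents rather than the Spark code the document refers to.

```python
# Toy LSA: tf-idf matrix, truncated SVD, and projection into a low-rank "concept" space.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "spark makes distributed data processing simple",
    "wikipedia articles cover many topics",
    "singular value decomposition reduces matrix rank",
]
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs).toarray()   # documents x terms, tf-idf weighted

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                          # keep the top k singular values
doc_concepts = U[:, :k] * S[:k]                # documents projected onto concepts
term_concepts = Vt[:k, :].T                    # terms projected onto concepts

# Documents (or terms) with similar concept vectors are related,
# even when they share few literal words.
print(doc_concepts.round(3))
```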
This webinar will give an overview of CREATE STATISTICS in PostgreSQL. This command allows the database to collect multi-column statistics, helping the optimizer understand dependencies between columns, produce more accurate estimates, and build better query plans.
The following key topics will be covered during the webinar:
- Why CREATE STATISTICS may be needed at all
- How the command works
- Which cases CREATE STATISTICS already addresses
- What improvements are in the queue for future PostgreSQL versions (either already committed to PostgreSQL 13 or beyond)
Richard Cole and Adam Gray from AWS hosted an Elastic MapReduce Office Hours session on April 13th, 2011 to discuss new features and answer questions. They demonstrated how to resize a running job flow and launch a Hive-based data warehouse to analyze contextual advertising data from impressions and clicks tables stored in S3. Office Hours provides a forum for technical discussions with AWS experts but is not intended for support or information about unreleased services.
Best Practices for Migrating Legacy Data Warehouses into Amazon Redshift (Amazon Web Services)
The document summarizes best practices for migrating legacy data warehouses to Amazon Redshift. It covers architectural concepts like columnar storage and compression, data distribution styles, sort keys to optimize query performance, and materializing dimension columns in fact tables. The presentation provides an overview of these topics and their impact on storage, I/O and querying. Real-world examples are also given to illustrate key points.
Bridging Structured and Unstructured Data with Apache Hadoop and Vertica (Steve Watt)
This document discusses bridging unstructured and structured data with Hadoop and Vertica. It describes using Hadoop to extract and structure unstructured investment data from the web. Then it uses Pig to add zip code data and store the results in Vertica. Finally, it explains how Vertica can be used for reporting and data visualization of the structured data for analysis.
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges... (Altinity Ltd)
This document discusses using ClickHouse to manage log data. It begins with an introduction to ClickHouse and its features. It then covers different ways to model log data in ClickHouse, including storing logs as JSON blobs or converting them to a tabular format. The document demonstrates using materialized views to ingest logs into ClickHouse tables in an efficient manner, extracting values from JSON and converting to columns. It shows how this approach allows flexible querying of log data while scaling to large volumes.
ClickHouse materialized views - a secret weapon for high performance analytic... (Altinity Ltd)
ClickHouse materialized views allow you to precompute aggregates and transform data to improve query performance. Materialized views can store precomputed aggregates from a source table to speed up aggregation queries over 100x. They can also retrieve the last data point for each item over 100x faster than scanning the raw data table. Materialized views provide a way to optimize data storage layout and indexing to improve query efficiency.
Aerospike User Group: Exploring Data Modeling (Brillix)
Israeli Aerospike User Group meetup #1: Exploring Data Modeling - best practices with Aerospike data types - map, list and others by Ronen Botzer on July 10, 2018
1) The document provides instructions for setting up an AWS account and launching an EC2 instance with an AMI that contains tools and documentation for a hands-on tutorial on NoSQL databases and MongoDB.
2) The tutorial covers basic MongoDB commands and demonstrates how to create, insert, update, and query document data using the mongo shell client. Embedded and nested documents are explored along with geospatial queries.
3) A map-reduce example aggregates historical check-in data to calculate popular locations over different time periods, demonstrating how MongoDB supports batch operations.
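As a rough illustration of the kind of aggregation the tutorial describes, the snippet below counts check-ins per location with PyMongo's aggregation pipeline; the collection name and fields are invented, and the pipeline is a modern stand-in for the map-reduce job mentioned above rather than a copy of it.

```python
# Hypothetical collection "checkins" with one document per check-in:
#   {"user": "u1", "location": "cafe-12", "ts": ...}
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
checkins = client.tutorial.checkins

pipeline = [
    {"$group": {"_id": "$location", "count": {"$sum": 1}}},  # check-ins per location
    {"$sort": {"count": -1}},                                # most popular first
    {"$limit": 5},
]
for doc in checkins.aggregate(pipeline):
    print(doc["_id"], doc["count"])
```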
A closer look at the fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. We'll show how to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.
Speakers:
Karan Desai - Solutions Architect, AWS
Neel Mitra - Solutions Architect, AWS
Presentation from DICE Coder's Day (2010 November) by Andreas Fredriksson in the Frostbite team.
Goes into detail about Scope Stacks, which are a systems programming tool for memory layout that provides
• Deterministic memory map behavior
• Single-cycle allocation speed
• Regular C++ object life cycle for objects that need it
This makes it very suitable for games.
Real-Time Spark: From Interactive Queries to Streaming (Databricks)
This document summarizes Michael Armbrust's presentation on real-time Spark. It discusses:
1. The goals of real-time analytics, including having the freshest answers as fast as possible while keeping those answers up to date.
2. Spark 2.0 introduces unified APIs for SQL, DataFrames and Datasets to make developing real-time analytics simpler with powerful yet simple APIs.
3. Structured streaming allows running the same SQL queries on streaming data to continuously aggregate data and update outputs, unifying batch, interactive, and streaming queries into a single API.
The document provides an agenda for a seasoned developers track workshop. The agenda includes sessions on InfluxDB query language (IFQL), writing Telegraf plugins, using InfluxDB for open tracing, advanced Kapacitor techniques, setting up InfluxData for IoT, and database orchestration. There will also be breakfast, lunch, breaks and pizza/beer.
Apache Drill 1.0 has been released after nearly three years of development involving 45 code contributors and countless other contributors. Drill provides a SQL interface for analyzing both structured and unstructured data across numerous data sources. It aims to execute queries fast by leveraging columnar encodings and scaling out queries rather than scaling up. Drill also aims to support iterative exploration and querying of data without requiring data preparation. Future plans for Drill include continued monthly releases, integration with other technologies like JDBC and Cassandra, and tools to deploy Drill on EMR and EC2.
beyond tellerrand: Mobile Apps with JavaScript – There's More Than Web (Heiko Behrens)
abstract from http://2011.beyondtellerrand.com
Modern web technologies and responsive design aim at a platform independent code base while promising first-class experience on any mobile device. Even though purely web-based approaches can achieve stunning results, they (still) cannot compete with their native counterparts regarding platform features and integration.
In this talk, I will show you how we can use JavaScript to produce mobile apps that include features such as native UI, push notifications, sensors, and paid distribution. You can expect lots of live demos when I will compare the strengths and weaknesses of various frameworks.
SF Big Analytics 20191112: How to performance-tune Spark applications in larg... (Chester Chen)
Uber developed a new Spark ingestion system, Marmaray, for data ingestion from various sources. It’s designed to ingest billions of Kafka messages every 30 minutes. The amount of data handled by the pipeline is of the order of hundreds of TBs. Omkar details how to tackle such scale and gives insights into the optimization techniques. Some key highlights are how to understand bottlenecks in Spark applications, whether to cache your Spark DAG to avoid rereading your input data, how to effectively use accumulators to avoid unnecessary Spark actions, how to inspect your heap and non-heap memory usage across hundreds of executors, how you can change the layout of data to save long-term storage cost, how to effectively use serializers and compression to save network and disk traffic, and how to amortize the cost of your application by multiplexing your jobs, along with different techniques for reducing memory footprint, runtime, and on-disk usage. CGI was able to significantly (~10%–40%) reduce memory footprint, runtime, and disk usage.
Speaker: Omkar Joshi (Uber)
Omkar Joshi is a senior software engineer on Uber’s Hadoop platform team, where he’s architecting Marmaray. Previously, he led object store and NFS solutions at Hedvig and was an initial contributor to Hadoop’s YARN scheduler.
A Century Of Weather Data - Midwest.io (Randall Hunt)
This document summarizes the key considerations and performance tests for storing and querying a large weather dataset containing over 2.5 billion data points. It describes the schema design using MongoDB to embed data and index on location. Bulk loading of the data took 10 hours on a single server but only 3 hours on a sharded cluster. Queries for a single data point were fastest on the cluster, returning in under 1 ms, while worldwide queries ran at 310 per second. Analytics like maximum temperature took 2.5 hours on a single server but only 2 minutes on the cluster. The cluster provided much higher throughput and better performance for complex queries while being more expensive.
A talk given by Julian Hyde at DataCouncil SF on April 18, 2019
How do you organize your data so that your users get the right answers at the right time? That question is a pretty good definition of data engineering, but it also describes the purpose of every DBMS (database management system). And it’s not a coincidence that these are so similar.
This talk looks at the patterns that reoccur throughout data management — such as caching, partitioning, sorting, and derived data sets. As the speaker is the author of Apache Calcite, we first look at these patterns through the lens of Relational Algebra and DBMS architecture. But then we apply these patterns to the modern data pipeline, ETL and analytics. As a case study, we look at how Looker’s “derived tables” blur the line between ETL and caching, and leverage the power of cloud databases.
Keeping Spark on Track: Productionizing Spark for ETL (Databricks)
ETL is the first phase when building a big data processing platform. Data is available from various sources and formats, and transforming the data into a compact binary format (Parquet, ORC, etc.) allows Apache Spark to process it in the most efficient manner. This talk will discuss common issues and best practices for speeding up your ETL workflows, handling dirty data, and debugging tips for identifying errors.
Speakers: Kyle Pistor & Miklos Christine
This talk was originally presented at Spark Summit East 2017.
Similar to Aerospike Nested CDTs - Meetup Dec 2019
Aerospike Meetup - Introduction - Ami - 04 March 2020 (Aerospike)
Introduction to Aerospike NoSQL Database. Session was delivered at "Big Data, Max Speed @ Minimal Cost" Meetup at Nielsen R&D Center in Tel Aviv, March 4, 2020.
Aerospike Meetup - Real Time Insights using Spark with Aerospike - Zohar - 04... (Aerospike)
How to leverage Spark with Aerospike NoSQL Database to get real time insights. Session was delivered at "Big Data, Max Speed @ Minimal Cost" Meetup at Nielsen R&D Center in Tel Aviv, March 4, 2020.
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020 (Aerospike)
Aerospike at Nielsen customer story. Session was delivered at "Big Data, Max Speed @ Minimal Cost" Meetup at Nielsen R&D Center in Tel Aviv, March 4, 2020.
Aerospike Roadmap Overview - Meetup Dec 2019 (Aerospike)
The document provides a summary of updates to Aerospike Enterprise Edition from May 2019 to December 2019, as well as planned updates through 2020. Key updates include adding compression, supporting nested data types and bitwise operations, improving scan and query performance, and adding capabilities like pagination. Planned updates focus on enhancing cross data center replication, secondary indexes, and the client-server protocol.
Aerospike Data Modeling - Meetup Dec 2019 (Aerospike)
This is a presentation done by Ronen Botzer, the Director of Product at Aerospike as part of the IronSource meetup in Israel (December 2019).
In this talk, Ronen explained how to use nested CDTs and Bitwise operations in order to manage user segmentation and to create a proper data model.
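As a loose, client-side illustration of the bitmap idea behind such bitwise segmentation (not the Aerospike blob/bit operations shown in the talk), each user can be modeled as a fixed-size byte array with one bit per segment; the segment numbers below are invented.

```python
# One bit per segment: setting and testing membership with plain byte arithmetic.
def set_segment(bitmap: bytearray, segment_id: int) -> None:
    bitmap[segment_id // 8] |= 1 << (segment_id % 8)

def has_segment(bitmap: bytearray, segment_id: int) -> bool:
    return bool(bitmap[segment_id // 8] & (1 << (segment_id % 8)))

user = bytearray(16)          # room for 128 segments per user
set_segment(user, 3)
set_segment(user, 42)
print(has_segment(user, 42), has_segment(user, 7))  # True False
```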
JDBC Driver for Aerospike - Meetup Dec 2019 (Aerospike)
The document describes a JDBC driver that allows SQL queries to be run against Aerospike databases. The driver provides SQL compliance by mapping SQL statements to the appropriate Aerospike operations. It supports statements like SELECT, INSERT, UPDATE, DELETE as well as functions, aggregation, JOINs and more. Future plans include improving performance, adding support for additional data types and operations, and deploying to a public repository. The goal is to provide a standard SQL interface for integrating Aerospike with various SQL-based tools and applications.
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of the presentation for a speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help you do it!
We explain how to solve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove redundant or unused accounts to save money. There are also some approaches that can lead to unnecessary spending, for example when a person document is used instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered
- Reducing license costs by finding and fixing misconfigurations and redundant accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices you can apply immediately
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
Dandelion Hashtable: beyond billion requests per second on a commodity server (Antonios Katsarakis)
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
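Not DLHT itself, but a tiny closed-addressing (chained) hashtable in Python to make the bucket-and-chain vocabulary above concrete; DLHT's actual contributions (lock-free operations, bounded cache-line chains, prefetching, non-blocking parallel resizes) are deliberately not reproduced here.

```python
# Closed addressing: each bucket holds a small chain of (key, value) pairs.
class ChainedHashTable:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # update in place
                return
        chain.append((key, value))

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # Unlike open addressing, a delete frees the slot immediately.
        idx = self._index(key)
        self.buckets[idx] = [(k, v) for k, v in self.buckets[idx] if k != key]

t = ChainedHashTable()
t.put("a", 1)
t.put("b", 2)
t.delete("a")
print(t.get("a"), t.get("b"))  # None 2
```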
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
FREE A4 Cyber Security Awareness Posters - Social Engineering part 3 (Data Hops)
Free A4 downloadable and printable cyber security and social engineering safety and security training posters. Promote security awareness in the home or workplace. Lock them out. From training providers datahops.com.