Persistent Data Structures - partial::Conf

PERSISTENT DATA
STRUCTURES
@IvanVergiliev

ABOUT ME
• Currently - Backend @ Heap
• Real-time music recommendations @ SoundCloud
• Google, Facebook, Leanplum, Chaos Group
• Bronze medal @ IOI ’09

INTERNATIONAL OLYMPIAD IN
INFORMATICS
What do IOI people have in common?

IOI PEOPLE CARE ABOUT PERF
And also, algorithms and data structures
… and lots of other things

“Oh, just try to understand the concept.”
– Functional Programming @ Universities

ENTER
PERSISTENT DATA
STRUCTURES

WHAT’S NEXT
1. Types of Persistent Data Structures
2. Applications
3. Implementation and performance considerations

TYPES OF PERSISTENT DATA
STRUCTURES

EPHEMERAL DATA
STRUCTURES
Or, what most programmers simply call
“data structures”

PARTIALLY
PERSISTENT
DS
You can look at every version,
but you can only edit the newest one
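One classic way to get partial persistence is the "fat node" technique: each slot keeps its whole history of (version, value) pairs. The Python sketch below assumes a fixed-size array, and `PartiallyPersistentArray`, `get` and `set` are hypothetical names. Any past version can be read, but writes always create a version on top of the newest one.

```python
from bisect import bisect_right


class PartiallyPersistentArray:
    """Fat-node sketch: every slot stores its full history, sorted by version."""

    def __init__(self, values):
        self.version = 0
        # Per slot: version numbers and the values written at those versions.
        self._versions = [[0] for _ in values]
        self._values = [[v] for v in values]

    def get(self, index, version):
        """Read slot `index` as it was at `version` (any past version works)."""
        if not 0 <= version <= self.version:
            raise ValueError("unknown version")
        pos = bisect_right(self._versions[index], version) - 1
        return self._values[index][pos]

    def set(self, index, value):
        """Write on top of the newest version only; returns the new version number."""
        self.version += 1
        self._versions[index].append(self.version)
        self._values[index].append(value)
        return self.version
```

There is deliberately no way to write "into" version 1 once version 2 exists. That is exactly the "look, but only edit the newest" restriction.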

FULLY
PERSISTENT
DS
Every version can be read and updated.
Now with branching!
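The simplest fully persistent structure is an immutable singly linked ("cons") list. Any version, not only the newest, can be extended, so versions form a tree. A minimal Python sketch follows; `Cons`, `push` and `to_list` are illustrative names.

```python
from typing import NamedTuple, Optional


class Cons(NamedTuple):
    head: int
    tail: Optional["Cons"]  # None marks the empty list


def push(lst, value):
    """O(1): the new version shares the entire old list as its tail."""
    return Cons(value, lst)


def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out
```

Branching from one base version:

```python
base = push(push(None, 1), 2)  # [2, 1]
a = push(base, 3)              # [3, 2, 1]
b = push(base, 4)              # [4, 2, 1]; both branches share `base`
```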

CONFLUENTLY PERSISTENT DS
The version graph is a DAG (Directed Acyclic Graph)

Now with merge support!
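Concatenation is a standard example of a confluent (merging) operation. In the Python sketch below, a rope node simply points at both input versions, so merging costs O(1) and copies nothing. `Leaf`, `Concat`, `concat` and `get` are hypothetical names, and the rope is left unbalanced for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    items: tuple


@dataclass(frozen=True)
class Concat:
    left: object   # Leaf or Concat
    right: object  # Leaf or Concat
    length: int


def size(rope):
    return len(rope.items) if isinstance(rope, Leaf) else rope.length


def concat(a, b):
    """Merge two existing versions; both are shared as children, not copied."""
    return Concat(a, b, size(a) + size(b))


def get(rope, i):
    """Walk down to the leaf that holds logical index i."""
    while isinstance(rope, Concat):
        left_size = size(rope.left)
        if i < left_size:
            rope = rope.left
        else:
            i -= left_size
            rope = rope.right
    return rope.items[i]
```

Merging a version with itself 30 times yields 2^31 logical elements using only 31 nodes. Such sharing is why confluent persistence needs a DAG rather than a tree of versions.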

TRANSACTION ROLLBACK
void addCustomer(List<Integer> customerIds, int id) {
  try (var tx = transaction.start()) {
    customerIds.add(id);
    transaction.commit();
  } catch (RollbackException e) {
    // ???
  }
}

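void addCustomer(List<Integer> customerIds, int id) {
  try (var tx = transaction.start()) {
    customerIds.add(id);
    transaction.commit();
  } catch (RollbackException e) {
    // undo by hand; what if we had changed 10 objects?
    customerIds.remove(customerIds.size() - 1);
  }
}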

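// customerIds is now a persistent list: add() returns a new version
List<Integer> addCustomer(List<Integer> customerIds, int id) {
  try (var tx = transaction.start()) {
    var newCustomerIds = customerIds.add(id);
    transaction.commit();
    return newCustomerIds;
  } catch (RollbackException e) {
    return customerIds;
  }
}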
(Of course, that’s kinda obvious at a functional
programming conference…)
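The "what if we had 10 objects?" problem goes away with persistence. The Python sketch below applies many changes to a persistent list, and a rollback just returns the untouched old reference. `commit` is a hypothetical callback standing in for the transaction layer, which is assumed to raise `RollbackException` on failure.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    value: int
    rest: Optional["Node"]


class RollbackException(Exception):
    """Stand-in for whatever the (hypothetical) transaction layer raises."""


def add_customers(customer_ids, new_ids, commit):
    updated = customer_ids
    for cid in new_ids:
        updated = Node(cid, updated)  # never mutates customer_ids
    try:
        commit(updated)
    except RollbackException:
        return customer_ids  # undoing all changes = keeping one old reference
    return updated


def to_list(node):
    out = []
    while node is not None:
        out.append(node.value)
        node = node.rest
    return out
```

No undo log is needed, however many changes the transaction made.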

OPTIMISTIC UI
Optimizing user perception instead of
performance
Image source: https://uxplanet.org/optimistic-1000-34d9eefe4c05

OPTIMISTIC UI
• People hate waiting
• API calls almost always succeed
• => just pretend they succeeded and update the UI
• However, failures do occur sometimes

HANDLING FAILURE WITH
FULLY PERSISTENT DS
val speculativeState = state
  .update('list1', state.get('list2').get(0))
  .remove('list2', 0);
display(speculativeState);
api
  .saveState(speculativeState)
  .onError(() => display(state));
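The same flow can be sketched in Python with state kept as a dict of immutable tuples. The sketch assumes the slide's update means "move the first item of list2 to the end of list1". `move_first`, `optimistic_move`, `save_state` and `display` are hypothetical names, and the API call is simulated synchronously.

```python
def move_first(state, src, dst):
    """Return a NEW state: first item of state[src] appended to state[dst]."""
    item, *rest = state[src]
    new_state = dict(state)  # top-level copy only; untouched tuples are shared
    new_state[dst] = state[dst] + (item,)
    new_state[src] = tuple(rest)
    return new_state


def optimistic_move(state, save_state, display):
    speculative = move_first(state, "list2", "list1")
    display(speculative)  # update the UI immediately
    try:
        save_state(speculative)  # hypothetical API call
    except ConnectionError:
        display(state)  # the old version is intact, so just show it again
```

Copying the top-level dict is a shortcut. A real persistent map (for example a hash array mapped trie) would share that level too.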

WHAT IF THE SERVER KNOWS
SOMETHING WE DON’T?
Say, someone shot us in Quake

val speculativeState = state
  .update('list1', state.get('list2').get(0))
  .remove('list2', 0);
display(speculativeState);
api
  .saveState(speculativeState)
  .onSuccess((serverState) => {
    val newState = merge(serverState, speculativeState);
    display(newState);
  })
  .onError(() => display(state));
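The slides do not define `merge`. One possible policy is a three-way merge, sketched in Python below. It also takes the pre-speculation `state` as a base, which persistence keeps around for free. A key keeps the client's speculative value only if the client changed it and the server did not; otherwise the server wins. This conflict policy and all names are assumptions for illustration.

```python
def merge(base, server, local):
    """Hypothetical three-way merge of per-key state (server wins on conflict)."""
    merged = dict(server)
    for key in base.keys() | local.keys():
        local_changed = local.get(key) != base.get(key)
        server_changed = server.get(key) != base.get(key)
        if local_changed and not server_changed:
            if key in local:
                merged[key] = local[key]
            else:
                merged.pop(key, None)  # the client deleted the key
    return merged
```

If the server lowered our health (someone shot us) while we moved an item, both changes survive the merge.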

POSTGRES MVCC
Speaking about transactions…

POSTGRES MVCC
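• MVCC (Multi-Version Concurrency Control), roughly: when modifying a row, create a new version of it instead of modifying it in place
• Updating an array (or any other large value) means writing a complete new copy of it every single time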
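A heavily simplified Python model of a row-version chain follows. It borrows the names of Postgres's `xmin`/`xmax` system columns, but it assumes every update commits immediately and it ignores concurrent transactions, so it is not how Postgres actually decides visibility. `Table`, `update` and `read` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RowVersion:
    value: object
    xmin: int                    # transaction that created this version
    xmax: float = float("inf")   # transaction that superseded it


class Table:
    def __init__(self):
        self.versions = {}   # row id -> list of RowVersion, oldest first
        self.next_txid = 1

    def update(self, row_id, value):
        txid = self.next_txid
        self.next_txid += 1
        chain = self.versions.setdefault(row_id, [])
        if chain:
            chain[-1].xmax = txid  # the old version stays, marked as superseded
        chain.append(RowVersion(value, txid))
        return txid

    def read(self, row_id, snapshot_txid):
        """Return the version visible to a snapshot taken at snapshot_txid."""
        for version in reversed(self.versions.get(row_id, [])):
            if version.xmin <= snapshot_txid < version.xmax:
                return version.value
        return None
```

Each update stores a complete new value. Appending one element to a 1,000-element array writes all 1,001 elements again, which is the cost the slide points out.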

GEOMETRIC PROBLEMS
Time is just another dimension

POINT LOCATION
Or, how your browser determines where you
clicked

POINT LOCATION
Split vertically at intersections

POINT LOCATION
Inside each vertical stripe, the segments do not cross, so they are ordered bottom to top

POINT LOCATION -
ATTACK PLAN
1. Split map into vertical
stripes at intersections
2. Create a BST for each
stripe
3. Do a binary search to
find the stripe
4. Locate the region in the
BST
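Here is a plain, non-persistent Python sketch of the four steps. It uses sorted lists instead of BSTs and identifies a region by the segment directly below the query point. It assumes the map has already been split at intersections, so segments meet only at endpoints, and that no segment is vertical. `build_slabs` and `segment_below` are hypothetical names.

```python
from bisect import bisect_right


def y_at(seg, x):
    """y-coordinate of a non-vertical segment at x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def build_slabs(segments):
    # Step 1: stripe boundaries at every endpoint.
    xs = sorted({x for seg in segments for (x, _) in seg})
    slabs = []
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        spanning = [s for s in segments
                    if min(s[0][0], s[1][0]) <= left and max(s[0][0], s[1][0]) >= right]
        # Step 2: one ordered structure per stripe (a sorted list here).
        spanning.sort(key=lambda s: y_at(s, mid))
        slabs.append(spanning)
    return xs, slabs


def segment_below(xs, slabs, px, py):
    # Step 3: binary search for the stripe containing px.
    i = bisect_right(xs, px) - 1
    if i < 0 or i >= len(slabs):
        return None
    # Step 4: binary search inside the stripe for the highest segment below the point.
    stripe = slabs[i]
    lo, hi = 0, len(stripe)
    while lo < hi:
        m = (lo + hi) // 2
        if y_at(stripe[m], px) <= py:
            lo = m + 1
        else:
            hi = m
    return stripe[lo - 1] if lo > 0 else None
```

Every stripe stores its own full list of segments, which is where the O(N^2) memory comes from.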

HOWEVER, N BINARY TREES =
O(N^2) MEMORY

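DELTA BETWEEN ADJACENT TREES
IS LIMITED
Neighbouring stripes differ only in the segments that start or end on the line between them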

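USE PERSISTENT BINARY
SEARCH TREES
Each stripe's tree is a new version of the previous one; with path copying in a balanced tree, every update costs O(log N) new nodes, so total memory is O(N log N)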
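Path copying is the simplest way to make a BST persistent: an update copies only the nodes on the root-to-key path and shares everything else. The Python sketch below is unbalanced for brevity; a real implementation would use a balanced tree to keep paths O(log N). Plain numbers stand in for segments. In point location, the ordering would be the segments' vertical order inside the stripe. All names are illustrative.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: float
    left: Optional["Node"]
    right: Optional["Node"]


def insert(node, key):
    if node is None:
        return Node(key, None, None)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))
    return node


def delete(node, key):
    if node is None:
        return None
    if key < node.key:
        return Node(node.key, delete(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, delete(node.right, key))
    if node.left is None:
        return node.right
    if node.right is None:
        return node.left
    successor = node.right  # smallest key in the right subtree
    while successor.left is not None:
        successor = successor.left
    return Node(successor.key, node.left, delete(node.right, successor.key))


def floor(node, key):
    """Largest stored key <= key (e.g. the segment just below a point)."""
    best = None
    while node is not None:
        if node.key <= key:
            best, node = node.key, node.right
        else:
            node = node.left
    return best
```

Sweeping left to right, each stripe keeps its own root (`versions.append(tree)`), while the trees share all unchanged subtrees.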

IMPLEMENTATION AND
PERFORMANCE
CONSIDERATIONS

YOU REALLY WANT GARBAGE
COLLECTION
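• Otherwise…
• reference counting (e.g. C++ shared_ptr<T>)
• now, each node has a mutable reference count
• nodes are shared across versions and threads, so we need synchronization (atomic counter updates)
• destroying a long chain of nodes can be very slow, and recursive destruction can overflow the stack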

WHY 32 SLOTS PER NODE?
Source: http://hypirion.com/

WHY ARE BINARY SEARCH
TREES BINARY?
• Why not 1000-way search trees?
• log_N N = 1, so why not?
• Because we need to be able to search and insert
• Complexities are not really O(log_k N)
• Search: O(log k · log_k N)
• Insert: O(k · log_k N)
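A quick Python calculation makes the trade-off concrete for N = 10^9 keys. It counts tree levels, comparisons for a binary search inside each node (about log2 k per level), and slots shifted or copied on insert (k per level). These are rough operation counts under that cost model, not measurements.

```python
def levels(n, k):
    """Smallest h with k**h >= n: height of a k-way tree holding n keys."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= k
        h += 1
    return h


def search_cost(n, k):
    # Binary search inside each node: ceil(log2 k) comparisons per level.
    return (k - 1).bit_length() * levels(n, k)


def insert_cost(n, k):
    # Shifting/copying up to k slots in every node on the path.
    return k * levels(n, k)


for k in (2, 32, 1000):
    print(k, levels(10**9, k), search_cost(10**9, k), insert_cost(10**9, k))
# prints:
# 2 30 30 60
# 32 6 30 192
# 1000 3 30 3000
```

Search stays at about 30 comparisons regardless of k, because log k · log_k N = log N. Insert cost, however, grows with k.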

1000-WAY SEARCH TREE
• Each node would need to support efficient insertion
of elements and search
• Which sounds exactly like a binary search tree…

WHAT’S WRONG WITH BINARY?
• Node(Node* left, int value, Node* right): about 24 bytes on a 64-bit machine
• how much memory does the CPU read?
• >= 1 cache line
• usually 64 bytes
• even more if you read from disk

SO… HOW ABOUT
BITMAPPED VECTOR TRIES?

BITMAPPED VECTOR TRIE
COMPLEXITIES
• We don’t really need to search - we know the exact index we care about
• Lookup - O(log_k N)
• Let’s set k = N!!!
• * that is just a flat array, though: not usefully persistent, since every update would copy all N elements
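Because each node has k = 2^bits slots, the path to an element can be read straight off the index bits, with no comparisons needed. A small Python sketch follows; `slot_path` is a hypothetical helper name.

```python
def slot_path(index, depth, bits=5):
    """Child slot to follow at each level, root first, for a trie of `depth` levels.

    Each level consumes `bits` bits of the index (k = 2**bits slots per node).
    """
    mask = (1 << bits) - 1
    return [(index >> (bits * level)) & mask for level in range(depth - 1, -1, -1)]
```

For example, `slot_path(1000, 2)` is `[31, 8]` (31 · 32 + 8 = 1000). `slot_path(1000, 1, bits=10)` is `[1000]`: with k ≥ N there is a single level, and the lookup becomes plain array indexing.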

PERSISTENT VECTOR
COMPLEXITIES
• Lookup - O(log_k N)
• constant work per node
• Update - O(k · log_k N)
• need to copy k elements for each node on the path
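Here is a minimal Python sketch of a persistent vector trie with 32-way nodes, built from a list. `vget` does constant work per level. `vset` uses path copying, copying one tuple of up to 32 slots per level while every other node stays shared. The names `build`, `vget` and `vset` are hypothetical, and appends are left out.

```python
BITS = 5
WIDTH = 1 << BITS   # 32 slots per node
MASK = WIDTH - 1


def build(items):
    """Build the trie bottom-up; returns (root, shift)."""
    level = [tuple(items[i:i + WIDTH]) for i in range(0, len(items), WIDTH)] or [()]
    shift = 0
    while len(level) > 1:
        level = [tuple(level[i:i + WIDTH]) for i in range(0, len(level), WIDTH)]
        shift += BITS
    return level[0], shift


def vget(root, shift, i):
    node = root
    while shift > 0:
        node = node[(i >> shift) & MASK]
        shift -= BITS
    return node[i & MASK]


def vset(root, shift, i, value):
    """Return a new root; copies one node (<= WIDTH slots) per level."""
    slot = (i >> shift) & MASK
    if shift == 0:
        new_child = value
    else:
        new_child = vset(root[slot], shift - BITS, i, value)
    return root[:slot] + (new_child,) + root[slot + 1:]
```

After `new_root = vset(old_root, shift, 500, -1)`, both versions are usable, and every leaf except the one holding index 500 is shared between them.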

LOOKUP VS UPDATE SPEED
• Lookup gets faster with larger k (fewer levels)
• Updates are more complicated: fewer levels, but each copied node has k slots

TAIL OPTIMIZATION
• What do people usually do with vectors?
• They append to them
• Let’s optimize for that

TAIL OPTIMIZATION
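Keep the tail separate: the last few elements (up to one node's worth) live in a small array outside the trie, so most appends copy only that array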
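A Python sketch of a persistent vector with a separate tail follows, modelled loosely on Clojure's design. Appends copy only the tail until it holds 32 elements. The full tail is then pushed into the trie as a single leaf, and the trie grows a new root level when it fills up. `PVec`, `_new_path` and `_push_tail` are illustrative names, not a library API.

```python
BITS = 5
WIDTH = 1 << BITS   # 32 slots per node
MASK = WIDTH - 1


class PVec:
    def __init__(self, size=0, shift=BITS, root=(), tail=()):
        self.size, self.shift, self.root, self.tail = size, shift, root, tail

    def _tail_offset(self):
        return self.size - len(self.tail)

    def get(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)
        if i >= self._tail_offset():
            return self.tail[i - self._tail_offset()]
        node, shift = self.root, self.shift
        while shift > 0:
            node = node[(i >> shift) & MASK]
            shift -= BITS
        return node[i & MASK]

    def append(self, x):
        if len(self.tail) < WIDTH:
            # Common case: copy only the small tail; the trie is shared untouched.
            return PVec(self.size + 1, self.shift, self.root, self.tail + (x,))
        # Tail is full: move it into the trie as one leaf and start a new tail.
        if (self.size >> BITS) > (1 << self.shift):
            # The trie is full at this height: add a new root level.
            root = (self.root, _new_path(self.shift, self.tail))
            shift = self.shift + BITS
        else:
            root = _push_tail(self.shift, self.root, self.tail, self.size - 1)
            shift = self.shift
        return PVec(self.size + 1, shift, root, (x,))


def _new_path(level, leaf):
    """Wrap `leaf` in single-child nodes until it reaches `level`."""
    node = leaf
    while level > 0:
        node = (node,)
        level -= BITS
    return node


def _push_tail(level, parent, leaf, last_index):
    """Path-copy `parent`, placing `leaf` where index `last_index` belongs."""
    slot = (last_index >> level) & MASK
    if level == BITS:
        child = leaf
    elif slot < len(parent):
        child = _push_tail(level - BITS, parent[slot], leaf, last_index)
    else:
        child = _new_path(level - BITS, leaf)
    return parent[:slot] + (child,) + parent[slot + 1:]
```

Only 1 in 32 appends touches the trie at all; the rest copy a tuple of at most 32 elements.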

SCALA’S FOCUS
• Like a tail, but for any index: remembers the path to the most recently used leaf
• Optimizes local operations (accesses close to the previous one)

TRANSIENTS
• When it’s all still too slow
• Temporarily gives up persistence: the structure is mutated in place, then frozen back into a persistent value
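Below is a deliberately simplified Python stand-in for the transient idea: mutate in place while building, then freeze the result and refuse further edits. `TransientVector` and `persistent()` are hypothetical names. Unlike this sketch, Clojure's transients mark trie nodes with an owner token, so they avoid the copies made here on entry and on freezing.

```python
class TransientVector:
    """Mutable builder that is frozen into an immutable tuple once done."""

    def __init__(self, items=()):
        self._items = list(items)  # one copy on entry
        self._editable = True

    def append(self, x):
        if not self._editable:
            raise RuntimeError("transient used after persistent()")
        self._items.append(x)  # in place: no per-append copying
        return self

    def persistent(self):
        """Freeze: returns an immutable value and invalidates this builder."""
        self._editable = False
        return tuple(self._items)
```

Typical use is a burst of updates in one local scope (`t = TransientVector(v)`, many `t.append(...)` calls, then `v = t.persistent()`). Nobody else sees the intermediate states, so giving up persistence for that stretch is safe.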

EXTENSIBILITY
