Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
Alexander G. Connor, Panos K. Chrysanthis, Alexandros Labrinidis
Advanced Data Management Technologies Laboratory, Department of Computer Science, University of Pittsburgh
Data in Social Networks
A social network manages user profiles, updates and connections. How to manage this data in a scalable way? Key-value stores offer performance under high load.
Some observations about social networks:
- A profile view usually includes data from a user's friends (spatial locality)
- A friend's profile is often visited next (temporal locality)
- Requests might ask for updates from several users
- Web pages might include pieces of several user profiles
- A single request therefore requires connecting to many machines
Connections in a Social Network (diagram: Alice and her connections)
Leveraging Locality
Can we take advantage of the connections? What if we stored connected users' profiles and data in the same place? Locality can be leveraged: the number of connections is reduced, and user data can be pre-fetched.
We can think of this as a graph partitioning problem:
- Partitions = machines
- Vertices = user profiles, including updates
- Edges = connections
- Objective: minimize the number of edges that cross partitions
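The partitioning objective above can be sketched as a small edge-cut computation. The users, edges and machine assignments below are illustrative, not from the talk:

```python
# A minimal sketch of the edge-cut objective: count "follows" edges whose
# endpoints are assigned to different machines (partitions).

def edges_crossing(edges, assignment):
    """Count edges whose endpoints live on different machines."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# Vertices = user profiles, edges = connections
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("dave", "erin")]

# A locality-blind assignment scatters friends across machines ...
scattered = {"alice": 0, "bob": 1, "carol": 2, "dave": 1, "erin": 1}
# ... while a locality-aware one co-locates connected users.
clustered = {"alice": 0, "bob": 0, "carol": 0, "dave": 1, "erin": 1}

print(edges_crossing(edges, scattered))  # 3 edges cross partitions
print(edges_crossing(edges, clustered))  # 0 edges cross partitions
```

Minimizing this count is exactly the objective the deck assigns to the on-line partitioner.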
Example – graph partitioning: many edges cross partitions
Accessing a vertex’s neighbors requires accessing many partitions
In a social network, requesting updates from followed users requires connecting to many machines
Far fewer edges cross partitions
Accessing a vertex’s neighbors requires accessing few partitions
In a social network, fewer connections are made and related user data can be pre-fetched.

Key-Key-Value Stores
Our proposed approach: extend the key-value model.
- Data can be stored as key-values (user profiles)
- Data can also be stored as key-key-values (user connections, e.g. "Alice follows Bob")
- Key-key-values are used to compute locality via an on-line graph partitioning algorithm
- Keys are assigned to grid locations based on their connections; each grid cell represents a data host, so related keys are kept together
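A minimal in-memory sketch of the key-key-value model described above; the class and method names are illustrative, since the talk does not define a concrete API:

```python
# Key-value entries hold per-user data; key-key-value entries hold
# relationships between two keys, such as "alice follows bob".

class KKVStore:
    def __init__(self):
        self.kv = {}    # key -> value        (e.g. user profiles)
        self.kkv = {}   # (key, key) -> value (e.g. connections)

    def put(self, key, value):
        """Plain key-value: store a user's profile data."""
        self.kv[key] = value

    def put_kkv(self, key1, key2, value):
        """Key-key-value: store a relationship between two keys."""
        self.kkv[(key1, key2)] = value

    def neighbors(self, key):
        """Keys connected to `key` -- the input the partitioner needs."""
        return {b for (a, b) in self.kkv if a == key} | \
               {a for (a, b) in self.kkv if b == key}

store = KKVStore()
store.put("alice", {"name": "Alice"})
store.put("bob", {"name": "Bob"})
store.put_kkv("alice", "bob", "follows")
print(store.neighbors("alice"))  # {'bob'}
```

The `neighbors` view is what lets the store treat its contents as a graph and compute locality.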
Outline
- Introduction: Data in Social Networks, Leveraging Locality, Key-Key-Value Stores
- System Model: Client API, Adding a Key-Key-Value, Load Management
- On-line Partitioning Algorithm
- Simulation Parameters
- Results
- Conclusion
System Model
- Address Table (Mapping Store): a transactional, distributed hash table that maps keys to virtual machines
- Physical Layer: physical machines; can be added or removed dynamically as demands change
- Logical Layer: virtual machines organized as a square grid; they run the KKV store software, manage replication, and can be moved between physical machines as needed
- Application Layer: the Client API; maintains client sessions and cached data
(Diagram: application sessions -> address table -> virtual hosts -> physical hosts)
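The address table can be sketched as a hash table from keys to grid cells (virtual machines). The grid size, default placement by hashing, and method names are assumptions for illustration:

```python
# The address table maps each key to the (row, col) grid cell of the
# virtual machine hosting it; the partitioner re-points keys as they move.

GRID_SIZE = 8  # virtual machines arranged as an 8x8 square grid (assumed)

class AddressTable:
    def __init__(self):
        self.table = {}  # key -> (row, col) of its virtual machine

    def locate(self, key):
        """Return the grid cell for `key`, hashing it to a cell on first use."""
        if key not in self.table:
            h = hash(key)
            self.table[key] = (h % GRID_SIZE, (h // GRID_SIZE) % GRID_SIZE)
        return self.table[key]

    def move(self, key, cell):
        """Re-point a key after the partitioner migrates its data."""
        self.table[key] = cell

table = AddressTable()
table.move("alice", (1, 1))   # partitioner placed alice at cell (1, 1)
print(table.locate("alice"))  # (1, 1)
```

In the real design this table is transactional and distributed; a plain dict only shows the mapping it maintains.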
Client API and Sessions
- Clients use a simple API that includes the get, put and sync commands
- Data is pulled from the logical layer in blocks (groups of related keys)
- The client API keeps data in an in-memory cache
- Data is pushed out asynchronously to virtual nodes in blocks
- Push/pull can be done synchronously if requested by the client, offering stronger consistency at the cost of performance
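The client-side behavior above can be sketched as follows: reads pull whole blocks of related keys into a local cache, and writes are buffered until pushed. The block-fetch interface and class names are assumptions, not the talk's actual API:

```python
# get() pulls a whole block (the key plus related keys) into the cache,
# pre-fetching data that locality suggests will be needed next; put()
# buffers writes, and sync() pushes them synchronously.

class Client:
    def __init__(self, server):
        self.server = server   # exposes fetch_block(key) and push(dict)
        self.cache = {}        # in-memory cache of pulled blocks
        self.dirty = {}        # writes not yet pushed to virtual nodes

    def get(self, key):
        if key not in self.cache:
            self.cache.update(self.server.fetch_block(key))
        return self.cache[key]

    def put(self, key, value):
        self.cache[key] = value
        self.dirty[key] = value  # pushed out asynchronously or on sync()

    def sync(self):
        """Synchronous push: stronger consistency at a performance cost."""
        self.server.push(self.dirty)
        self.dirty = {}

class FakeServer:  # stand-in for the logical layer
    def __init__(self):
        self.data = {"alice": "profile-a", "bob": "profile-b"}
    def fetch_block(self, key):
        return dict(self.data)  # one block: the key and its related keys
    def push(self, updates):
        self.data.update(updates)

c = Client(FakeServer())
print(c.get("alice"))  # pulls the block; "bob" is now cached too
c.put("alice", "new-profile")
c.sync()
```

After the first `get`, a follow-up `c.get("bob")` is served from the cache without another round trip, which is the point of block-granularity pulls.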
Adding a Key-Key-Value
Two users: Alice and Bob. put(alice, bob, follows)
1. Use the Address Table to determine the virtual machine (node) that hosts Alice's data
2. Write the data to that node
3. Use the address table to determine the node that hosts Bob's data
4. Write the same data to that node
5. The on-line partitioning algorithm moves Alice's data to Bob's node because they are connected
(Diagram: the address table maps alice to cell (1,1) and bob to cell (8,8); kkv(alice, bob, follows) is written alongside kv(alice, ...) and kv(bob, ...) on both nodes)
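The two-node write can be sketched directly from the steps above; the address table contents and node representation are illustrative:

```python
# put(alice, bob, follows): look up each endpoint's host node in the
# address table and write the key-key-value to both nodes.

def put_kkv(address_table, nodes, key1, key2, value):
    """Write the key-key-value to both endpoints' host nodes."""
    triple = (key1, key2, value)
    cell1 = address_table[key1]      # 1. locate Alice's node
    nodes[cell1].append(triple)      # 2. write the data there
    cell2 = address_table[key2]      # 3. locate Bob's node
    if cell2 != cell1:
        nodes[cell2].append(triple)  # 4. write the same data there
    return cell1, cell2

address_table = {"alice": (1, 1), "bob": (8, 8)}
nodes = {(1, 1): [], (8, 8): []}
put_kkv(address_table, nodes, "alice", "bob", "follows")
print(nodes[(1, 1)])  # [('alice', 'bob', 'follows')]
print(nodes[(8, 8)])  # [('alice', 'bob', 'follows')]
```

Storing the triple at both endpoints is what later lets each node's partitioner count its local keys' remote connections without extra lookups.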
Splitting a Node
- If one node becomes overloaded, it can initiate a split
- To maintain the grid structure, nodes in the same row and column must also split
- Once the split is complete, new physical machines can be turned on
- Virtual nodes can be transferred to these new machines
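The grid invariant behind the split can be sketched as follows; the grid representation is illustrative and load-sharing details are omitted:

```python
# When the node at (r, c) splits, every node in row r and column c must
# split too, so the layout remains a full rectangular grid.

def cells_that_split(rows, cols, r, c):
    """All cells forced to split when (r, c) initiates a split."""
    return [(i, j) for i in range(rows) for j in range(cols)
            if i == r or j == c]

def split(rows, cols, r, c):
    """After the split, the grid has gained one row and one column."""
    return rows + 1, cols + 1

rows, cols = 2, 2                          # 4 virtual nodes
print(cells_that_split(rows, cols, 0, 0))  # [(0, 0), (0, 1), (1, 0)]
rows, cols = split(rows, cols, 0, 0)
print(rows * cols)                         # 9 nodes in the new grid
```

The row-and-column rule is the price of keeping the grid rectangular, which the deck's notes say the system relies on for replication, locking and messaging.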
Outline (revisited)
- Introduction: Data in Social Networks, Leveraging Locality, Key-Key-Value Stores
- System Model: Client API, Adding a Key-Key-Value, Load Management
- On-line Partitioning Algorithm
- Simulation Parameters
- Results
- Conclusion

  • 22.
    On-line Partitioning Algorithm
    Runs periodically, in parallel, on each virtual node (also after a split or merge).
    For each key stored on a node:
    - Determine the number of connections (key-key-values) to keys on other nodes; this can also be the sum of edge weights
    - Find the node that has the most connections
    - If that node is different from the current node, the number of connections to it exceeds the number of connections to the current node, and this margin is greater than some threshold: move the key to the other node and update the address table
    Designed to work in a distributed, dynamic setting; NOT a replacement for off-line algorithms in static settings.
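One pass of the per-node heuristic above can be sketched as follows; the graph representation and threshold value are illustrative:

```python
# For each local key, count its connections per hosting node and move it
# to the node holding most of its neighbors, if the gain over staying
# exceeds a threshold. Edge weights could be summed instead of counted.

def partition_step(node, keys_on_node, connections, assignment, threshold=1):
    """Run one on-line partitioning pass for the keys on `node`."""
    moves = {}
    for key in keys_on_node:
        counts = {}  # hosting node -> number of connections
        for neighbor in connections.get(key, []):
            host = assignment[neighbor]
            counts[host] = counts.get(host, 0) + 1
        if not counts:
            continue
        best = max(counts, key=counts.get)
        local = counts.get(node, 0)
        if best != node and counts[best] - local > threshold:
            moves[key] = best
    for key, dest in moves.items():
        assignment[key] = dest  # i.e. update the address table
    return moves

# alice sits on node A, but both her neighbors live on node B
assignment = {"alice": "A", "bob": "B", "carol": "B"}
connections = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["alice"]}
moves = partition_step("A", ["alice"], connections, assignment, threshold=1)
print(moves)  # {'alice': 'B'}
```

Because each node only inspects its own keys and their neighbor hosts, the pass can run in parallel on every node of a changing graph, which is what distinguishes it from off-line partitioners like Kernighan-Lin.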
  • 26.
    Partitioning Quality Results (plot: % of edges within a partition vs. number of vertices in the graph). The on-line algorithm partitions as well as Kernighan-Lin.
  • 27.
    Partitioning Performance Results (plot: vertices moved vs. number of vertices in the graph). The on-line algorithm partitions 2x faster than Kernighan-Lin!
  • 28.
    Conclusions
    Contributions:
    - A novel model for scalable graph data stores that extends the key-value model: the key-key-value store
    - A high-level system design
    - A novel on-line partitioning algorithm
    - Preliminary experimental results
    Our proposed algorithm shows promise in the distributed, dynamic setting.
  • 29.
    What's Ahead?
    - Prototype system implementation (Java, PostgreSQL)
    - Performance analysis against MongoDB, Cassandra
    - Sensitivity analysis
    - Cloud deployment
  • 30.
    Thank You!
    Acknowledgments: Daniel Cole, Nick Farnan, Thao Pham, Sean Snyder; ADMT Lab, CS Department, Pitt; GPSA, Pitt A&S GSO, Pitt A&S PBC

Editor's Notes

  • #11 Two users: Alice and Bob. Put command: store "Alice follows Bob". Use the Address Table to determine the virtual machine (node) that hosts Alice's data; write the data to that node. Use the address table to determine the node that hosts Bob's data; write the same data to that node. The on-line partitioning algorithm moves Alice's data to Bob's node because they are connected.
  • #12 Nodes in the logical layer have to handle varying demands. If one node becomes overloaded, it can initiate a split; to maintain the grid structure, nodes in the same row and column must also split. The grid is used for replication, and for efficient locking and messaging. Once the split is complete, new physical machines can be turned on and virtual nodes transferred to them. Similarly, as load decreases, virtual nodes can be transferred off of physical machines; some physical machines can then be shut down to save power, and virtual nodes can be merged back together.
  • #14 Works by improving partitions rather than creating them from scratch. On-line means that it works with a changing graph, whose structure frequently changes.
  • #15 The algorithm runs in parallel on each node: when a split or merge occurs, and when load is below a threshold. Each vertex is considered in turn: find the number of edges to each node (edges can be weighted), then find the node with the greatest number of edges; if it differs from the current node and the gain is greater than a threshold, move the vertex.