HyperDex: A Closer Look at a Distributed, Searchable Key-Value Store

–  DECK36 is a young spin-off from ICANS
–  Small team of 7 engineers
–  Longstanding expertise in designing, implementing and operating complex web systems
–  Developing our own data-intelligence-focused tools and web services
–  Offering our expert knowledge in:
   –  Automation & Operations
   –  Architecture & Engineering
   –  Analytics & Data Logistics
Dr. Stefan Schadwinkel
Co-Founder / Analytics Engineer
stefan.schadwinkel@deck36.de

BACKGROUND
*log: Storm-based Analytics RT

BACKGROUND
*log
Our *log provides stream-based real-time analytics. We need a serious DB.
It must focus on servicing each request, scale easily and fast, deliver consistent
throughput, offer secondary indices, and make it possible to compute aggregations.

MongoDB, Cassandra, Riak, MariaDB

HyperDex: A Distributed, Searchable Key-Value Store.
Robert Escriva, Bernard Wong and Emin Gün Sirer.
In Proceedings of the SIGCOMM Conference, Helsinki, Finland, August 2012.
http://hyperdex.org/papers/hyperdex.pdf

WHY HYPERDEX?
Next Generation K/V

WHY HYPERDEX?
Features.
CAP - Common Buzz: Consistent, Available, Partition-tolerant – Pick any two.
From http://hyperdex.org/FAQ/: HyperDex is designed to withstand a threshold of
failures desired by the application. The level of fault-tolerance is tunable by the system
administrator. HyperDex guarantees consistency, availability in the presence of less
than f faults, and partition tolerance for partitions that affect less than f nodes, where f
is a user-tunable parameter.

-  Fully linearizable. Every ‘get’ always returns the latest ‘put’.
-  Tolerates up to f failures.
-  Query secondary attributes almost as fast as the primary key.
-  Rich data types: Strings, Floats, Ints, Lists, Maps, Sets
-  Atomic, multi-key transactions. (Commercial)

HYPERSPACE HASHING
Mapping Data into Euclidean Space
Each object is mapped into space. Space is mapped onto servers.
One hyperspace relates to one table. HyperDex can manage multiple independent
hyperspaces.
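
To make the mapping concrete, here is a minimal Python sketch of the idea, not HyperDex's actual implementation. The attribute names, the number of buckets per axis, the hash function and the round-robin server assignment are all assumptions chosen for illustration.

```python
import hashlib
from itertools import product

ATTRIBUTES = ("first", "last", "phone")   # hypothetical table layout
BUCKETS_PER_AXIS = 2                      # assumption: each axis is cut in two

def axis_bucket(value):
    """Hash one attribute value to a bucket index along its axis."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS_PER_AXIS

def coordinates(obj):
    """Map an object to a point: one coordinate per attribute."""
    return tuple(axis_bucket(obj[name]) for name in ATTRIBUTES)

def server_for(region, servers):
    """Assign a region to a server (round-robin over a flat region index)."""
    index = 0
    for coordinate in region:
        index = index * BUCKETS_PER_AXIS + coordinate
    return servers[index % len(servers)]

def regions_for_search(predicate):
    """List every region that may hold objects matching the given values."""
    axes = []
    for name in ATTRIBUTES:
        if name in predicate:
            axes.append([axis_bucket(predicate[name])])   # axis is pinned
        else:
            axes.append(range(BUCKETS_PER_AXIS))          # axis is unconstrained
    return list(product(*axes))

john = {"first": "John", "last": "Smith", "phone": 6075551024}
servers = ["server-%d" % i for i in range(8)]
home = coordinates(john)
candidates = regions_for_search({"first": "John"})
print(home, server_for(home, servers), len(candidates))
```

Fixing one of the three attributes pins one axis, so a search on `first` only has to visit 4 of the 8 regions. Fixing all three attributes narrows the search to a single region.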

HYPERSPACE HASHING
So far, so good. Aww, wait!
The curse of dimensionality.
The volume of the resulting hyperspace grows exponentially in the number of
dimensions/attributes.

For instance, a table with 9 dimensions requires 2^9 regions. That’s a minimum of 512 servers.

HYPERSPACE HASHING
Logarithms to the rescue!
Subspaces.
HyperDex splits the hyperspace into multiple lower-dimensional subspaces. Thus, the
volume of the space only grows linearly. This not only reduces the number of
machines required to store the data; search also becomes more efficient, because fewer
machines need to be contacted. A key subspace is added to distinguish key lookups
from single-attribute searches. Each subspace stores a full copy of the object.
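
The effect on the region count can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it assumes every axis is split into two buckets, and the grouping of the 9 attributes into three 3-dimensional subspaces plus a 1-dimensional key subspace is a hypothetical layout, not one prescribed by HyperDex.

```python
def full_space_regions(num_attributes, buckets_per_axis=2):
    """Regions needed when all attributes share one hyperspace."""
    return buckets_per_axis ** num_attributes

def subspace_regions(subspace_sizes, buckets_per_axis=2):
    """Regions needed when the attributes are split into independent subspaces."""
    return sum(buckets_per_axis ** size for size in subspace_sizes)

single = full_space_regions(9)            # one 9-dimensional space
split = subspace_regions([1, 3, 3, 3])    # key subspace + three 3-dimensional subspaces
print(single, split)                      # prints "512 26"
```

The price for the smaller region count is storage: each subspace keeps its own full copy of every object.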

VALUE-DEPENDENT CHAINING
Consistency and Replication.
We have copies of each object in each subspace.
Value-dependent chaining keeps all copies consistent and provides strong consistency
(linearizability) and fault tolerance in the presence of concurrent updates.

Consistency.
HyperDex propagates each update deterministically to all relevant spaces.
Update u1: PUT (insert key)
-  h1, h2, h3

Chains are executed from the end.

Head = Point leader. The same for each key.

The point leader knows all updates. Dependencies are embedded in the chain.
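
How a chain could be derived from an object's values can be sketched as follows. This is an illustration under stated assumptions, not HyperDex's code: the subspace layout, the bucket count and the hash function are hypothetical, and so is the rule that an update which moves an object within a subspace visits the old region before the new one.

```python
import hashlib

BUCKETS = 4  # assumption: buckets per axis
SUBSPACES = [  # hypothetical layout; the key subspace comes first
    ("key", ("username",)),
    ("name", ("first", "last")),
    ("phone", ("phone",)),
]

def bucket(value):
    """Hash one attribute value to a bucket index."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS

def region(subspace, attributes, obj):
    """Region of one subspace that holds the given object version."""
    return (subspace,) + tuple(bucket(obj[a]) for a in attributes)

def build_chain(new_obj, old_obj=None):
    """Ordered list of regions an update must pass through."""
    chain = []
    versions = [new_obj] if old_obj is None else [old_obj, new_obj]
    for subspace, attributes in SUBSPACES:
        for version in versions:
            r = region(subspace, attributes, version)
            if r not in chain:       # unchanged values map to the same region
                chain.append(r)
    return chain

v1 = {"username": "jsmith1", "first": "John", "last": "Smith", "phone": 6075551024}
v2 = dict(v1, phone=6075559999)      # same key, new phone number
insert_chain = build_chain(v1)
update_chain = build_chain(v2, old_obj=v1)
print(insert_chain)
print(update_chain)
```

Because the head is computed from the key alone, both chains start at the same region. That is what lets a single point leader see, and order, every update to a given key.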

Replication.
HyperDex inserts replicas for each region into the chain.
Consider Update u1:
-  h1, h2, h3
-  h1, h1', h2, h2', h3, h3'
-  h1, h1', h2', h2'', h3, h3'
Replicas are always updated first.

Failures do not compromise strong consistency.

Clients are only acknowledged after full replication is achieved.
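
The acknowledgement rule can be illustrated with a toy in-memory chain. This is a simplified sketch, assuming that f + 1 copies per region are needed to tolerate f failures, that an update is first passed down the chain as pending, and that it is then committed from the tail back to the head. The `Node` class and the function names are hypothetical.

```python
class Node:
    """One server in the chain, holding pending and committed values."""
    def __init__(self, name):
        self.name = name
        self.pending = {}
        self.committed = {}

def replicated_chain(region_names, f):
    """Expand every region into f + 1 nodes: h1 -> h1, h1', h1'', ..."""
    chain = []
    for name in region_names:
        for copy in range(f + 1):
            chain.append(Node(name + "'" * copy))
    return chain

def apply_update(chain, key, value):
    """Pass the update to the tail, then commit from the tail back to the head."""
    for node in chain:
        node.pending[key] = value
    commit_order = []
    for node in reversed(chain):
        node.committed[key] = node.pending.pop(key)
        commit_order.append(node.name)
    # Only now, with every copy committed, would the client be acknowledged.
    return commit_order

chain = replicated_chain(["h1", "h2", "h3"], f=1)
order = apply_update(chain, "jsmith1", {"phone": 6075551024})
print([node.name for node in chain])
print(order)
```

With `f=1` the expanded chain is h1, h1', h2, h2', h3, h3', and the commit order is the same list reversed.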

THE PARTS OF THE MACHINE
HyperDex - Nuts and Bolts

The Slave Node.
Everything is C++.
The slave nodes are not particularly interesting.

The Coordinator & the Configuration.
A logically centralized coordinator maintains global state.
-  The coordinator runs on its own replicated state machine, called “Replicant”. It plays
the role that ZooKeeper plays for Hadoop et al.
-  Global state is maintained as Configuration.
-  The coordinator has no state of the stored objects, only mappings and servers.
-  Instance: IP, Port, Instance ID.
-  The coordinator creates new configurations based on changes and failures and
distributes them to the clients.

The Client.
The client is part of the whole system, not just a customer.
-  Client receives new configurations from the coordinator.
-  Switching to a new configuration is atomic.
-  Client only contacts relevant nodes. This is significant for performance.
-  Clients must be “intelligent”. No REST.
-  A load-balancing proxy layer could help. But isn’t there.
-  Full support for C++, Python.
-  Partial support for Java (uses the C++ driver through JNI), Node.JS, Ruby
-  Using layers skips features. Java driver doesn’t support “count”.
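
The "intelligent client" idea, routing locally against a configuration that is swapped atomically, might look roughly like this. It is a sketch only: the `Configuration` and `Client` classes, their fields and the version check are hypothetical and do not mirror the real HyperDex client library.

```python
from dataclasses import dataclass
from types import MappingProxyType

@dataclass(frozen=True)
class Configuration:
    """Immutable snapshot of the cluster layout (hypothetical structure)."""
    version: int
    region_to_server: MappingProxyType   # region id -> (ip, port)

class Client:
    def __init__(self, config):
        self._config = config

    def install(self, new_config):
        """Switch to a newer configuration with a single reference swap."""
        if new_config.version > self._config.version:
            self._config = new_config

    def route(self, region):
        """Pick the server for a region without asking the coordinator."""
        config = self._config            # take one consistent snapshot
        return config.version, config.region_to_server[region]

v1 = Configuration(1, MappingProxyType({0: ("127.0.0.1", 2022), 1: ("127.0.0.1", 2032)}))
v2 = Configuration(2, MappingProxyType({0: ("127.0.0.1", 2032), 1: ("127.0.0.1", 2032)}))

client = Client(v1)
before = client.route(0)
client.install(v2)                       # e.g. after the first node failed
after = client.route(0)
client.install(v1)                       # a stale configuration is ignored
print(before, after, client.route(0))
```

Since a configuration object is never modified after creation, a request always sees either the old layout or the new one, never a mixture.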

THE REAL WORLD™
HyperDex Tutorial

THE REAL WORLD ™
Install.
Pre-built packages.
Supports CentOS, Debian, Fedora, Ubuntu.
But not all versions.
And not everything. Read: “No package for the Java driver”.

Build from source.
Good luck.

Be super conscious of package versions.
More on that in a minute.

THE REAL WORLD ™
Start the Daemons.
Coordinator.
# hyperdex coordinator -f -l 127.0.0.1 -p 1982

Data Nodes.
# hyperdex daemon -f --listen=127.0.0.1 --listen-port=2022 \
    --coordinator=127.0.0.1 --coordinator-port=1982 \
    --data=./data0/

# hyperdex daemon -f --listen=127.0.0.1 --listen-port=2032 \
    --coordinator=127.0.0.1 --coordinator-port=1982 \
    --data=./data1/

THE REAL WORLD ™
Client Demo
The Python client is the HyperDex shell. Create Hyperspace.
# python

THE REAL WORLD ™
Client Demo
Create a client. Basic PUT/GET. Uses Key subspace.
# python

THE REAL WORLD ™
Client Demo
Search. Uses further subspaces.

THE REAL WORLD ™
Client Demo
Updates and Range Query/Search.

THE REAL WORLD ™
Bashing the Prophetess & the Giant.
The performance benchmarks use YCSB against Cassandra and MongoDB.
Dedicated cluster of 14 Nodes in the VICCI cloud.
Take it with a grain of salt.
I’m missing Riak.

THE REAL WORLD ™
Throughput.

THE REAL WORLD™
Experiences & Findings

THE REAL WORLD ™
Experiences. Findings.
Minor versions are incompatible.
-  hyperdex-1.0.rc4 vs. hyperdex-1.0.rc5
-  import hyperclient vs. hyperdex.admin, hyperdex.client
-  (hyperdisk) vs. leveldb vs. hyperleveldb

-  There goes my dream of using the PHP driver on github.
-  Migration? No idea.
-  Compile? Use VM to go.

THE REAL WORLD ™
Experiences. Findings.
It’s just a K/V store.
-  No methods to do distributed computations. Python map/reduce is on the agenda.
No Dynamo Ring. But a chain to rule them all.
-  Fault-tolerance with f dedicated nodes is fine, but what about multiple datacenters?
It’s a quite young project with few committers.
Important internals change between minor versions.
Not much sleep for them. How about your DevOps?

REMEMBER?
*log: Storm-based Analytics RT

WHAT ABOUT?
*log
We chose Riak.
-  Excellent Java driver.
-  We don’t need transactions.
-  During development, our schema will change often.
-  Operational ease, easy to scale, excellent feedback.
-  Map/reduce in Erlang and JS. Can use the result of a secondary index query.
-  Solr Integration with Riak Search. Not at the moment, but we deal with content.
We like HyperDex.
-  Really interesting concepts and advancements, but at the moment not the perfect fit.
-  Implemented a storage backend abstraction layer. Easy to switch to HyperDex once
it’s more mature.
