Hypertable Doug Judd CEO, Hypertable, Inc.
High Performance, Open Source Scalable Database: modeled after Bigtable; high-performance implementation (C++); project started in March 2007; Thrift interface for all popular languages (Java, PHP, Ruby, Python, Perl, etc.)
Bigtable: the infrastructure that Google is built on. YouTube, Blogger, Google Earth, Google Maps, Orkut (social network), Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database … plus 90 other Google services …
Functionality: massive sparse tables of information; single primary key index; cells can have multiple timestamped versions. Not relational: no joins (not yet); no secondary indexes (not yet); not a transaction system (not yet).
Hypertable Deployments
Other Architectures
Auto-Sharding: MongoDB, AsterData, Greenplum
MongoDB
Dynamo-based Hash Table Architectures: Cassandra, Project Voldemort, Riak
Eventual Consistency
Consistent Hashing
Order Preserving Partitioner (Cassandra) www.recipezaar.com   1091721999…629750272 + www.ribbonprinters.com   1091721999…965293103 / 2 = www.rgb????i?pQdp ?.???  1091721999…297521687
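The arithmetic on this slide can be reproduced with a short Python sketch. It assumes each key is read as a big-endian unsigned integer over its bytes, right-padded with zero bytes to a common width; the helper name `midpoint_key` is hypothetical and is not Cassandra's actual code. The point it illustrates is that the midpoint of two readable row keys keeps their shared prefix but otherwise consists of arbitrary bytes.

```python
def midpoint_key(a: bytes, b: bytes) -> bytes:
    """Return the byte string halfway between a and b (illustrative only)."""
    width = max(len(a), len(b))
    # Right-pad with zero bytes so both keys have the same width and
    # integer order matches lexicographic byte order.
    ia = int.from_bytes(a.ljust(width, b"\x00"), "big")
    ib = int.from_bytes(b.ljust(width, b"\x00"), "big")
    return ((ia + ib) // 2).to_bytes(width, "big")


low = b"www.recipezaar.com"
high = b"www.ribbonprinters.com"
mid = midpoint_key(low, high)
print(mid)  # shared prefix survives; the remaining bytes are not a meaningful URL
```

The result sorts between the two input keys, so it works as a split point, but it does not correspond to any row that actually exists in the table.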
Order Preserving Partitioner Balance Problem
Hypertable Architecture
Conceptual Table Layout
Table: Actual Representation
Range Distribution
Google Stack
Google File System
Google File System
System Overview
Log Structured Merge (LSM) Tree: eliminates random I/O on writes; converts random I/O to sequential I/O. Write path: commit log on disk (DFS); in-memory map; in-memory map gets “compacted” to disk; disk files periodically get merged.
Range Server: manages ranges of table data. CellCache: in-memory map containing recent updates. CellStore: on-disk (DFS) file containing “compacted” cell cache.
Range Server: CellStore. Sequence of 65K blocks of compressed key/value pairs.
Compression: CellStore blocks are compressed; Commit Log updates are compressed. Supported compression schemes: zlib (--best and --fast), lzo, quicklz, bmz, none.
Bloom Filter: probabilistic data structure associated with every CellStore; indicates if key is not present.
Caching. Block Cache: caches CellStore blocks; blocks are cached uncompressed; dynamically adjusted size based on workload. Query Cache: caches query results.
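The write path listed on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Hypertable's implementation: the class name `TinyLSM`, the `memtable_limit` threshold, and the use of Python lists in place of the commit log and on-disk files are all hypothetical stand-ins.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: commit log + in-memory map + sorted immutable runs."""

    def __init__(self, memtable_limit=4):
        self.commit_log = []   # stands in for the append-only log on the DFS
        self.memtable = {}     # in-memory map of recent updates
        self.runs = []         # sorted runs ("disk files"), oldest first
        self.memtable_limit = memtable_limit  # assumed compaction trigger

    def put(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no random I/O
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.compact()

    def compact(self):
        """Write the in-memory map out as one sorted run."""
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.commit_log.clear()  # simplification: logged data is now durable

    def merge(self):
        """Merge all runs into one; newer runs override older ones."""
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run first
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(k, v)
db.merge()
```

Every write touches only the end of the log and the in-memory map; sorted data reaches "disk" in large sequential batches, which is where the random-to-sequential conversion comes from.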
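As a rough illustration of the block layout named on this slide, the following Python sketch packs sorted key/value pairs into blocks of about 64 KB and compresses each one with zlib. The exact block size (65536 bytes), the record framing (two 4-byte length fields), and the per-block index of last keys are assumptions made for the example; none of them is taken from Hypertable's actual file format.

```python
import struct
import zlib

BLOCK_SIZE = 65536  # assumed reading of "65K"; uncompressed target size


def encode_pair(key: bytes, value: bytes) -> bytes:
    # Hypothetical framing: key length, value length, then the raw bytes.
    return struct.pack(">II", len(key), len(value)) + key + value


def build_blocks(pairs, block_size=BLOCK_SIZE):
    """Pack sorted (key, value) pairs into compressed blocks.

    Returns (blocks, index), where index[i] is the last key in blocks[i].
    """
    blocks, index = [], []
    buf, last_key = bytearray(), None
    for key, value in pairs:
        record = encode_pair(key, value)
        if buf and len(buf) + len(record) > block_size:
            blocks.append(zlib.compress(bytes(buf)))
            index.append(last_key)
            buf = bytearray()
        buf += record
        last_key = key
    if buf:
        blocks.append(zlib.compress(bytes(buf)))
        index.append(last_key)
    return blocks, index


def read_block(block: bytes):
    """Decompress one block and decode its key/value pairs."""
    data = zlib.decompress(block)
    pos, out = 0, []
    while pos < len(data):
        klen, vlen = struct.unpack_from(">II", data, pos)
        pos += 8
        out.append((data[pos:pos + klen], data[pos + klen:pos + klen + vlen]))
        pos += klen + vlen
    return out


pairs = [(b"row%05d" % i, b"x" * 100) for i in range(2000)]
blocks, index = build_blocks(pairs)
```

With an index of last keys, a reader can binary-search for the one block that may hold a key and decompress only that block.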
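A minimal Bloom filter sketch in Python shows why the structure can only answer "definitely not present" or "possibly present". The bit-array size, the number of hash functions, and the use of salted SHA-256 as the hash family are assumptions chosen for the example, not details of Hypertable's filter.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key: bytes) -> bool:
        # False means the key was never added; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))


bf = BloomFilter()
for row in (b"www.recipezaar.com", b"www.ribbonprinters.com"):
    bf.add(row)
```

A negative answer lets a read skip the CellStore entirely; a positive answer only means the CellStore still has to be consulted.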
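The block cache described here can be pictured as a byte-bounded map of uncompressed blocks whose capacity can change at runtime. The sketch below assumes least-recently-used eviction, which the slide does not state, and the class and method names (`BlockCache`, `resize`) are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Byte-bounded cache of uncompressed blocks with LRU eviction (assumed)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.blocks = OrderedDict()  # block_id -> bytes, least recent first

    def get(self, block_id):
        data = self.blocks.get(block_id)
        if data is not None:
            self.blocks.move_to_end(block_id)  # mark as most recently used
        return data

    def put(self, block_id, data):
        if block_id in self.blocks:
            self.used_bytes -= len(self.blocks.pop(block_id))
        self.blocks[block_id] = data
        self.used_bytes += len(data)
        self._evict()

    def resize(self, capacity_bytes):
        """Grow or shrink the cache, e.g. when the workload shifts."""
        self.capacity_bytes = capacity_bytes
        self._evict()

    def _evict(self):
        while self.used_bytes > self.capacity_bytes and self.blocks:
            _, data = self.blocks.popitem(last=False)
            self.used_bytes -= len(data)


cache = BlockCache(capacity_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")            # "a" becomes most recently used
cache.put("c", b"12345")  # over capacity, so "b" is evicted
```

Calling `resize` with a smaller value evicts blocks immediately, which is one simple way to hand memory back to other components.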
Dynamic Memory Adjustment
Performance Evaluation Hypertable vs. HBase
Test Setup: Hypertable v0.9.3.2 (not yet released); HBase 0.20.3; HDFS 0.20.2; 10 machines; 3 Hyperspace / ZooKeeper replicas; 1 Master / 4 Tablet Servers (5 GB RAM); 1 Test Dispatcher / 4 Test Clients. Machine profile: 1 x 1.8 GHz dual-core Opteron; 10 GB RAM; 3 x 250 GB SATA drives.
Random Write / Sequential Read
Random Read
Project Resources: Twitter: hypertable; www.hypertable.org
Professional Support

Hypertable Berlin Buzzwords

Editor's Notes

  • #2 My name is Doug Judd and I'm the CEO of Hypertable, Inc. I'm also the original creator and current maintainer of the project. Today I'm going to present an architectural overview of Hypertable within the context of some of the other popular scalable database designs. I'm also going to present some preliminary results from a performance evaluation that we're in the process of doing, comparing Hypertable with HBase.
  • #3 About three years ago, I was working as an architect at Zvents, a local search engine. At the time we were trying to become the "Google" of local search. This meant collecting large and growing amounts of click log and query log data, doing analytics on that data, and using the results to fuel our ranking, recommendation, and ad targeting systems. At the time, Hadoop existed, which provided the scalable filesystem and the MapReduce framework, but there was no scalable open source solution for delivering large data sets to live applications. So we decided to do an implementation, and Bigtable was the obvious choice. We decided to do it open source and reap all of the benefits that the open source development model has to offer. A big focus of the project has been on optimum performance, and the reason is simple: efficiency gains scale linearly with the system, which translates to less hardware needed to deliver the same capacity. To that end, we chose C++ as the implementation language. Though the system is implemented in C++, we have a Thrift interface that provides language bindings for all popular high-level languages.
  • #4 Describe the 360 degree panoramic view feature of Google Maps
  • #16 In 2007, Google was running 100,000 MapReduce jobs and processing 20 petabytes of data daily.