Hypertable
Doug Judd, CEO, Hypertable, Inc.

  - High Performance, Open Source Scalable Database
  - Modeled after Bigtable
  - High Performance Implementation (C++)
  - Project Started in March 2007
  - Runs on top of HDFS
  - Thrift Interface for all popular languages: Java, PHP, Ruby, Python, Perl, etc.
Hypertable Deployments
Architecture
Underlying Data Representation
Scaling (part I)
Scaling (part II)
Scaling (part III)
Request Routing
Query Handling
Features
  - Load data from Hypertable (HT) to Hive and vice versa
  - Use Hive types
  - Use Hive QL (joins, aggregations)
  - Low latency data warehousing
  - Uses Hypertable’s native MapReduce Input/Output format
Namespaces
  /development
    user
    tweet
  /testing
    user
    tweet
  /production
    /v1
      user
      tweet
    /v2
      user
      tweet
Column Family Options
  - TTL=<t> (“time to live”): Remove cells that are older than <t>
  - MAX_VERSIONS=<n>: Keep only the most recent <n> cell versions
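The effect of the two column family options can be illustrated with a small sketch. This is not Hypertable code; the function name `prune_versions`, the `(timestamp, value)` cell layout, and the use of seconds for `<t>` are assumptions made only for illustration.

```python
from typing import List, Optional, Tuple

# A cell version as (timestamp_in_seconds, value). Hypothetical layout, for illustration.
Cell = Tuple[float, str]


def prune_versions(
    cells: List[Cell],
    now: float,
    ttl: Optional[float] = None,
    max_versions: Optional[int] = None,
) -> List[Cell]:
    """Return the cell versions that survive TTL and MAX_VERSIONS, newest first."""
    # Order versions from newest to oldest.
    kept = sorted(cells, key=lambda c: c[0], reverse=True)
    if ttl is not None:
        # TTL: drop every version older than ttl seconds.
        kept = [c for c in kept if now - c[0] <= ttl]
    if max_versions is not None:
        # MAX_VERSIONS: keep only the n most recent versions.
        kept = kept[:max_versions]
    return kept


versions = [(100.0, "a"), (160.0, "b"), (190.0, "c"), (199.0, "d")]
pruned = prune_versions(versions, now=200.0, ttl=60.0, max_versions=2)
print(pruned)
```

With `ttl=60.0` the version written at time 100 is dropped for being too old, and `max_versions=2` then keeps only the two newest of the remaining three.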
Access Groups
  - Provides control over physical layout
      - Row oriented
      - Column oriented
      - Hybrid
  - Reduces I/O

  CREATE TABLE MyTable (
    a, b, c, d,
    ACCESS GROUP first (a),
    ACCESS GROUP second (b, c, d)
  );
Regular Expression Filtering
  - Google’s RE2 regular expression engine
      - Extremely fast (up to 50X Java regex)
      - Searches run in time linear in the size of the input
      - Searches constrained to a fixed amount of memory
  - Supported Searches:
      - Row key
      - Column qualifier
      - Value

  SELECT CELLS tag:/(?i)(nosql|bigtable)/ FROM MyTable
    WHERE ROW REGEXP "^\D+" AND VALUE REGEXP "(?i)hypertable";
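To make the three filter positions concrete, the sketch below applies the same three patterns to an in-memory list of cells. It uses Python's standard `re` module, which is a backtracking engine and does not share RE2's linear-time guarantee. The `(row, qualifier, value)` tuple layout and the unanchored "search" matching are assumptions, not a description of how Hypertable evaluates the query.

```python
import re

# The three patterns from the query: row key, column qualifier, and value.
ROW_RE = re.compile(r"^\D+")                    # row key starts with a non-digit
QUALIFIER_RE = re.compile(r"(?i)(nosql|bigtable)")
VALUE_RE = re.compile(r"(?i)hypertable")


def select_cells(cells):
    """Keep cells whose row, qualifier, and value all match their pattern."""
    return [
        (row, qualifier, value)
        for row, qualifier, value in cells
        if ROW_RE.search(row)
        and QUALIFIER_RE.search(qualifier)
        and VALUE_RE.search(value)
    ]


# Hypothetical sample data: (row key, qualifier of the "tag" column, value).
cells = [
    ("alice", "NoSQL", "I use Hypertable"),
    ("42bob", "nosql", "hypertable rocks"),  # rejected: row key starts with a digit
    ("carol", "sql", "hypertable"),          # rejected: qualifier does not match
    ("dave", "BigTable", "HBase only"),      # rejected: value does not match
]
matches = select_cells(cells)
print(matches)
```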
Atomic Counters
  - New column option:

      create table counts ( url COUNTER, );

  - Modified via existing API using specially formatted values:

      Value Format   Description
      [+]n           Increment counter by n
      -n             Decrement counter by n
      =n             Reset counter to n
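The value formats for counter columns can be read as a tiny update language. The sketch below interprets those strings against a plain integer. The function name `apply_counter_update` is hypothetical, and the code only models the semantics listed on the slide; it is not the Hypertable client API and does not model atomicity.

```python
def apply_counter_update(current: int, value: str) -> int:
    """Apply one specially formatted counter value to the current count."""
    value = value.strip()
    if value.startswith("="):
        # "=n": reset the counter to n.
        return int(value[1:])
    if value.startswith("-"):
        # "-n": decrement the counter by n.
        return current - int(value[1:])
    if value.startswith("+"):
        # "+n": increment the counter by n.
        return current + int(value[1:])
    # "n": the leading plus sign is optional, so a bare number also increments.
    return current + int(value)


count = 0
for update in ["+5", "3", "-2", "=10", "+1"]:
    count = apply_counter_update(count, update)
print(count)
```

Because each write carries only a delta or a reset, clients never need to read the current value before updating it.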
Group Commit
  - Supports highly concurrent updates
  - Trades minimum latency for better throughput
  - Configurable commit interval per-table:

  CREATE TABLE counts (
    url,
    domain
  ) GROUP_COMMIT_INTERVAL=100;
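The latency-for-throughput trade can be sketched as a simple batching simulation: updates that arrive within the same commit interval are written together in one commit. The function `group_commits` is hypothetical, and treating the interval as milliseconds with fixed windows is an assumption; the slide does not state the unit or how the server schedules commits.

```python
from typing import Dict, List, Tuple


def group_commits(updates: List[Tuple[int, str]], interval_ms: int) -> List[List[str]]:
    """Group (arrival_ms, payload) updates into one batch per commit interval."""
    batches: Dict[int, List[str]] = {}
    for arrival_ms, payload in updates:
        # All updates that fall into the same interval window share a commit.
        window = arrival_ms // interval_ms
        batches.setdefault(window, []).append(payload)
    # Emit batches in time order; each batch stands for a single commit.
    return [batches[window] for window in sorted(batches)]


updates = [(5, "a"), (40, "b"), (99, "c"), (100, "d"), (250, "e")]
batches = group_commits(updates, interval_ms=100)
print(batches)
```

Five updates become three commits here. An update arriving early in a window waits longer before it is committed, which is the minimum-latency cost named on the slide.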
Compression
  - Block compression
      - Cell Store (SSTable) blocks
      - Commit Log blocks
  - Supported Compression Schemes:
      - zlib
      - lzo
      - quicklz
      - bmz
      - none
Bloom Filter
  - Dramatically reduces disk access
  - Associated with each Cell Store
  - Tells you if a key is definitely not present
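A minimal Bloom filter sketch shows why a negative answer is definite while a positive one is only probable. The class name, the bit-array size, the number of hashes, and the SHA-256 based hashing are all illustrative assumptions; they are not taken from Hypertable's implementation.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # If any bit is unset, the key was never added: skip the disk read.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bloom = BloomFilter()
for row_key in ["row1", "row2", "row3"]:
    bloom.add(row_key)

print(bloom.might_contain("row2"))  # always True for a key that was added
```

A lookup would consult the filter first and read the Cell Store only when `might_contain` returns True, which is how the filter saves disk access for absent keys.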
Performance Evaluation
Setup
  - Modeled after Test described in Bigtable paper
  - 1 Test Dispatcher, 4 Test Clients, 4 Tablet Servers
  - Test was written entirely in Java
  - Hardware
      - 1 X 1.8 GHz Dual-core Opteron
      - 10 GB RAM
      - 3X 250GB SATA drives
  - Software
      - HDFS 0.20.2 running on all 10 nodes, 3X replication
      - HBase 0.20.4
      - Hypertable 0.9.3.3
Latency
Throughput

  Test                                  Hypertable Advantage Relative to HBase (%)
  Random Read Zipfian 80 GB             925
  Random Read Zipfian 20 GB             777
  Random Read Zipfian 2.5 GB            100
  Random Write 10KB values              51
  Random Write 1KB values               102
  Random Write 100 byte values          427
  Random Write 10 byte values           931
  Sequential Read 10KB values           1060
  Sequential Read 1KB values            68
  Sequential Read 100 byte values       129
  Scan 10KB values                      2
  Scan 1KB values                       58
  Scan 100 byte values                  75
  Scan 10 byte values                   220
Why does Performance Matter? $$$
Upcoming Release (0.9.5)
  - Last “alpha” release
  - Release Date: February 15th 2011
  - Features
      - Automatic range balancing
      - Asynchronous API
      - Improved Monitoring System
Resources
  - Project Site: www.hypertable.org
  - Blog: blog.hypertable.com
  - Twitter: hypertable
Professional Support
Q&A

Nosql series-part-3-hypertable


Editor's Notes

  • #13
    - Why use the HT Hive extension?
    - Select from and insert to Hypertable from Hive.
    - Currently HT doesn’t support types natively; everything is a string, whereas Hive has an extensive type system.
    - Hive allows you to do joins across multiple tables, some in HT, some in Hive or other sources.
    - Keep fast-changing data in HT and combine it with other, less frequently updated data sources.
    - This is an initial working implementation; we have a fairly detailed roadmap of features we want to add going forward.