Published on

credit to Doug Judd

Published in: Technology, Education
1 Comment
  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. Hypertable Doug Juddwww.hypertable.org
  2. 2. BackgroundZvents plan is to become the “Google” oflocal searchIdentified the need for a scalable DBNo solutions existedBigtable was the logical choiceProject started February 2007
  3. 3. Zvents DeploymentTraffic ReportsChange LogWriting 1 Billion cells/day
  4. 4. Baidu DeploymentLog processing/viewing app injectingapproximately 500GB of data per day120-node cluster running Hypertable and HDFS 16GB RAM 4x dual core Xeon 8TB storageDeveloped in-house fork with modifications forscaleWorking on a new crawl DB to store up to 1petabyte of crawl data
  5. 5. HypertableWhat is it? Open source Bigtable clone Manages massive sparse tables with timestamped cell versions Single primary key indexWhat is it not? No joins No secondary indexes (not yet) No transactions (not yet) hypertable.org
  6. 6. Scaling (part I) hypertable.org
  7. 7. Scaling (part II) hypertable.org
  8. 8. Scaling (part III) hypertable.org
  9. 9. System Overview
  10. 10. Table: Visual Representation hypertable.org
  11. 11. Table: Actual Representation hypertable.org
  12. 12. Anatomy of a KeyMVCC - snapshot isolationBigtable uses copy-on-writeTimestamp and revision shared by defaultSimple byte-wise comparison hypertable.org
  13. 13. Range ServerManages ranges of table dataCaches updates in memory (CellCache)Periodically spills (compacts) cached updates to disk(CellStore) hypertable.org
  14. 14. Range Server: CellStoreSequence of 65Kblocks of compressedkey/value pairs
  15. 15. CompressionCellStore and CommitLog BlocksSupported Compression Schemes zlib --best zlib --fast lzo quicklz bmz none hypertable.org
  16. 16. Performance OptimizationsBlock Cache Caches CellStore blocks Blocks are cached uncompressedBloom Filter Avoids unnecessary disk access Filter by rows or rows+columns Configurable false positive rateAccess Groups Physically store co-accessed columns together Improves performance by minimizing I/O hypertable.org
  17. 17. Commit LogOne per RangeServerUpdates destined for many Ranges One commit log write One commit log syncLog is directory 100MB fragment files Append by creating a new fragment fileNO_LOG_SYNC optionGroup commit (TBD)
  18. 18. Request ThrottlingRangeServer tracks memory usageConfig properties Hypertable.RangeServer.MemoryLimit Hypertable.RangeServer.MemoryLimit.Percentage (70%)Request queue is paused when memoryusage hits thresholdHeap fragmentation tcmalloc - good glibc - not so good
  19. 19. C++ vs. JavaHypertable is CPU intensive Manages large in-memory key/value map Lots of key manipulation and comparisons Alternate compression codecs (e.g. BMZ)Hypertable is memory intensive GC less efficient than explicitly managed memory Less memory means more merging compactions Inefficient memory usage = poor cache performance hypertable.org
  20. 20. Language BindingsPrimary API is C++Thrift Broker provides bindings for: Java Python PHP Ruby And more (Perl, Erlang, Haskell, C#, Cocoa, Smalltalk, and Ocaml)
  21. 21. Client APIclass Client { void create_table(const String &name, const String &schema); Table *open_table(const String &name); void alter_table(const String &name, const String &schema); String get_schema(const String &name); void get_tables(vector<String> &tables); void drop_table(const String &name, bool if_exists);};
  22. 22. Client API (cont.)class Table { TableMutator *create_mutator(); TableScanner *create_scanner(ScanSpec &scan_spec);};class TableMutator { void set(KeySpec &key, const void *value, int value_len); void set_delete(KeySpec &key); void flush();};class TableScanner { bool next(CellT &cell);}; hypertable.org
  23. 23. Client API (cont.)class ScanSpecBuilder { void set_row_limit(int n); void set_max_versions(int n); void add_column(const String &name); void add_row(const String &row_key); void add_row_interval(const String &start, bool sinc, const String &end, bool einc); void add_cell(const String &row, const String &column); void add_cell_interval(…) void set_time_interval(int64_t start, int64_t end); void clear(); ScanSpec &get(); hypertable.org}
  24. 24. Testing: Failure InducerCommand line argument--induce-failure=<label>:<type>:<iteration>Class definitionclass FailureInducer { public: void parse_option(String option); void maybe_fail(const String &label);};In the code if (failure_inducer) failure_inducer->maybe_fail("split-1");
  25. 25. 1TB Load Test1TB data8 node cluster 1 1.8 GHz dual-core Opteron 4 GB RAM 3 x 7200 RPM 250MB SATA drivesKey size = 10 bytesValue size = 20KB (compressible text)Replication factor: 34 simultaneous insert clients~ 50 MB/s load (sustained)~ 30 MB/s scan hypertable.org
  26. 26. Performance Test (random read/write)Single machine 1 x 1.8 GHz dual-core Opteron 4 GB RAMLocal Filesystem250MB / 1KB valuesNormal Table / lzo compression Batched writes 31K inserts/s (31MB/s) Non-batched writes (serial) 500 inserts/s (500KB/s) Random reads (serial) 5800 queries/s (5.8MB/s)
  27. 27. Project StatusCurrent release is “alpha”Waiting for Hadoop 0.21 (fsync)TODO for “beta” Namespaces Master directed RangeServer recovery Range balancing hypertable.org
  28. 28. Questions?www.hypertable.org hypertable.org