SlideShare a Scribd company logo
1 of 28
Download to read offline
Hypertable
   Doug Judd
www.hypertable.org
Background
Zvents plan is to become the “Google” of
local search
Identified the need for a scalable DB
No solutions existed
Bigtable was the logical choice
Project started February 2007
Zvents Deployment
Traffic Reports
Change Log
Writing 1 Billion cells/day
Baidu Deployment
Log processing/viewing app injecting
approximately 500GB of data per day
120-node cluster running Hypertable and HDFS
  16GB RAM
  4x dual core Xeon
  8TB storage
Developed in-house fork with modifications for
scale
Working on a new crawl DB to store up to 1
petabyte of crawl data
Hypertable
What is it?
  Open source Bigtable clone
  Manages massive sparse tables with
  timestamped cell versions
  Single primary key index
What is it not?
  No joins
  No secondary indexes (not yet)
  No transactions (not yet)
                    hypertable.org
Scaling (part I)




      hypertable.org
Scaling (part II)




      hypertable.org
Scaling (part III)




       hypertable.org
System Overview
Table: Visual Representation




            hypertable.org
Table: Actual Representation




            hypertable.org
Anatomy of a Key




MVCC - snapshot isolation
Bigtable uses copy-on-write
Timestamp and revision shared by default
Simple byte-wise comparison
                      hypertable.org
Range Server
Manages ranges of table data
Caches updates in memory (CellCache)
Periodically spills (compacts) cached updates to disk
(CellStore)




                         hypertable.org
Range Server: CellStore
Sequence of 65K
blocks of compressed
key/value pairs
Compression
CellStore and CommitLog Blocks
Supported Compression Schemes
  zlib --best
  zlib --fast
  lzo
  quicklz
  bmz
  none



                    hypertable.org
Performance Optimizations
Block Cache
  Caches CellStore blocks
  Blocks are cached uncompressed
Bloom Filter
  Avoids unnecessary disk access
  Filter by rows or rows+columns
  Configurable false positive rate
Access Groups
  Physically store co-accessed columns together
  Improves performance by minimizing I/O

                      hypertable.org
Commit Log
One per RangeServer
Updates destined for many Ranges
  One commit log write
  One commit log sync
Log is directory
  100MB fragment files
  Append by creating a new fragment file
NO_LOG_SYNC option
Group commit (TBD)
Request Throttling
RangeServer tracks memory usage
Config properties
 Hypertable.RangeServer.MemoryLimit
 Hypertable.RangeServer.MemoryLimit.Percentage (70%)

Request queue is paused when memory
usage hits threshold
Heap fragmentation
 tcmalloc - good
 glibc - not so good
C++ vs. Java
Hypertable is CPU intensive
  Manages large in-memory key/value map
  Lots of key manipulation and comparisons
  Alternate compression codecs (e.g. BMZ)
Hypertable is memory intensive
  GC less efficient than explicitly managed memory
  Less memory means more merging compactions
  Inefficient memory usage = poor cache performance



                      hypertable.org
Language Bindings
Primary API is C++
Thrift Broker provides bindings for:
  Java
  Python
  PHP
  Ruby
  And more (Perl, Erlang, Haskell, C#, Cocoa, Smalltalk, and Ocaml)
Client API
class Client {

  void create_table(const String &name,
                    const String &schema);

  Table *open_table(const String &name);

  void alter_table(const String &name,
                   const String &schema);

  String get_schema(const String &name);

  void get_tables(vector<String> &tables);

  void drop_table(const String &name,
                  bool if_exists);
};
Client API (cont.)
class Table {
     TableMutator *create_mutator();
     TableScanner *create_scanner(ScanSpec &scan_spec);
};
class TableMutator {
     void set(KeySpec &key, const void *value, int value_len);
     void set_delete(KeySpec &key);
     void flush();
};
class TableScanner {
 bool next(CellT &cell);
};
                              hypertable.org
Client API (cont.)
class ScanSpecBuilder {
    void set_row_limit(int n);
    void set_max_versions(int n);
    void add_column(const String &name);
    void add_row(const String &row_key);
    void add_row_interval(const String &start, bool sinc,
                          const String &end, bool einc);
    void add_cell(const String &row, const String &column);
    void add_cell_interval(…)
    void set_time_interval(int64_t start, int64_t end);
    void clear();
    ScanSpec &get();
                                hypertable.org
}
Testing: Failure Inducer
Command line argument
--induce-failure=<label>:<type>:<iteration>


Class definition
class FailureInducer {
   public:
     void parse_option(String option);
     void maybe_fail(const String &label);
};


In the code
 if (failure_inducer)
    failure_inducer->maybe_fail("split-1");
1TB Load Test
1TB data
8 node cluster
  1 1.8 GHz dual-core Opteron
  4 GB RAM
  3 x 7200 RPM 250MB SATA drives
Key size = 10 bytes
Value size = 20KB (compressible text)
Replication factor: 3
4 simultaneous insert clients
~ 50 MB/s load (sustained)
~ 30 MB/s scan

                       hypertable.org
Performance Test
                (random read/write)
Single machine
   1 x 1.8 GHz dual-core Opteron
   4 GB RAM
Local Filesystem
250MB / 1KB values
Normal Table / lzo compression
  Batched writes                   31K inserts/s (31MB/s)
  Non-batched writes (serial)      500 inserts/s (500KB/s)
  Random reads (serial)            5800 queries/s (5.8MB/s)
Project Status
Current release is 0.9.2.4 “alpha”
Waiting for Hadoop 0.21 (fsync)
TODO for “beta”
  Namespaces
  Master directed RangeServer recovery
  Range balancing



                   hypertable.org
Questions?
www.hypertable.org




                hypertable.org

More Related Content

What's hot

Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationAlexey Lesovsky
 
PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesInMobi Technology
 
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 TaipeiPostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 TaipeiSatoshi Nagayasu
 
Sessionization with Spark streaming
Sessionization with Spark streamingSessionization with Spark streaming
Sessionization with Spark streamingRamūnas Urbonas
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...Ontico
 
Go debugging and troubleshooting tips - from real life lessons at SignalFx
Go debugging and troubleshooting tips - from real life lessons at SignalFxGo debugging and troubleshooting tips - from real life lessons at SignalFx
Go debugging and troubleshooting tips - from real life lessons at SignalFxSignalFx
 
XtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithmsXtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithmsLaurynas Biveinis
 
Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming Cloudflare
 
Managing PostgreSQL with PgCenter
Managing PostgreSQL with PgCenterManaging PostgreSQL with PgCenter
Managing PostgreSQL with PgCenterAlexey Lesovsky
 
Fluentd unified logging layer
Fluentd   unified logging layerFluentd   unified logging layer
Fluentd unified logging layerKiyoto Tamura
 
XtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsXtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsLaurynas Biveinis
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
Redis as a message queue
Redis as a message queueRedis as a message queue
Redis as a message queueBrandon Lamb
 

What's hot (20)

Full Text Search in PostgreSQL
Full Text Search in PostgreSQLFull Text Search in PostgreSQL
Full Text Search in PostgreSQL
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
 
PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major Features
 
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 TaipeiPostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
 
Sessionization with Spark streaming
Sessionization with Spark streamingSessionization with Spark streaming
Sessionization with Spark streaming
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
 
Go debugging and troubleshooting tips - from real life lessons at SignalFx
Go debugging and troubleshooting tips - from real life lessons at SignalFxGo debugging and troubleshooting tips - from real life lessons at SignalFx
Go debugging and troubleshooting tips - from real life lessons at SignalFx
 
PgconfSV compression
PgconfSV compressionPgconfSV compression
PgconfSV compression
 
Page compression. PGCON_2016
Page compression. PGCON_2016Page compression. PGCON_2016
Page compression. PGCON_2016
 
XtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithmsXtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithms
 
Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming Go Profiling - John Graham-Cumming
Go Profiling - John Graham-Cumming
 
Managing PostgreSQL with PgCenter
Managing PostgreSQL with PgCenterManaging PostgreSQL with PgCenter
Managing PostgreSQL with PgCenter
 
Fluentd unified logging layer
Fluentd   unified logging layerFluentd   unified logging layer
Fluentd unified logging layer
 
XtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsXtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance Algorithms
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Go Memory
Go MemoryGo Memory
Go Memory
 
Pgcenter overview
Pgcenter overviewPgcenter overview
Pgcenter overview
 
9.1 Grand Tour
9.1 Grand Tour9.1 Grand Tour
9.1 Grand Tour
 
Redis as a message queue
Redis as a message queueRedis as a message queue
Redis as a message queue
 

Similar to Hypertable Nosql

Hypertable Berlin Buzzwords
Hypertable Berlin BuzzwordsHypertable Berlin Buzzwords
Hypertable Berlin Buzzwordshypertable
 
Nosql series-part-3-hypertable
Nosql series-part-3-hypertableNosql series-part-3-hypertable
Nosql series-part-3-hypertablehypertable
 
Nosql frankfurt
Nosql frankfurtNosql frankfurt
Nosql frankfurthypertable
 
Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?Felix Geisendörfer
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuFlink Forward
 
Java 7 Whats New(), Whats Next() from Oredev
Java 7 Whats New(), Whats Next() from OredevJava 7 Whats New(), Whats Next() from Oredev
Java 7 Whats New(), Whats Next() from OredevMattias Karlsson
 
An introduction to column store indexes and batch mode
An introduction to column store indexes and batch modeAn introduction to column store indexes and batch mode
An introduction to column store indexes and batch modeChris Adkin
 
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdfMukundThakur22
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQLSatoshi Nagayasu
 
Sql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramSql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramChris Adkin
 
PostgreSQL 8.4 TriLUG 2009-11-12
PostgreSQL 8.4 TriLUG 2009-11-12PostgreSQL 8.4 TriLUG 2009-11-12
PostgreSQL 8.4 TriLUG 2009-11-12Andrew Dunstan
 
Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsSematext Group, Inc.
 
Proving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slobProving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slobKapil Goyal
 
Java user group 2015 02-09-java8
Java user group 2015 02-09-java8Java user group 2015 02-09-java8
Java user group 2015 02-09-java8marctritschler
 
Java user group 2015 02-09-java8
Java user group 2015 02-09-java8Java user group 2015 02-09-java8
Java user group 2015 02-09-java8Marc Tritschler
 
Sql sever engine batch mode and cpu architectures
Sql sever engine batch mode and cpu architecturesSql sever engine batch mode and cpu architectures
Sql sever engine batch mode and cpu architecturesChris Adkin
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 

Similar to Hypertable Nosql (20)

Hypertable Berlin Buzzwords
Hypertable Berlin BuzzwordsHypertable Berlin Buzzwords
Hypertable Berlin Buzzwords
 
Nosql series-part-3-hypertable
Nosql series-part-3-hypertableNosql series-part-3-hypertable
Nosql series-part-3-hypertable
 
Nosql frankfurt
Nosql frankfurtNosql frankfurt
Nosql frankfurt
 
Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 
Java 7 Whats New(), Whats Next() from Oredev
Java 7 Whats New(), Whats Next() from OredevJava 7 Whats New(), Whats Next() from Oredev
Java 7 Whats New(), Whats Next() from Oredev
 
SQL Saturday #438
SQL Saturday #438SQL Saturday #438
SQL Saturday #438
 
An introduction to column store indexes and batch mode
An introduction to column store indexes and batch modeAn introduction to column store indexes and batch mode
An introduction to column store indexes and batch mode
 
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
 
Sql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramSql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ram
 
Collections forceawakens
Collections forceawakensCollections forceawakens
Collections forceawakens
 
PostgreSQL 8.4 TriLUG 2009-11-12
PostgreSQL 8.4 TriLUG 2009-11-12PostgreSQL 8.4 TriLUG 2009-11-12
PostgreSQL 8.4 TriLUG 2009-11-12
 
Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for Logs
 
Proving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slobProving out flash storage array performance using swingbench and slob
Proving out flash storage array performance using swingbench and slob
 
Java user group 2015 02-09-java8
Java user group 2015 02-09-java8Java user group 2015 02-09-java8
Java user group 2015 02-09-java8
 
Java user group 2015 02-09-java8
Java user group 2015 02-09-java8Java user group 2015 02-09-java8
Java user group 2015 02-09-java8
 
Sql sever engine batch mode and cpu architectures
Sql sever engine batch mode and cpu architecturesSql sever engine batch mode and cpu architectures
Sql sever engine batch mode and cpu architectures
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 

More from elliando dias

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slideselliando dias
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScriptelliando dias
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structureselliando dias
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Librarieselliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makeselliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 

More from elliando dias (20)

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScript
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
Ragel talk
Ragel talkRagel talk
Ragel talk
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 

Recently uploaded

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Hypertable Nosql

  • 1. Hypertable Doug Judd www.hypertable.org
  • 2. Background Zvents plan is to become the “Google” of local search Identified the need for a scalable DB No solutions existed Bigtable was the logical choice Project started February 2007
  • 3. Zvents Deployment Traffic Reports Change Log Writing 1 Billion cells/day
  • 4. Baidu Deployment Log processing/viewing app injecting approximately 500GB of data per day 120-node cluster running Hypertable and HDFS 16GB RAM 4x dual core Xeon 8TB storage Developed in-house fork with modifications for scale Working on a new crawl DB to store up to 1 petabyte of crawl data
  • 5. Hypertable What is it? Open source Bigtable clone Manages massive sparse tables with timestamped cell versions Single primary key index What is it not? No joins No secondary indexes (not yet) No transactions (not yet) hypertable.org
  • 6. Scaling (part I) hypertable.org
  • 7. Scaling (part II) hypertable.org
  • 8. Scaling (part III) hypertable.org
  • 12. Anatomy of a Key MVCC - snapshot isolation Bigtable uses copy-on-write Timestamp and revision shared by default Simple byte-wise comparison hypertable.org
  • 13. Range Server Manages ranges of table data Caches updates in memory (CellCache) Periodically spills (compacts) cached updates to disk (CellStore) hypertable.org
  • 14. Range Server: CellStore Sequence of 65K blocks of compressed key/value pairs
  • 15. Compression CellStore and CommitLog Blocks Supported Compression Schemes zlib --best zlib --fast lzo quicklz bmz none hypertable.org
  • 16. Performance Optimizations Block Cache Caches CellStore blocks Blocks are cached uncompressed Bloom Filter Avoids unnecessary disk access Filter by rows or rows+columns Configurable false positive rate Access Groups Physically store co-accessed columns together Improves performance by minimizing I/O hypertable.org
  • 17. Commit Log One per RangeServer Updates destined for many Ranges One commit log write One commit log sync Log is directory 100MB fragment files Append by creating a new fragment file NO_LOG_SYNC option Group commit (TBD)
  • 18. Request Throttling RangeServer tracks memory usage Config properties Hypertable.RangeServer.MemoryLimit Hypertable.RangeServer.MemoryLimit.Percentage (70%) Request queue is paused when memory usage hits threshold Heap fragmentation tcmalloc - good glibc - not so good
  • 19. C++ vs. Java Hypertable is CPU intensive Manages large in-memory key/value map Lots of key manipulation and comparisons Alternate compression codecs (e.g. BMZ) Hypertable is memory intensive GC less efficient than explicitly managed memory Less memory means more merging compactions Inefficient memory usage = poor cache performance hypertable.org
  • 20. Language Bindings Primary API is C++ Thrift Broker provides bindings for: Java Python PHP Ruby And more (Perl, Erlang, Haskell, C#, Cocoa, Smalltalk, and Ocaml)
  • 21. Client API class Client { void create_table(const String &name, const String &schema); Table *open_table(const String &name); void alter_table(const String &name, const String &schema); String get_schema(const String &name); void get_tables(vector<String> &tables); void drop_table(const String &name, bool if_exists); };
  • 22. Client API (cont.) class Table { TableMutator *create_mutator(); TableScanner *create_scanner(ScanSpec &scan_spec); }; class TableMutator { void set(KeySpec &key, const void *value, int value_len); void set_delete(KeySpec &key); void flush(); }; class TableScanner { bool next(CellT &cell); }; hypertable.org
  • 23. Client API (cont.) class ScanSpecBuilder { void set_row_limit(int n); void set_max_versions(int n); void add_column(const String &name); void add_row(const String &row_key); void add_row_interval(const String &start, bool sinc, const String &end, bool einc); void add_cell(const String &row, const String &column); void add_cell_interval(…) void set_time_interval(int64_t start, int64_t end); void clear(); ScanSpec &get(); hypertable.org }
  • 24. Testing: Failure Inducer Command line argument --induce-failure=<label>:<type>:<iteration> Class definition class FailureInducer { public: void parse_option(String option); void maybe_fail(const String &label); }; In the code if (failure_inducer) failure_inducer->maybe_fail("split-1");
  • 25. 1TB Load Test 1TB data 8 node cluster 1 1.8 GHz dual-core Opteron 4 GB RAM 3 x 7200 RPM 250MB SATA drives Key size = 10 bytes Value size = 20KB (compressible text) Replication factor: 3 4 simultaneous insert clients ~ 50 MB/s load (sustained) ~ 30 MB/s scan hypertable.org
  • 26. Performance Test (random read/write) Single machine 1 x 1.8 GHz dual-core Opteron 4 GB RAM Local Filesystem 250MB / 1KB values Normal Table / lzo compression Batched writes 31K inserts/s (31MB/s) Non-batched writes (serial) 500 inserts/s (500KB/s) Random reads (serial) 5800 queries/s (5.8MB/s)
  • 27. Project Status Current release is 0.9.2.4 “alpha” Waiting for Hadoop 0.21 (fsync) TODO for “beta” Namespaces Master directed RangeServer recovery Range balancing hypertable.org