Hypertable Berlin Buzzwords
 

This presentation was given by Doug Judd at Berlin Buzzwords 2010.




  • My name is Doug Judd and I'm the CEO of Hypertable, Inc. I'm also the original creator and current maintainer of the project. Today I'm going to present an architectural overview of Hypertable within the context of some of the other popular scalable database designs. I'm also going to present some preliminary results from a performance evaluation that we're in the process of doing, comparing Hypertable with HBase.
  • About three years ago, I was working as an architect at Zvents, a local search engine. At the time we were trying to become the "Google" of local search. This meant collecting large and growing amounts of click log and query log data, doing analytics on that data, and using the results to fuel our ranking, recommendation, and ad targeting systems. At the time Hadoop existed, which provided the scalable filesystem and the MapReduce framework, but there was no scalable open source solution for delivering large data sets to live applications. So we decided to do an implementation, and Bigtable was the obvious choice. We decided to do it open source and reap all of the benefits that the open source development model has to offer. A big focus of the project has been on optimum performance, and the reason is simple: efficiency gains scale linearly with the system, which translates to reduced hardware to deliver the same capacity. To that end, we chose C++ as the implementation language. Though the system is implemented in C++, we have a Thrift interface that provides language bindings for all popular high-level languages.
  • Describe the 360 degree panoramic view feature of Google Maps
  • In ‘07, Google was running 100,000 MapReduce jobs and processing 20 petabytes daily

Hypertable Berlin Buzzwords: Presentation Transcript

  • Hypertable Doug Judd CEO, Hypertable, Inc.
  • High Performance, Open Source Scalable Database
    • Modeled after Bigtable
    • High Performance Implementation (C++)
    • Project Started in March 2007
    • Thrift Interface for all popular languages
      • Java
      • PHP
      • Ruby
      • Python
      • Perl, etc.
  • Bigtable: the infrastructure that Google is built on
    • YouTube
    • Blogger
    • Google Earth
    • Google Maps
    • Orkut (social network)
    • Gmail
    • Google Analytics
    • Google Book Search
    • Google Code
    • Crawl Database
    • … plus 90 other Google services …
  • Functionality
    • Massive sparse tables of information
    • Single primary key index
    • Cells can have multiple timestamped versions
    • Not Relational
      • No joins (not yet)
      • No secondary indexes (not yet)
      • Not a transaction system (not yet)
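The data model described above (massive sparse tables, a single primary key, multiple timestamped cell versions) can be pictured as an ordered map keyed by (row, column, timestamp). The sketch below is purely illustrative, not Hypertable's storage code; `CellKey`, `put`, and `get_latest` are hypothetical names. Storing the negated timestamp makes the newest version of a cell sort first.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Illustrative Bigtable-style data model: a sparse, sorted map from
// (row, column, timestamp) to value. Timestamps are stored negated so
// the newest version of each cell sorts first.
struct CellKey {
    std::string row;
    std::string column;
    int64_t neg_timestamp;  // -timestamp: newest version first
    bool operator<(const CellKey& o) const {
        return std::tie(row, column, neg_timestamp)
             < std::tie(o.row, o.column, o.neg_timestamp);
    }
};

using Table = std::map<CellKey, std::string>;

// Insert one timestamped version of a cell.
void put(Table& t, const std::string& row, const std::string& col,
         int64_t ts, const std::string& value) {
    t[{row, col, -ts}] = value;
}

// Fetch the newest version of a cell, or nullptr if the cell is empty.
const std::string* get_latest(const Table& t, const std::string& row,
                              const std::string& col) {
    auto it = t.lower_bound({row, col, INT64_MIN});
    if (it != t.end() && it->first.row == row && it->first.column == col)
        return &it->second;
    return nullptr;
}
```

Sparseness falls out of the representation: absent cells simply have no entry in the map.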
  • Hypertable Deployments
  • Other Architectures
  • Auto-Sharding Architectures: MongoDB, AsterData, Greenplum
  • MongoDB
  • Dynamo-based Hash Table Architectures: Cassandra, Project Voldemort, Riak
  • Eventual Consistency
  • Consistent Hashing
  • Order Preserving Partitioner (Cassandra): the split point between two keys is their numeric midpoint, e.g. www.recipezaar.com (token 1091721999…629750272) and www.ribbonprinters.com (token 1091721999…965293103) average to an unprintable byte-string key (token 1091721999…297521687)
  • Order Preserving Partitioner Balance Problem
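The balance problem arises because an order-preserving partitioner must pick split points inside the key space itself. Assuming (as the slide above suggests) that a split point is the midpoint of two keys treated as base-256 fractions, the computation can be sketched as follows; `midpoint` is a hypothetical helper, and the result is an arbitrary byte string, which is why such split-point keys can look garbled.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch of an order-preserving split-point computation: treat two
// keys as base-256 fractions, add them, then divide by two.
std::string midpoint(std::string a, std::string b) {
    size_t n = std::max(a.size(), b.size()) + 1;  // one extra digit of precision
    a.resize(n, '\0');
    b.resize(n, '\0');
    std::vector<int> sum(n);
    int carry = 0;
    for (size_t i = n; i-- > 0;) {           // add, least significant byte first
        int s = (unsigned char)a[i] + (unsigned char)b[i] + carry;
        sum[i] = s % 256;
        carry = s / 256;
    }
    std::string out(n, '\0');
    int rem = carry;                          // integer part of a + b
    for (size_t i = 0; i < n; ++i) {          // halve, most significant byte first
        int cur = rem * 256 + sum[i];
        out[i] = (char)(cur / 2);
        rem = cur % 2;
    }
    return out;
}
```

The midpoint always sorts between its two inputs, but nothing forces real keys to be uniformly distributed between split points, hence the balance problem.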
  • Hypertable Architecture
  • Conceptual Table Layout
  • Table: Actual Representation
  • Range Distribution
  • Google Stack
  • Google File System
  • Google File System
  • System Overview
  • Log Structured Merge (LSM) Tree
    • Eliminates random I/O on writes
    • Converts random I/O to sequential I/O
    • Write path
      • Commit log on disk (DFS)
      • In-memory map
    • In-memory map gets “compacted” to disk
    • Disk files periodically get merged
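The LSM write path above can be sketched in a few lines of C++. This is a toy model, not Hypertable's implementation: the `commit_log_` vector stands in for the DFS-backed commit log, and `compact()` flushes the in-memory map to an immutable sorted run, analogous to a CellCache being compacted into a CellStore.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy LSM store: every update is appended to a commit log (sequential
// I/O) and inserted into a sorted in-memory map; when the map grows
// past a threshold it is "compacted" into an immutable sorted run.
class LsmStore {
public:
    explicit LsmStore(size_t threshold) : threshold_(threshold) {}

    void put(const std::string& key, const std::string& value) {
        commit_log_.push_back(key + "=" + value);  // stand-in for the DFS log
        memtable_[key] = value;
        if (memtable_.size() >= threshold_) compact();
    }

    // Read path: check the memtable first, then runs from newest to oldest.
    const std::string* get(const std::string& key) const {
        if (auto it = memtable_.find(key); it != memtable_.end())
            return &it->second;
        for (auto run = runs_.rbegin(); run != runs_.rend(); ++run)
            if (auto it = run->find(key); it != run->end())
                return &it->second;
        return nullptr;
    }

    size_t num_runs() const { return runs_.size(); }

private:
    void compact() {                 // flush the memtable to an immutable run
        runs_.push_back(memtable_);
        memtable_.clear();
    }

    size_t threshold_;
    std::map<std::string, std::string> memtable_;
    std::vector<std::map<std::string, std::string>> runs_;
    std::vector<std::string> commit_log_;
};
```

The periodic merging of disk files mentioned above would correspond to merging several `runs_` entries into one, which keeps the number of files a read must consult bounded.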
  • Range Server
    • Manages ranges of table data
    • CellCache: In-memory map containing recent updates
    • CellStore: On-disk (DFS) file containing “compacted” cell cache
  • Range Server: CellStore
    • Sequence of 65 KB blocks of compressed key/value pairs
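One way to picture this layout is as a sequence of blocks plus an index recording the last key of each block, so a point lookup needs to touch only one block. The sketch below is hypothetical (`CellStoreSketch` and `locate` are illustrative names, and the real file is compressed and lives in the DFS).

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative CellStore-style block index: key/value pairs are packed
// into blocks, and an index maps the last key of each block to its
// position, so a lookup reads exactly one block.
struct Block {
    std::vector<std::pair<std::string, std::string>> cells;  // sorted
};

struct CellStoreSketch {
    std::vector<Block> blocks;            // in real life: compressed, on DFS
    std::map<std::string, size_t> index;  // last key in block -> block number

    // Find the only block that could contain `key`, or nullptr if the
    // key is past the end of the file.
    const Block* locate(const std::string& key) const {
        auto it = index.lower_bound(key);  // first block whose last key >= key
        return it == index.end() ? nullptr : &blocks[it->second];
    }
};
```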
  • Compression
    • Cell Store blocks are compressed
    • Commit Log updates are compressed
    • Supported Compression Schemes
      • zlib (--best and --fast)
      • lzo
      • quicklz
      • bmz
      • none
  • Bloom Filter
    • Probabilistic data structure associated with every CellStore
    • Indicates if key is not present
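A minimal Bloom filter illustrating the property above (a negative answer proves the key is absent, so the CellStore file need not be read at all) might look like the following. The bit-array size and the seeded `std::hash` scheme are arbitrary choices for the sketch, not Hypertable's parameters.

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

// Toy Bloom filter: a bit array plus k seeded hash functions. A lookup
// that returns false guarantees the key was never inserted; a lookup
// that returns true may occasionally be a false positive.
class BloomSketch {
public:
    void insert(const std::string& key) {
        for (size_t i = 0; i < kHashes; ++i)
            bits_.set(hash(key, i));
    }

    // Never a false negative; false positives possible.
    bool maybe_contains(const std::string& key) const {
        for (size_t i = 0; i < kHashes; ++i)
            if (!bits_.test(hash(key, i))) return false;
        return true;
    }

private:
    static constexpr size_t kBits = 8192, kHashes = 3;
    // Cheap seeded hash for the sketch: perturb the key per seed.
    static size_t hash(const std::string& key, size_t seed) {
        return std::hash<std::string>{}(key + char('A' + seed)) % kBits;
    }
    std::bitset<kBits> bits_;
};
```

In the read path, the filter is consulted before opening a CellStore, so most lookups for absent keys cost no disk I/O.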
  • Caching
    • Block Cache
      • Caches CellStore blocks
      • Blocks are cached uncompressed
      • Dynamically adjusted size based on workload
    • Query Cache
      • Caches query results
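A block cache of the kind described is commonly built as an LRU map. The sketch below is a generic LRU cache (hypothetical `LruCache`, not Hypertable's code): `std::list::splice` moves a hit to the front in O(1), and the back element is evicted when the cache overflows.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Generic LRU cache sketch: a list ordered by recency plus a hash map
// from key to list position. Least recently used entries are evicted.
class LruCache {
public:
    explicit LruCache(size_t capacity) : capacity_(capacity) {}

    void put(const std::string& key, const std::string& block) {
        if (auto it = map_.find(key); it != map_.end())
            lru_.erase(it->second);            // replace existing entry
        lru_.push_front({key, block});
        map_[key] = lru_.begin();
        if (lru_.size() > capacity_) {         // evict least recently used
            map_.erase(lru_.back().first);
            lru_.pop_back();
        }
    }

    const std::string* get(const std::string& key) {
        auto it = map_.find(key);
        if (it == map_.end()) return nullptr;
        lru_.splice(lru_.begin(), lru_, it->second);  // mark as recently used
        return &it->second->second;
    }

private:
    size_t capacity_;
    std::list<std::pair<std::string, std::string>> lru_;
    std::unordered_map<std::string,
        std::list<std::pair<std::string, std::string>>::iterator> map_;
};
```

Caching blocks uncompressed, as the slide notes, trades memory for CPU: a cache hit skips both the disk read and the decompression.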
  • Dynamic Memory Adjustment
  • Performance Evaluation Hypertable vs. HBase
  • Test Setup
    • Hypertable v0.9.3.2 (not yet released)
    • HBase 0.20.3
    • HDFS 0.20.2
    • 10 machines
      • 3 Hyperspace / Zookeeper replicas
      • 1 Master / 4 Tablet Servers (5GB RAM)
      • 1 Test Dispatcher / 4 Test Clients
    • Machine profile
      • 1 X 1.8 GHz Dual-core Opteron
      • 10 GB RAM
      • 3 X 250 GB SATA drives
  • Random Write / Sequential Read
  • Random Read
  • Project Resources
    • Twitter: hypertable
    • www.hypertable.org
  • Professional Support