HW09 Practical HBase: Getting the Most from Your HBase Install

Presentation Transcript

  • HBase, Hadoop World NYC
    • Ryan Rawson, StumbleUpon.com, su.pr
    • Jonathan Gray, Streamy.com
  • A presentation in 2 parts
  • Part 1
  • About Me
    • Ryan Rawson
    • Senior Software Developer @ StumbleUpon
    • HBase committer, core contributor
  • StumbleUpon
    • Uses HBase in production
    • Behind features of our su.pr service
    • More later
  • Adventures with MySQL
    • Scaling MySQL is hard; Oracle is expensive (and hard)
    • Machine cost goes up faster than speed
    • Turn off all relational features to scale
    • Turn off secondary (!) indexes too! (!!)
  • MySQL problems cont.
    • Tables can be a problem at sizes as low as 500GB
    • Hard to read data quickly at these sizes
    • Future doesn’t look so bright as we contemplate 10x sizes
    • MySQL master becomes a problem...
  • Limitations of masters
    • What if your write speed is greater than a single machine?
    • All slaves must have same write capacity as master (can’t cheap out on slaves)
    • Single point of failure, no easy failover
    • Can (sort of) solve this with sharding...
  • Sharding
  • Sharding problems
    • Requires either a hashing function or a mapping table to determine the shard (sketched below)
    • Data access code becomes complex
    • What if shard sizes become too large...
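For concreteness, the usual hashing approach is a one-liner like the sketch below (names are made up); the catch is that changing the shard count remaps almost every key, which is why resharding (next slide) hurts so much:

    public class Sharder {
      // Pick a shard by hashing the key (a sketch, not anyone's production code).
      // Changing nShards remaps nearly every key, forcing a full data reshuffle.
      static int shardFor(String key, int nShards) {
        return Math.abs(key.hashCode() % nShards);
      }

      public static void main(String[] args) {
        System.out.println(shardFor("user:1234", 16)); // routes to shard 0-15
      }
    }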
  • Resharding!
  • What about schema changes?
    • What about schema changes or migrations?
    • MySQL not your friend here
    • Only gets harder with more data
  • HBase to the rescue
    • Clustered, commodity(ish) hardware
    • Mostly schema-less
    • Dynamic distribution
    • Spreads writes out over the cluster
  • What is HBase?
    • HBase is an open-source distributed database, inspired by Google’s Bigtable
    • Part of the Hadoop ecosystem
    • Layers on HDFS for storage
    • Native connections to map-reduce
  • HBase storage model
    • Column-oriented database
    • Column names are arbitrary data; a row can have a large, variable number of columns
    • Rows stored in sorted order
    • Can random read and write (see the client sketch below)
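A minimal sketch of random reads and writes with the 0.20-era Java client; the table name "mytable" and column family "stats" are hypothetical placeholders:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RandomReadWrite {
      public static void main(String[] args) throws Exception {
        // Cluster location comes from hbase-site.xml on the classpath.
        HTable table = new HTable(new HBaseConfiguration(), "mytable");

        // Random write: row keys are arbitrary bytes, stored in sorted order.
        Put put = new Put(Bytes.toBytes("row-42"));
        put.add(Bytes.toBytes("stats"), Bytes.toBytes("clicks"), Bytes.toBytes("17"));
        table.put(put);

        // Random read: fetch a single row back by key.
        Result result = table.get(new Get(Bytes.toBytes("row-42")));
        System.out.println(Bytes.toString(
            result.getValue(Bytes.toBytes("stats"), Bytes.toBytes("clicks"))));
      }
    }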
  • Tables
    • Table is split into roughly equal sized “regions”
    • Each region is a contiguous range of keys: [start, end)
    • Regions split as they grow, thus dynamically adjusting to your data set
  • Server architecture
    • Similar to HDFS:
      • Master = Namenode (ish)
      • Regionserver = Datanode (ish)
    • Often run these alongside each other!
  • Server Architecture 2
    • But not quite the same: HBase stores its state in HDFS
    • HDFS provides robust data storage across machines, insulating against failure
    • Master and Regionserver fairly stateless and machine independent
  • Region assignment
    • Each region from every table is assigned to a Regionserver
    • The master is responsible for assignment and noticing if (when!) regionservers go down
  • Master Duties
    • When machines fail, move regions from affected machines to others
    • When regions split, move regions to balance cluster
    • Could move regions to respond to load
    • Can run multiple backup masters
  • What Master does NOT do
    • Does not handle any write requests (not a DB master!)
    • Does not handle location finding requests
    • Not involved in the read/write path!
    • Generally does very little most of the time
  • Distributed coordination
    • To manage master election and server availability we use ZooKeeper
    • Set up as a cluster, provides distributed coordination primitives
    • An excellent tool for building cluster management systems (see the sketch below)
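To make the coordination idea concrete, here is a minimal sketch (not HBase's actual internals) of the primitive involved: an ephemeral ZooKeeper node that disappears when its owner's session dies, which is how a master can notice failed servers. The path and address are made up:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class Liveness {
      public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble (address is a placeholder).
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, null);

        // An ephemeral node lives only as long as this session, so a
        // watcher on /rs learns promptly when this server goes away.
        zk.create("/rs/server-1", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        Thread.sleep(Long.MAX_VALUE); // hold the session (and the node) open
      }
    }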
  • Scaling HBase
    • Add more machines to scale
    • The base model (Bigtable) scales past 1000 TB
    • No inherent reason why HBase couldn’t
  • What to store in HBase?
    • Maybe not your raw log data...
    • ... but the results of processing it with Hadoop!
    • By storing the refined version in HBase, you can keep up with huge data demands and serve it to your website
    • Provides a real-time, structured storage layer that integrates with your existing Hadoop clusters
    • Provides “out of the box” hookups to map-reduce (see the job sketch below)
    • Uses the same loved (or hated) management model as Hadoop
    HBase & Hadoop
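As an illustration of those hookups, a minimal row-counting job against a hypothetical table "mytable", using the TableMapper/TableMapReduceUtil classes that ship with HBase 0.20:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class RowCounter {
      // The framework hands the mapper one HBase row per call.
      static class CountMapper extends TableMapper<ImmutableBytesWritable, LongWritable> {
        protected void map(ImmutableBytesWritable key, Result row, Context ctx) {
          ctx.getCounter("stats", "rows").increment(1);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new HBaseConfiguration(), "rowcounter");
        job.setJarByClass(RowCounter.class);
        // Wire the table in as job input; splits line up with regions,
        // so the scan is spread across the cluster.
        TableMapReduceUtil.initTableMapperJob("mytable", new Scan(),
            CountMapper.class, ImmutableBytesWritable.class, LongWritable.class, job);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);
        job.waitForCompletion(true);
      }
    }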
  • HBase @ StumbleUpon
  • StumbleUpon & HBase
    • Started investigating the field in Jan ’09
    • Looked at 3 top (at the time) choices:
      • Cassandra
      • Hypertable
      • HBase
    Speaker notes: Cassandra didn’t work for us and we didn’t like its data model; Hypertable was fast, but we had doubts about its community and project viability (no major users beyond Zvents); HBase was local and had a good community.
  • StumbleUpon & HBase
    • Picked HBase:
      • Community
      • Features
      • Map-reduce, cascading, etc
    • Now highly involved and invested
  • su.pr marketing
    • “Su.pr is the only URL shortener that also helps your content get discovered! Every Su.pr URL exposes your content to StumbleUpon's nearly 8 million users!”
  • su.pr tech features
    • Real-time stats
      • Done directly in HBase (counter sketch below)
    • In-depth stats
      • Use Cascading and map-reduce, and put the results in HBase
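The slides don't show how the real-time side works, but HBase 0.20's atomic counters fit this use case; a sketch with a made-up table "urlstats" and family "stats":

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ClickCounter {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "urlstats");
        // Server-side atomic increment: no read-modify-write race, so many
        // web frontends can bump the same counter concurrently.
        long clicks = table.incrementColumnValue(
            Bytes.toBytes("su.pr/abc123"),  // row key: the short URL (made up)
            Bytes.toBytes("stats"),         // column family
            Bytes.toBytes("clicks"),        // qualifier
            1L);                            // amount to add
        System.out.println("clicks so far: " + clicks);
      }
    }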
  • su.pr web access
    • PHP code accesses HBase through the Thrift gateway
    • No additional caching other than what HBase provides
  • Large data storage
    • Over 9 billion rows and 1300 GB in HBase
    • Can map-reduce a 700 GB table in ~20 min
    • That is about 6 million rows/sec (see the arithmetic below)
    • Scales to 2x that speed on 2x the hardware
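A quick sanity check on those figures (taking the 20-minute and 6 million rows/sec numbers at face value):

    $6 \times 10^{6}\,\text{rows/s} \times 20\,\text{min} \times 60\,\text{s/min} \approx 7.2 \times 10^{9}\,\text{rows}$

which is roughly in line with the 9-billion-row total quoted above.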
  • Micro read benches
    • Single reads are 1-10ms depending on disk seeks and caching
    • Scans can return hundreds of rows in dozens of ms (scan sketch below)
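A minimal scan sketch in the same 0.20-era client; the table name and key range are placeholders. Setting scanner caching is what keeps a multi-hundred-row scan down to a handful of round trips:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanExample {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        // Scan a contiguous key range; rows come back in sorted order.
        Scan scan = new Scan(Bytes.toBytes("row-000"), Bytes.toBytes("row-500"));
        scan.setCaching(100); // fetch 100 rows per RPC to amortize round trips
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result row : scanner) {
            System.out.println(Bytes.toString(row.getRow()));
          }
        } finally {
          scanner.close(); // always release server-side scanner resources
        }
      }
    }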
  • Serial read speeds
    • A small table [benchmark chart not captured in transcript]
    • A bigger table [benchmark chart not captured in transcript]
    • (printlns removed from the benchmark code)
  • Deployment considerations
    • ZooKeeper requires disk I/O to complete operations
    • Consider hosting on dedicated machines
    • Namenode and HBase master can co-exist
  • What to put on your nodes
    • A regionserver requires 2-4 cores and 3 GB+ of RAM
    • Can’t run HDFS, HBase, maps, reduces on a 2 core system
    • On my 8 core systems I run datanode, regionserver, 2 maps, 2 reduces
  • Garbage collection
    • GC tuning becomes important.
    • Quick tip: use CMS, use -Xmx4000m (see the hbase-env.sh sketch below)
    • Interested in G1 (if it ever stops crashing)
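A sketch of how that tip might look in conf/hbase-env.sh (the values come straight from the slide; check your release's hbase-env.sh comments for the exact knobs):

    # conf/hbase-env.sh
    export HBASE_HEAPSIZE=4000                   # megabytes, i.e. -Xmx4000m
    export HBASE_OPTS="-XX:+UseConcMarkSweepGC"  # enable the CMS collector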
  • Batch and interactive
    • These may not be compatible
    • Latency goes up with heavy batch load
    • May need to use 2 clusters to ensure responsive website
  • Part 2
  • HBase @ Streamy
    • History of Data
    • RDBMS Issues
    • HBase to the Rescue
    • Streamy Today and Tomorrow
    • Future of HBase
  • About Me
    • Co-Founder and CTO of Streamy.com
    • HBase Committer
    • Migrated Streamy from RDBMS to HBase and Hadoop in June 2008
  • History of Data: The Prototype
    • Streamy 1.0 built on PostgreSQL
        • All of the bells and whistles
    • Powered by single low-spec node
        • 8 core / 8 GB / 2TB / $4k
    Functionally powerful, woefully slow
  • History of Data: The Alpha
    • Streamy 1.5 built on optimized PostgreSQL
        • Remove bells and whistles, add partitioning
    • Powered by high-powered master node
        • 16 core / 64 GB / 15x146GB 15k RPM / $40k
    Less powerful, still slow... insanely expensive
  • History of Data: The Beta
    • Streamy 2.0 built entirely on HBase
        • Custom caches, query engines, and API
    • Powered by 10 low-spec nodes
        • 4 core / 4GB / 1TB / $10k for entire cluster
    Less functional, but fast, scalable, and cheap
  • RDBMS Issues
    • Poor disk usage patterns
    • Black box query engine
    • Write speed degrades with table size
    • Transactions/MVCC are unnecessary overhead
    • Expensive
  • The Read Problem
    • View 30 newest unread stories from blogs
      • Not RDBMS friendly, no early-out
      • A PL/Python heap-merge hack helped (the idea is sketched below)
      • We knew what to do, but the DB didn’t listen
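The heap-merge idea, sketched in Java rather than PL/Python (an illustration of the technique, not Streamy's code): each feed yields story timestamps newest-first; merging with a heap lets you emit the 30 newest and stop, which is exactly the early-out the database wouldn't do:

    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class HeapMerge {
      static class Head implements Comparable<Head> {
        final long ts;              // timestamp of this feed's next story
        final Iterator<Long> rest;  // remainder of the feed, newest-first
        Head(long ts, Iterator<Long> rest) { this.ts = ts; this.rest = rest; }
        public int compareTo(Head o) { // larger timestamp = higher priority
          return o.ts < ts ? -1 : (o.ts == ts ? 0 : 1);
        }
      }

      // Print the 30 newest timestamps across all feeds, touching only
      // about 30 + |feeds| entries instead of sorting everything.
      static void newest30(List<Iterator<Long>> feeds) {
        PriorityQueue<Head> heap = new PriorityQueue<Head>();
        for (Iterator<Long> f : feeds)
          if (f.hasNext()) heap.add(new Head(f.next(), f));
        for (int i = 0; i < 30 && !heap.isEmpty(); i++) {
          Head h = heap.poll();
          System.out.println(h.ts); // emit the next-newest story
          if (h.rest.hasNext()) heap.add(new Head(h.rest.next(), h.rest));
        }
      }
    }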
  • The Write Problem
    • Rapidly growing items table
      • Crawl index from 1k to 100k feeds
      • Indexes, static content, dynamic statistics
      • Solutions are imperfect
  • RDBMS Conclusions
    • Enormous functionality and flexibility
        • But you throw it out the door at scale
    • Stripped down RDBMS still not attractive
    • Turned entire team into DBAs
    • Gets in the way of domain-specific optimizations
  • What We Wanted
    • Transparent partitioning
    • Transparent distribution
    • Fast random writes
    • Good data locality
    • Fast random reads
  • What We Got
    • Transparent partitioning (Regions)
    • Transparent distribution (RegionServers)
    • Fast random writes (MemStore)
    • Good data locality (Column Families)
    • Fast random reads (HBase 0.20)
  • What Else We Got
    • Transparent replication (HDFS)
    • High availability (no SPOF)
    • MapReduce (Input/OutputFormats)
    • Versioning (column versions)
    • Fast sequential reads (scanners)
  • HBase @ Streamy Today
    • All data stored in HBase
    • Additional caching of hot data
    • Query and indexing engines
    • MapReduce crawling and analytics
    • ZooKeeper/Katta/Lucene
  • HBase @ Streamy Tomorrow
    • Thumbnail media server
    • Slave replication for Backup/DR
    • More Cascading
    • Better Katta integration
    • Realtime MapReduce
  • HBase on a Budget
    • HBase works on cheap nodes
        • But you need a cluster (5+ nodes)
        • A $10k cluster has 10x the capacity of a $40k node
    • Multiple instances on a single cluster
    • 24/7 clusters + bandwidth != EC2
  • Lessons Learned
    • Layer of abstraction helps tremendously
        • Internal Streamy Data API
        • Storage of serialized types
    • Schema design is about reads not writes
    • What’s good for HBase is good for Streamy
  • What’s Next for HBase
    • Inter-cluster / Inter-DC replication
        • Slave and Multi-Master
    • Master rewrite, more ZooKeeper
    • Batch operations, HDFS uploader
    • No more data loss
        • Need HDFS appends
  • HBase Information
    • Home Page http://hbase.org
    • Wiki http://wiki.apache.org/hadoop/Hbase
    • Twitter http://twitter.com/hbase
    • Freenode IRC #hbase
    • Mailing List [email_address]