Hadoop and HBase in the Real World

Cloudera Solutions Architect Joey Echeverria explains the architecture of Hadoop and HBase and their roles in real-world data management and storage.

    Hadoop and HBase in the Real World: Presentation Transcript

    • Apache Hadoop and HBase in the Real World. Joey Echeverria, @fwiffo, #novahug
    • The Plug. We're Training! Developer Training July 25 to 27; Admin Training July 28 to 29 (http://www.cloudera.com/training). We're Hiring! Solution Architects, Trainers, Distributed Systems Engineers (http://www.cloudera.com/careers). Copyright 2011 Cloudera Inc. All rights reserved.
    • 1 Minute Hadoop Recap. HDFS: a distributed file system, optimized for streaming reads and writes, with block-level replication. MapReduce: a distributed processing framework that (typically) reads and writes data in HDFS and operates over a (key, value) view of the data.
    • Where does HBase come in? Google: Google invented GFS and MapReduce, and GFS is optimized for streaming reads and writes. BigTable: Google's answer to random read/write workloads.
    • HBase: BigTable-like storage (for Hadoop)
    • What is HBase? A key/value column family store. Data is stored in HDFS, with ZooKeeper for coordination. The access model is get/put/del, plus range scans and versions.
    • Architecture. Image courtesy Lars George, licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Germany License.
    • Tables and Column Families. Tables and column families are the static part of the schema. Column families also form locality groups: one Store per family, with multiple HFiles per Store. Tables are split into regions; a region is a continuous range of row keys and the unit of distribution. Regions split automatically, and tables can be pre-split for performance.
    • Why use HBase? Variable schema in each record; collections of data for each key; atomic control of per-key data; row access to each column family.
    • HBase Applications. Lily, “Smart Data, at Scale, made Easy” (http://www.lilyproject.org). OpenTSDB, “Distributed, scalable Time Series Database (TSDB)” (http://opentsdb.net).
    • Real-time ad optimizations. Capturing impressions and serving ads. HBase front-end: serves models (via memcached). HBase back-end: serves pixels and captures cookies. MapReduce computes the models between the two.
    • Click stream sessionization. Key on userid and time; a separate table holds significant events (e.g., purchases); data is loaded with the HBase importtsv tool; sessionization is performed by simple scans.
    • Mozilla Socorro. When Firefox crashes, where do the reports go? The Mozilla team gathers those crashes in HBase. Crashes vary widely and change format often; processors take each crash individually and parse it out. http://crash-stats.mozilla.com http://code.google.com/p/socorro
    • Navteq. Location-based content serving, all served out of HBase: location makes a great key. Content is variable: maps, POI (points of interest), user data. Preprocessing is all done via MapReduce jobs.
    • Cloudera. Gathers data about customer clusters. Each customer node is a key with Avro values, which makes the data easy to browse and issues on nodes quick to find. Data is dumped to HDFS and processed with Pig.
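The "1 Minute Hadoop Recap" slide describes MapReduce as operating over a (key, value) view of data. Below is a minimal single-process sketch of that model in plain Python, not the Hadoop API: the helpers `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical, and grouping values in a dictionary stands in for Hadoop's distributed shuffle and sort.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process simulation of the MapReduce (key, value) flow."""
    groups = defaultdict(list)
    # Map phase: each input record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": collect values by key
    # Reduce phase: one reducer call per key, visiting keys in sorted order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in a line of text.
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    # Hypothetical reducer: add up all counts emitted for one word.
    return sum(values)


lines = ["HBase stores data in HDFS", "MapReduce reads data from HDFS"]
counts = map_reduce(lines, word_mapper, sum_reducer)
print(counts["hdfs"], counts["data"])  # 2 2
```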
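The "What is HBase?" slide gives the access model as get/put/del plus range scans and versions. As an illustration only, here is a toy in-memory table that mimics those operations. `ToyTable` and its methods are hypothetical, strings stand in for HBase's byte arrays, and a real HBase client talks to region servers rather than to a local dictionary.

```python
import bisect


class ToyTable:
    """Toy, in-memory model of an HBase-style table (NOT the HBase client API).

    Cells are addressed by (row, "family:qualifier") and keep several
    timestamped versions, newest first. Row keys are kept sorted so that
    a range scan is a contiguous slice of the key space.
    """

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.rows = {}         # row key -> {column -> [(ts, value), ...]}
        self.sorted_keys = []  # row keys in sorted order

    def put(self, row, column, value, ts):
        if row not in self.rows:
            bisect.insort(self.sorted_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda cell: cell[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # discard versions beyond the limit

    def get(self, row, column, versions=1):
        cells = self.rows.get(row, {}).get(column, [])
        return [value for _, value in cells[:versions]]

    def delete(self, row):
        if row in self.rows:
            del self.rows[row]
            self.sorted_keys.remove(row)

    def scan(self, start, stop):
        """Yield (row, columns) for start <= row < stop, in key order."""
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, stop)
        for key in self.sorted_keys[lo:hi]:
            yield key, self.rows[key]


t = ToyTable(max_versions=2)
t.put("user1", "info:email", "a@example.com", ts=1)
t.put("user1", "info:email", "b@example.com", ts=2)
t.put("user1", "info:email", "c@example.com", ts=3)
t.put("user2", "info:email", "d@example.com", ts=1)
t.put("user3", "info:email", "e@example.com", ts=1)
print(t.get("user1", "info:email", versions=5))  # ['c@example.com', 'b@example.com']
print([row for row, _ in t.scan("user1", "user3")])  # ['user1', 'user2']
t.delete("user2")
print([row for row, _ in t.scan("user0", "user9")])  # ['user1', 'user3']
```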
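The "Tables and Column Families" slide says each region is a continuous range of row keys and that tables can be pre-split. Here is a sketch of what that implies for locating a key, assuming the split keys are known up front and that row keys begin with a two-hex-digit hash prefix (a common salting approach; the slide does not prescribe one). `region_for`, `hex_split_keys`, and `salted_key` are hypothetical helpers, not HBase functions.

```python
import bisect
import hashlib


def region_for(row_key, split_keys):
    """Index of the region that holds row_key.

    With sorted split keys [s1, s2, ...], region 0 covers keys below s1,
    region i covers [s_i, s_{i+1}), and the last region runs to the end.
    """
    return bisect.bisect_right(split_keys, row_key)


def hex_split_keys(num_regions, width=2):
    """Evenly spaced split points, assuming keys start with `width` hex digits."""
    space = 16 ** width
    return [format(space * i // num_regions, f"0{width}x") for i in range(1, num_regions)]


def salted_key(key):
    """Hypothetical salting scheme: prefix the key with two hex digits of its MD5."""
    return hashlib.md5(key.encode()).hexdigest()[:2] + "-" + key


splits = hex_split_keys(4)
print(splits)                         # ['40', '80', 'c0']
print(region_for("3f-user", splits))  # 0
print(region_for("80-user", splits))  # 2
print(region_for(salted_key("user42"), splits))  # a region index from 0 to 3
```

Salting spreads sequential keys across the pre-split regions, at the cost of making a plain range scan over the original key order impossible without querying every prefix.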
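On the "Real-time ad optimizations" slide, the HBase front-end serves models via memcached. The slide does not say how the cache is filled; the sketch below assumes a cache-aside read path with a time-to-live, using plain dictionaries as stand-ins for memcached and the HBase table. `CacheAsideReader`, its injectable `clock`, and the sample model data are hypothetical.

```python
import time


class CacheAsideReader:
    """Cache-aside reads: try the cache first, fall back to the backing store."""

    def __init__(self, store, ttl_seconds=60.0, clock=time.monotonic):
        self.store = store      # stand-in for the HBase table: key -> model
        self.cache = {}         # stand-in for memcached: key -> (expires_at, model)
        self.ttl = ttl_seconds
        self.clock = clock
        self.store_reads = 0    # how many lookups fell through to the store

    def get(self, key):
        now = self.clock()
        entry = self.cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]     # fresh cache hit
        self.store_reads += 1   # miss or expired entry: read the store
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = (now + self.ttl, value)
        return value


fake_now = [0.0]  # controllable clock for the example
reader = CacheAsideReader({"campaign-7": {"bid": 0.42}}, ttl_seconds=30,
                          clock=lambda: fake_now[0])
reader.get("campaign-7")   # miss: reads the store
reader.get("campaign-7")   # hit: served from the cache
fake_now[0] = 31.0
reader.get("campaign-7")   # expired: reads the store again
print(reader.store_reads)  # 2
```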
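The "Click stream sessionization" slide keys rows on user ID and time and sessionizes with simple scans. Here is a sketch under stated assumptions: keys are `userid|timestamp` with a zero-padded timestamp, user IDs contain no `|`, and a gap of more than 30 minutes between events starts a new session. The separator, padding width, gap, and the helpers `row_key` and `sessionize` are hypothetical choices, not details from the talk.

```python
from itertools import groupby

SESSION_GAP_SECONDS = 30 * 60  # assumed: more than 30 idle minutes ends a session


def row_key(user_id, ts):
    # Hypothetical layout: user ID, a separator, then a zero-padded epoch timestamp.
    return f"{user_id}|{ts:012d}"


def sessionize(sorted_keys, gap=SESSION_GAP_SECONDS):
    """Split a sorted run of user|timestamp keys into (user, [timestamps]) sessions."""
    sessions = []
    # Keys sharing a user prefix are contiguous in sorted order, as in an HBase scan.
    for user_id, keys in groupby(sorted_keys, key=lambda k: k.split("|")[0]):
        current, last_ts = [], None
        for key in keys:
            ts = int(key.split("|")[1])
            if last_ts is not None and ts - last_ts > gap:
                sessions.append((user_id, current))  # idle too long: close the session
                current = []
            current.append(ts)
            last_ts = ts
        sessions.append((user_id, current))
    return sessions


events = [("alice", 1000), ("bob", 500), ("alice", 1600), ("alice", 5000), ("bob", 700)]
keys = sorted(row_key(user, ts) for user, ts in events)  # HBase keeps rows sorted by key
print(sessionize(keys))
# [('alice', [1000, 1600]), ('alice', [5000]), ('bob', [500, 700])]
```

Zero-padding matters because HBase compares row keys byte by byte; without it, a timestamp of `1000` would sort before `500`.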
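The "Navteq" slide notes that location makes a great key but does not say how locations were encoded. One well-known option, swapped in here purely for illustration, is a geohash: it interleaves longitude and latitude bits into a base-32 string, so points in the same cell share a key prefix and a prefix scan returns them together. The `geohash` and `poi_key` helpers and the `poi-17` identifier are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate: longitude first, then latitude, and so on.
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def poi_key(lat, lon, poi_id, precision=8):
    # Hypothetical row key layout: geohash cell first, then the POI identifier.
    return f"{geohash(lat, lon, precision)}|{poi_id}"


print(geohash(42.6, -5.6, 5))         # ezs42
print(poi_key(42.6, -5.6, "poi-17"))  # an 8-character geohash starting with ezs42, then |poi-17
```

Nearby points that straddle a cell boundary can have different prefixes, so a proximity query over keys like these usually scans the neighboring cells as well.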