Hadoop and HBase in the Real World

Cloudera Solutions Architect Joey Echeverria explains the architecture of Hadoop and HBase and their roles in real-world data management and storage.

Transcript

  • 1. Apache Hadoop and HBase in the Real World
    Joey Echeverria @fwiffo #novahug
  • 2. The Plug
    We're Training! Developer Training July 25 to 27; Admin Training July 28 to 29. http://www.cloudera.com/training
    We're Hiring! Solution Architects, Trainers, Distributed Systems Engineers. http://www.cloudera.com/careers
    Copyright 2011 Cloudera Inc. All rights reserved
  • 3. 1 Minute Hadoop Recap
    HDFS: distributed file system; optimized for streaming reads and writes; block-level replication
    MapReduce: distributed processing framework; reads/writes data in HDFS (typically); operates over a (key, value) view of data
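The (key, value) view mentioned on the slide above can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the grouping step that Hadoop performs across machines is simulated here with a dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line_number: int, line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: takes a (line number, line) record and emits (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, standing in for the framework's shuffle step."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: takes (word, [1, 1, ...]) and emits (word, total)."""
    return word, sum(counts)


def word_count(lines: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    mapped = (
        pair
        for number, line in enumerate(lines)
        for pair in map_phase(number, line)
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))
```

Calling `word_count(["hadoop hbase", "HBase hdfs hbase"])` counts each lower-cased word across both lines. In a real job the mapper and reducer would run on separate nodes and read from and write to HDFS.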
  • 4. Where does HBase come in?
    Google: Google invented GFS and MapReduce; GFS is optimized for streaming reads and writes
    BigTable: Google's answer to random read/write workloads
  • 5. HBase: BigTable-like storage (for Hadoop)
  • 6. What is HBase?
    Key/value column family store
    Data stored in HDFS
    ZooKeeper for coordination
    Access model is get/put/del
    Plus range scans and versions
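To make the access model above concrete, here is a toy in-memory table supporting get, put, delete, range scans, and versioned cells. It is a sketch for illustration only and is not the HBase client API: the class `ToyTable`, its method signatures, and the default of three versions are all assumptions. Real HBase persists data in HDFS and records deletes as tombstones, which this sketch skips.

```python
import bisect
import time
from typing import Dict, Iterator, List, Optional, Tuple

CellKey = Tuple[str, str, str]  # (row key, column family, qualifier)


class ToyTable:
    """In-memory stand-in for an HBase table (hypothetical, not the real API)."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self._cells: Dict[CellKey, List[Tuple[int, bytes]]] = {}
        self._rows: List[str] = []  # row keys kept in sorted order

    def put(self, row: str, family: str, qualifier: str, value: bytes,
            ts: Optional[int] = None) -> None:
        """Store a new version of a cell, keeping only the newest versions."""
        ts = time.time_ns() if ts is None else ts
        versions = self._cells.setdefault((row, family, qualifier), [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first
        del versions[self.max_versions:]
        i = bisect.bisect_left(self._rows, row)
        if i == len(self._rows) or self._rows[i] != row:
            self._rows.insert(i, row)

    def get(self, row: str, family: str, qualifier: str,
            versions: int = 1) -> List[Tuple[int, bytes]]:
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        return self._cells.get((row, family, qualifier), [])[:versions]

    def delete(self, row: str) -> None:
        """Remove every cell in a row (simplified: no tombstones)."""
        for key in [k for k in self._cells if k[0] == row]:
            del self._cells[key]
        i = bisect.bisect_left(self._rows, row)
        if i < len(self._rows) and self._rows[i] == row:
            self._rows.pop(i)

    def scan(self, start: str, stop: str) -> Iterator[str]:
        """Yield row keys in the half-open range [start, stop), in order."""
        lo = bisect.bisect_left(self._rows, start)
        hi = bisect.bisect_left(self._rows, stop)
        yield from self._rows[lo:hi]
```

Keeping row keys sorted is what makes the range scan cheap: a scan is a binary search for the start key followed by a sequential read.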
  • 7. Architecture
    Image courtesy Lars George, licensed under Creative Commons Attribution-Noncommercial-Share Alike 3.0 Germany License.
  • 8. Tables and Column Families
    Static part of the schema
    Column families also form locality groups: one Store per family; multiple HFiles per Store
    Tables split into regions: contiguous range of row keys; unit of distribution; automatically split; pre-split for performance
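Because each region holds a range of row keys, finding the region for a key reduces to a binary search over the sorted split points. The sketch below assumes string row keys; `region_for` and `hex_split_points` are hypothetical helpers, not HBase functions, and the two-character hex prefix scheme is just one possible way to pre-split a table whose keys begin with a hash.

```python
import bisect
from typing import List


def region_for(row_key: str, split_points: List[str]) -> int:
    """Return the index of the region whose key range contains row_key.

    split_points must be sorted. N split points define N + 1 regions:
    region 0 holds keys below split_points[0], region i holds keys in
    [split_points[i - 1], split_points[i]), and the last region holds the rest.
    """
    return bisect.bisect_right(split_points, row_key)


def hex_split_points(num_regions: int) -> List[str]:
    """Evenly spaced two-character hex prefixes for pre-splitting (assumed scheme)."""
    return [format(i * 256 // num_regions, "02x") for i in range(1, num_regions)]
```

For example, `hex_split_points(4)` yields three split points that divide the hex prefix space into four regions, and `region_for` maps any hashed key to one of them. Pre-splitting this way spreads an initial bulk load across regions instead of funneling every write into a single one.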
  • 9. Why use HBase?
    Variable schema in each record
    Collections of data for each key
    Atomic control of per-key data
    Row access to each column family
  • 10. HBase Applications
    “Smart Data, at Scale, made Easy” http://www.lilyproject.org
    “Distributed, scalable Time Series Database (TSDB)” http://opentsdb.net
  • 11. Real-time ad optimizations
    Capturing impressions and serving ads
    HBase front-end – to serve models (via memcached)
    HBase back-end – to serve pixels and capture cookies
    MapReduce to compute models between the two
  • 12. Click stream sessionization
    Key on userid and time
    Separate table for significant events (e.g. purchase)
    Load data using HBase importtsv tool
    Sessionization performed by simple scans
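Keying on userid and time means one user's clicks sit next to each other in time order, so a single pass over scanned keys can cut them into sessions. The sketch below works on plain strings rather than a live table. The key layout (`userid:zero-padded-seconds`), the 30-minute inactivity gap, and the names `row_key` and `sessionize` are all assumptions made for illustration; the talk does not specify them.

```python
from typing import Iterable, List, Tuple

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout, not from the talk


def row_key(user_id: str, epoch_seconds: int) -> str:
    """Compose a key that sorts by user, then by time (zero-padded seconds)."""
    return f"{user_id}:{epoch_seconds:010d}"


def sessionize(sorted_keys: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Walk keys in scan order and return (user, session start, session end)."""
    sessions: List[Tuple[str, int, int]] = []
    current = None  # mutable [user, start, end] for the open session
    for key in sorted_keys:
        user, _, ts_text = key.rpartition(":")
        ts = int(ts_text)
        same_session = (
            current is not None
            and current[0] == user
            and ts - current[2] <= SESSION_GAP_SECONDS
        )
        if same_session:
            current[2] = ts  # extend the open session
        else:
            if current is not None:
                sessions.append((current[0], current[1], current[2]))
            current = [user, ts, ts]  # start a new session
    if current is not None:
        sessions.append((current[0], current[1], current[2]))
    return sessions
```

Zero-padding the timestamp matters: without it, lexicographic key order would not match chronological order and the single-pass scan would break.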
  • 13. Mozilla - Socorro
    When Firefox crashes, where do reports go?
    The Mozilla team gathers those crashes in HBase
    Crashes vary widely and change format often
    Processors take each individually and parse it out
    http://crash-stats.mozilla.com http://code.google.com/p/socorro
  • 14. Navteq
    Location based content serving
    All served out of HBase, location makes a great key
    Content is variable – Maps, POI, User Data
    Preprocessing is all done via MR jobs
  • 15. Cloudera
    Gathers data about customer clusters
    Each customer node is a key with Avro values
    Easy to browse, quick to find issues on nodes
    Dump to HDFS and process with Pig