Hadoop and HBase in the real world

Transcript

  • 1. Apache Hadoop and HBase in the Real World
      Joey Echeverria, @fwiffo, #novahug
  • 2. The Plug
      • We're Training!
          • Developer Training: July 25 to 27
          • Admin Training: July 28 to 29
          • http://www.cloudera.com/training
      • We're Hiring!
          • Solution Architects, Trainers, Distributed Systems Engineers
          • http://www.cloudera.com/careers
      Copyright 2011 Cloudera Inc. All rights reserved
  • 3. 1 Minute Hadoop Recap
      • HDFS
          • Distributed file system
          • Optimized for streaming reads and writes
          • Block-level replication
      • MapReduce
          • Distributed processing framework
          • Reads/writes data in HDFS (typically)
          • Operates over (key, value) view of data
  • 4. Where does HBase come in?
      • Google
          • Google invented GFS and MapReduce
          • GFS optimized for streaming reads and writes
      • BigTable
          • Google's answer to random read/write workloads
  • 5. HBase: BigTable-like storage (for Hadoop)
  • 6. What is HBase?
      • Key/value column family store
      • Data stored in HDFS
      • ZooKeeper for coordination
      • Access model is get/put/del
      • Plus range scans and versions
  • 7. Architecture
      Image courtesy Lars George, licensed under Creative Commons Attribution-Noncomm
  • 8. Tables and Column Families
      • Static part of the schema
      • Column families also form locality groups
      • One Store per family
      • Multiple HFiles per Store
      • Tables split into regions
          • Contiguous range of row keys
          • Unit of distribution
          • Automatically split
          • Pre-split for performance
  • 9. Why use HBase?
      • Variable schema in each record
      • Collections of data for each key
      • Atomic control of per-key data
      • Row access to each column family
  • 10. HBase Applications
      • “Smart Data, at Scale, made Easy”: http://www.lilyproject.org
      • “Distributed, scalable Time Series Database (TSDB)”: http://opentsdb.net
  • 11. Real-time ad optimizations
      • Capturing impressions and serving ads
      • HBase front-end: to serve models (via memcached)
      • HBase back-end: to serve pixels and capture cookies
      • MapReduce to compute models between the two
  • 12. Click stream sessionization
      • Key on userid and time
      • Separate table for significant events (e.g. purchase)
      • Load data using HBase importtsv tool
      • Sessionization performed by simple scans
  • 13. Mozilla - Socorro
      • When Firefox crashes, where do reports go?
      • The Mozilla team gathers those crashes in HBase
      • Crashes vary widely and change format often
      • Processors take each crash individually and parse it out
      • http://code.google.com/p/socorro
  • 14. Navteq
      • Location based content serving
      • All served out of HBase, location makes a great key
      • Content is variable: Maps, POI, User Data
      • Preprocessing is all done via MR jobs
  • 15. Cloudera
      • Gathers data about customer clusters
      • Each customer node is a key with Avro values
      • Easy to browse, quick to find issues on nodes
      • Dump to HDFS and process with Pig
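
Slide 3 says MapReduce operates over a (key, value) view of data. As a rough illustration only, the sketch below imitates the map, shuffle and reduce steps in plain Python with a word count. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_mapper`, `sum_reducer`) and the sample lines are assumptions made up for this example.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every (key, value) record and stream out its pairs."""
    for key, value in records:
        yield from mapper(key, value)

def shuffle(pairs):
    """Group all emitted values by key, sorted by key (a stand-in for the shuffle)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(groups, reducer):
    """Call the reducer once per key with the full list of values for that key."""
    return [reducer(key, values) for key, values in groups]

def word_mapper(offset, line):
    # Input key is a (hypothetical) byte offset, input value is one line of text.
    for word in line.split():
        yield word.lower(), 1

def sum_reducer(word, counts):
    return word, sum(counts)

lines = [(0, "HBase stores data in HDFS"), (26, "MapReduce reads data in HDFS")]
counts = dict(reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer))
```

Every stage consumes and produces (key, value) pairs, which is the property the slide points at. A real job would distribute these stages across the cluster.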
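
Slide 6 describes the access model as get/put/del plus range scans and versions. The toy in-memory table below is one way to picture that model. It is a sketch under assumptions, not the HBase client API: the class `ToyTable`, its method signatures, the `"family:qualifier"` column strings and the default of three retained versions are all invented for illustration.

```python
import bisect
from collections import defaultdict

class ToyTable:
    """In-memory stand-in for an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.rows = {}          # row key -> {column -> [(timestamp, value), ...]}
        self.sorted_keys = []   # row keys kept sorted so range scans are cheap

    def put(self, row, column, value, ts):
        if row not in self.rows:
            bisect.insort(self.sorted_keys, row)
            self.rows[row] = defaultdict(list)
        cells = self.rows[row][column]
        cells.append((ts, value))
        cells.sort(reverse=True)        # newest version first
        del cells[self.max_versions:]   # drop versions beyond the limit

    def get(self, row, column, versions=1):
        cells = self.rows.get(row, {}).get(column, [])
        return [value for _, value in cells[:versions]]

    def delete(self, row):
        if row in self.rows:
            del self.rows[row]
            self.sorted_keys.remove(row)

    def scan(self, start, stop):
        """Yield (row, latest value per column) for start <= row < stop, in key order."""
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, stop)
        for row in self.sorted_keys[lo:hi]:
            yield row, {col: cells[0][1] for col, cells in self.rows[row].items()}

table = ToyTable()
table.put("user1", "info:email", "a@example.com", ts=1)
table.put("user1", "info:email", "b@example.com", ts=2)
table.put("user2", "info:email", "c@example.com", ts=1)
```

Keeping the row keys sorted is what makes the range scan a simple slice. The sketch omits everything that makes the real system distributed and durable.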
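
Slide 8 notes that tables are split into regions, each covering a range of row keys, and that tables can be pre-split for performance. A minimal sketch of how a row key could be mapped to a region from a sorted list of split points follows. The function `region_for` and the split points `"g"`, `"n"`, `"t"` are hypothetical, and the sketch assumes each region includes its start key and excludes its end key.

```python
import bisect

def region_for(row_key, split_points):
    """Return the index of the region whose key range contains row_key.

    split_points must be sorted. N split points define N + 1 regions:
    everything below the first split, then [s0, s1), [s1, s2), and so on,
    with the last region holding every key at or above the final split.
    """
    return bisect.bisect_right(split_points, row_key)

splits = ["g", "n", "t"]  # hypothetical pre-split points chosen up front
placement = {key: region_for(key, splits) for key in ["apple", "g", "monkey", "zebra"]}
```

With these splits, `"apple"` falls in region 0, `"g"` and `"monkey"` in region 1, and `"zebra"` in region 3. Choosing split points that match the expected key distribution spreads load across regions from the start.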
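
Slide 12 keys click stream data on userid and time and performs sessionization with simple scans. The sketch below shows why that key design helps: when the timestamp is zero-padded, sorted row keys place each user's events next to each other in time order, so one pass can cut sessions. The key layout (`userid:timestamp`), the helper names and the 30-minute inactivity gap are assumptions for illustration, not details from the talk.

```python
def row_key(user_id, ts):
    """Composite key: zero-padding keeps lexicographic order equal to time order."""
    return f"{user_id}:{ts:010d}"

def sessionize(sorted_keys, gap=1800):
    """Walk keys in sorted (scan) order and start a new session whenever the
    user changes or more than `gap` seconds pass between consecutive events."""
    sessions = []
    last_user, last_ts = None, None
    for key in sorted_keys:
        user, ts_text = key.rsplit(":", 1)
        ts = int(ts_text)
        if user != last_user or ts - last_ts > gap:
            sessions.append((user, [ts]))
        else:
            sessions[-1][1].append(ts)
        last_user, last_ts = user, ts
    return sessions

events = [("alice", 100), ("bob", 50), ("alice", 4000), ("alice", 160)]
keys = sorted(row_key(user, ts) for user, ts in events)  # the order a scan would return
sessions = sessionize(keys)
```

Here the two early events for `alice` form one session, her later event starts a second, and `bob` gets his own. The `sorted` call stands in for the ordering that a table scan over such keys would already provide.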
