HBase Design Patterns @ Yahoo!

  1. HBase Design Patterns @ Y! Presented by Francis Liu | toffer@apache.org | May 5, 2014
  2. Y! Grid ▪ Off-Stage Processing ▪ Hosted Service ▪ Multi-tenant
  3. Batch Processing (with HDFS) ▪ Append-only ▪ Efficient full table scans ▪ Process entire data set (or partitions)
  4. HBase ▪ Mutable ▪ Point Access ▪ Range scans ▪ Record-level processing ▪ 7 clusters, 1500 nodes, 6PB
  5. Entity Store: Motivation ▪ Integrate data from multiple data sources ▪ Store historical data ▪ Share data › Analytics › Machine Learning › Consume a data source
  6. Entity Store ▪ Records as Entities › Web pages › Celebrities › etc. ▪ Denormalized as a single table
  7. Entity Store: Content Store [Diagram. Components: Ingestion Service, Bulk Ingest, Feed, Sports Enrichment, News Enrichment, Serving. Columns shown: c:content, c:message, m:sports, m:news]
  8. Entity Store: Considerations ▪ Row vs multiple rows as an entity? › Row in most cases ▪ Blob vs Primitives as cell values? › Blobs are more compact › Primitives work better for granular updates › Out of the box filters work better with primitives › Use a compact binary format ▪ Prepare for Schema Changes › Provide a DAO library ▪ Incremental Scan › Batch id (via version) › Size cache for batch
  9. Event Processing: Motivation ▪ Process a stream of events › Ad Targeting › Personalization › etc. ▪ Low average age of a record, model, etc.
  10. Event Processing ▪ Entity Store ▪ Incremental computation › Persist incremental state ▪ Stream processing framework › e.g., Storm ▪ Fit working set in block cache [Diagram. Components: Data Collector, Storm, HBase, Serving]
  11. Event Processing: Ad Targeting [Diagram. Stages: Collection, Processing, Serving. Components: Data Collector, HDFS, MapReduce, Storm, HBase, Index. Path labels: Batch, Near realtime]
  12. Event Processing: Considerations ▪ Limit large compactions ▪ Deferred log flush ▪ Avoid compaction storms ▪ Async access › HBase work queue › AsyncHBase ▪ Blobs when possible ▪ Cache optimizations
  13. Phased Event Processing: Motivation ▪ Large/Complex event pipeline ▪ Modularization ▪ Dependency between pipelines
  14. Phased Event Processing ▪ Notifications › Separate Table › Separate Column Family [Diagram. Components: Data Collector, Topology1, Topology2, Topology3, Serving, and three Notifications elements]
  15. Phased Event Processing: Personalization [Diagram. Stages: Collection, Processing, Serving. Components: Data Collector, Fetcher, Ingestion, Enrichment, Notifications (twice), MapReduce, HDFS, HBase, Serving]
  16. Phased Event Processing: Considerations ▪ Notifications › Ordered › At least once ▪ Write to multiple regions ▪ Transactions
  17. Time Series DB: Motivation ▪ Track/Monitor changes over time › Application Metrics › User Analytics › System Metrics › etc. ▪ Alerts/Alarms › Thresholds › Changes over time
  18. Time Series DB: Personalization Data Quality [Diagram. Components: Data Collector, Storm, HBase, Web UI, Serving]
  19. Time-Series: Considerations ▪ Hot metrics › Namespace › Indexed tags ▪ Pre-compute aggregates if they are accessed often ▪ Consider using a block encoding scheme (PREFIX, FAST_DIFF, etc.) ▪ Consider pre-computed aggregates in a separate table ▪ Consider OpenTSDB
  20. HBaseCon 2014 Thank You! (We’re hiring)
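
Slide 8 mentions an incremental scan driven by a batch id stored as the cell version. The sketch below illustrates that idea with a toy in-memory table, not the HBase client. The class `VersionedTable`, its methods, and the sample row keys are hypothetical names made up for this example. In HBase itself, the closest real call is probably `Scan.setTimeRange(minStamp, maxStamp)`, which selects cell versions in the half-open range [minStamp, maxStamp); whether the presenters used it is an assumption.

```python
from collections import defaultdict


class VersionedTable:
    """Toy in-memory stand-in for a table whose cells carry a numeric version.

    Hypothetical helper for illustration only; it is not an HBase API.
    """

    def __init__(self):
        # row key -> column -> {version: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, batch_id):
        # The batch id is written as the cell version instead of a wall-clock time.
        self._rows[row][column][batch_id] = value

    def scan(self, min_batch, max_batch):
        """Yield (row, column, batch_id, value) for the newest version of each
        cell whose version falls in the half-open range [min_batch, max_batch)."""
        for row in sorted(self._rows):
            for column in sorted(self._rows[row]):
                versions = self._rows[row][column]
                in_range = [v for v in versions if min_batch <= v < max_batch]
                if in_range:
                    newest = max(in_range)
                    yield row, column, newest, versions[newest]


table = VersionedTable()
table.put("page:a", "c:content", "v1", batch_id=1)
table.put("page:b", "c:content", "v1", batch_id=1)
table.put("page:a", "m:news", "enriched", batch_id=2)
table.put("page:c", "c:content", "v1", batch_id=3)

# Only cells written by batches 2 and 3 are returned; batch 1 is skipped.
changed_since_batch_2 = list(table.scan(min_batch=2, max_batch=4))
```

A downstream job that remembers the last batch id it processed can then read only what changed since, rather than rescanning the whole table.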
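
Slide 19 suggests pre-computing aggregates and keeping them in a separate table. The sketch below shows one way that could look: a row key made of a metric id and an hour-aligned base timestamp, with per-hour aggregates kept in a plain dict that stands in for the separate table. The key layout is similar in spirit to time-bucketed designs such as OpenTSDB's, but the field widths, the one-hour bucket, and the names `row_key`, `qualifier`, and `aggregate` are all assumptions made for illustration, not the presenters' schema.

```python
import struct

BUCKET_SECONDS = 3600  # assumed: one row per metric per hour


def row_key(metric_id, timestamp):
    """Big-endian metric id followed by the hour-aligned base timestamp,
    so keys sort by metric first and by time second."""
    base = timestamp - (timestamp % BUCKET_SECONDS)
    return struct.pack(">HI", metric_id, base)


def qualifier(timestamp):
    """Offset of the data point within its hour, used as the column qualifier."""
    return struct.pack(">H", timestamp % BUCKET_SECONDS)


def aggregate(points):
    """Pre-compute (count, sum, min, max) per row key.

    The returned dict stands in for a separate aggregates table.
    """
    result = {}
    for metric_id, timestamp, value in points:
        key = row_key(metric_id, timestamp)
        count, total, low, high = result.get(key, (0, 0.0, value, value))
        result[key] = (count + 1, total + value, min(low, value), max(high, value))
    return result


points = [
    (7, 36005, 2.0),   # metric 7, hour starting at 36000
    (7, 36065, 4.0),   # same hour
    (7, 39601, 10.0),  # next hour
]
aggregates = aggregate(points)
```

Reading an hourly summary then becomes a single point lookup on the aggregate key instead of a range scan over raw data points.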