®
eric@nextbigsound.com © 2009 - 2014 Next Big Sound, Inc.
Building a Data “Development” Platform
Data Evolution In HBase
Eric Czech & Alec Zopf
Next Big Sound
HBaseCon - Case Studies Track
May 5, 2014
Intro
• Eric Czech - Chief Architect
Previously worked for infrastructure team at
quantitative hedge fund
• Alec Zopf - Senior Data Engineer
Previously worked on algorithmic futures and
options trading platform
Agenda
• Data & Architecture
• Data Aggregation
- Why no tools help us
• Data Development (HBlocks)
- Our platform for making it happen
• A Practical Example
Data Sources
Next Big Sound marries billions of public social
data points with customers’ internal transactional
data. Public sources include up to 3+ years of
historical and competitive data for hundreds of
thousands of artists and millions of songs.
• Sales
- iTunes, Physical Sales, Amazon
• Streaming & Social
- Facebook, Facebook Insights, Last.fm, Pandora, Rdio, ReverbNation, SoundCloud, Tumblr
- Spotify, Twitter, Vevo, Vimeo, YouTube, YouTube Analytics, Deezer, Instagram
• Misc
- Sitecatalyst, Google Analytics, Wikipedia, Tunesat, Mediabase
Charts Licensed to Billboard
In Billboard’s 118-year history,
they’ve licensed data from two
providers: Nielsen in 1991 and
Next Big Sound in 2010.
Architecture & Stats
• Data collected from 60+ sources
• 1M artists, 10M tracks
• 10s of billions of records
• CDH 4.3.0
• 48 node Hadoop cluster for 35TB dataset
• No licensing costs
• Giant counting machine!
Data Aggregation
• HDFS
- Stores raw fact tables and copies of dimension tables from MySQL
• Oozie/Pig
- Runs incremental joins of fact and dimension tables
• HBase
- Stores timeseries aggregations for random access (NOT using counters)
Raw Fact Data (HDFS)
Aggregate Tables (HBase)
Cube/Rollup Operations (Pig)
(and many more...)
Other Solutions
Are there better ways to just count things?
Yes! Lots:
• OpenTSDB
• Summingbird (Twitter)
• DataFu Hourglass (LinkedIn)
• Blueflood (Rackspace)
• Oozie Coordinators
• Apache Accumulo
• Hadoop + Voldemort
• MongoDB Incremental MapReduce
• TempoDB & InfluxDB (hosted services)
• KairosDB (originally built on Cassandra)
• Amazon EMR/Redshift
• Cassandra/Redis/Riak/HBase Counters
Considerations
• Scalability
• Cost
• Performance
• Client Libraries
• I/O Characteristics
• Optimal Hardware
• Config Overhead
• Language
• Community
• Data Model
• Monitoring/Alerting
• Documentation
• Support
• Learning Curve
One More Thing..
What about mistakes?!
Data “bugs” are nearly impossible to predict
and can screw you in unimaginable ways..
Data Bugs
Why are fan counts in Schenectady, NY 1000% higher than everywhere else?
Data source uses 12345 as default for new users’ locations
Why are radio station play numbers recently all multiples of 2 or 3?
Data delivered several times and we had no idea
Why is the number of songs sold 3% too high?
We didn't account for returns
Why are all the page view spikes 8 hours after they should be?
We assumed UTC timestamps instead of PST
Hundreds of these! .. that we know of
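The timestamp bug above is easy to reproduce. A minimal sketch, assuming the source delivers wall-clock times with no zone attached; the class and method names are hypothetical and not part of our pipeline. It only shows the size of the gap between the two readings. Which way the spikes move depends on which side made the wrong assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class TimestampZoneCheck {
    // Hypothetical helper: pin a zone-less wall-clock time to a zone.
    static Instant interpret(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toInstant();
    }

    // Hours between reading the same wall-clock value as UTC and as US Pacific time.
    static long hoursBetweenReadings(LocalDateTime wallClock) {
        Instant asUtc = interpret(wallClock, ZoneOffset.UTC);
        Instant asPacific = interpret(wallClock, ZoneId.of("America/Los_Angeles"));
        return Duration.between(asUtc, asPacific).toHours();
    }

    public static void main(String[] args) {
        // A page view spike stamped "2014-01-15 12:00" with no zone information.
        LocalDateTime spike = LocalDateTime.of(2014, 1, 15, 12, 0);
        System.out.println(hoursBetweenReadings(spike)); // prints 8 (PST is UTC-8 in January)
    }
}
```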
Minor Data Bugs
Georgia
!=
Georgia
Or maybe not...
Can we just fix the code and re-aggregate?
NO, there’s no guarantee that the bad data is overwritten.
Can we do the aggregations “on-the-fly”?
NO, we’re not using a relational model for good reason.
Can we rebuild everything in new tables?
NO, we’d need 2x storage to fix < .0001% of the data.
Learning the Hard Way
Fixing data bugs online is terrifying.
“Ad-hoc” updates to production datasets are:
• Dangerous and complicated
• Difficult to generalize
• Time-consuming to test
• A huge database I/O burden
Back To Solutions
What if each dataset had multiple versions?
... and we can focus on small pieces
... with alpha/beta/stable tags
... where users only see what they should
Feels familiar
HBlocks
Our solution for large-scale revision control
• Spans HDFS, Hive, Pig, and HBase
• Arbitrary versioning of data subsets
• Incremental processing, full-scale re-processing,
and everything in between
• Append-only model (deletes in background)
The Basics
Each raw file has an ID
* e.g. “block_1”
Each ID has versions
* ID & version stored in HBase
Version state used
to filter results
Data Development
Version “States” control data lifecycle, from birth (PENDING) to death (DELETED):
PENDING     New data for ETL pipeline
PROCESSING  Data currently being processed
ALPHA       Developers only
BETA        Privileged users
STABLE      Everybody
HIDDEN      Ignored (but still in HBase)
DELETED     Removed permanently
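The state list above implies a read-time filter: each reader only gets versions whose state their audience is allowed to see. A rough sketch of that idea follows. The `Audience` enum, the method names, and the assumption that visibility is nested (developers also see BETA and STABLE, privileged users also see STABLE) are illustrative guesses, not the HBlocks implementation.

```java
import java.util.EnumSet;
import java.util.Set;

final class VersionStates {
    // States as listed on the slide.
    enum State { PENDING, PROCESSING, ALPHA, BETA, STABLE, HIDDEN, DELETED }

    // Hypothetical audiences, named after the slide's descriptions.
    enum Audience { DEVELOPER, PRIVILEGED, EVERYBODY }

    // Assumption: visibility is nested, so a wider role sees everything a narrower one does.
    static Set<State> visibleStates(Audience audience) {
        switch (audience) {
            case DEVELOPER:
                return EnumSet.of(State.ALPHA, State.BETA, State.STABLE);
            case PRIVILEGED:
                return EnumSet.of(State.BETA, State.STABLE);
            default:
                return EnumSet.of(State.STABLE);
        }
    }

    // A record is returned only if its version's state is visible to the reader.
    static boolean isVisible(State state, Audience audience) {
        return visibleStates(audience).contains(state);
    }

    public static void main(String[] args) {
        System.out.println(isVisible(State.ALPHA, Audience.DEVELOPER)); // true
        System.out.println(isVisible(State.ALPHA, Audience.EVERYBODY)); // false
        System.out.println(isVisible(State.HIDDEN, Audience.DEVELOPER)); // false
    }
}
```

PENDING, PROCESSING, HIDDEN and DELETED are visible to nobody in this sketch, which matches "Ignored (but still in HBase)" for HIDDEN.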
A Practical Example
Tracking the number of English Language
Wikipedia page views for Hadoop
So we’ll track this site:
http://en.wikipedia.org/wiki/Apache_Hadoop
Using this data:
http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-01/
The Dataset
http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-01/
Contains ~100MB compressed files for each hour
All pageviews for Jan 1, 2014:
pagecounts-20140101-*.gz
File Uploads
Files downloaded anywhere ...
user@host001> ls wikipedia
pagecounts-20140101.gz
pagecounts-20140102.gz
...
pagecounts-20140131.gz
... and uploaded to HDFS
user@host001> for file in `ls wikipedia`
do
  hblocks upload -file $file -source wikipedia
done
File Metadata
HDFS files registered in HBlocks metadata:
user@host001> hblocks list -source wikipedia
+-----------+---------------------+-----------+-----------+
| hblock_id | hblock_name         | source    | version:1 |
+-----------+---------------------+-----------+-----------+
| 2935      | pagecounts-20140101 | wikipedia | PENDING   |
| 2936      | pagecounts-20140102 | wikipedia | PENDING   |
...
| 3678      | pagecounts-20140131 | wikipedia | PENDING   |
+-----------+---------------------+-----------+-----------+
Table contains 31 row(s)
“PENDING” state indicates
availability for Pig scripts
Run It!
Now, let’s do some aggregating:
user@host001> hblocks aggregate -source wikipedia
Pig script writes results to HBase:
user@host001> hblocks query -table page_views
+-----------+---------+----------+---------------+----------+-------+
| hblock_id | version | language | page          | date     | value |
+-----------+---------+----------+---------------+----------+-------+
| 2935      | 1       | en       | Apache_Hadoop | 20140101 | 283   |
...
| 2935      | 1       | En       | Apache_Hadoop | 20140131 | 2     |
| 2935      | 1       | en.mw    | Apache_Hadoop | 20140131 | 3     |
Wtf is this!? (the ‘En’ and ‘en.mw’ rows)
What Happened?
On January 20th:
• “Sub” languages (e.g. ‘en.mw’) introduced
• Capitalized languages (e.g. ‘En’) also added
• Aggregation script starts ignoring small % of records
* fictitious problems - these language values are real but were
not introduced in January
Effects Over Time
Aggregation process misses new
languages causing slight drop in values
Fix It!
Create new versions for each file affected:
user@host001> hblocks rebuild -source wikipedia -regex '.*201401(2|3).*'
Old versions “STABLE”, new versions “PENDING”:
user@host001> hblocks list -source wikipedia
+-----------+---------------------+-----------+-----------+-----------+
| hblock_id | hblock_name         | source    | version:1 | version:2 |
+-----------+---------------------+-----------+-----------+-----------+
| 2935      | pagecounts-20140101 | wikipedia | STABLE    |           |
| 2935      | pagecounts-20140102 | wikipedia | STABLE    |           |
...
| 2936      | pagecounts-20140120 | wikipedia | STABLE    | PENDING   |
| 2936      | pagecounts-20140121 | wikipedia | STABLE    | PENDING   |
...
| 3678      | pagecounts-20140131 | wikipedia | STABLE    | PENDING   |
+-----------+---------------------+-----------+-----------+-----------+
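To see which files the rebuild regex picks up, here is a small check using the same pattern. It assumes `hblocks` requires the regex to match the whole hblock name, which the slides do not state; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class RebuildSelection {
    // The regex passed to `hblocks rebuild` on this slide.
    static final Pattern AFFECTED = Pattern.compile(".*201401(2|3).*");

    // Assumption: the pattern must match the entire hblock name.
    static boolean isSelected(String hblockName) {
        return AFFECTED.matcher(hblockName).matches();
    }

    // Keep only the names the pattern selects, in input order.
    static List<String> select(List<String> hblockNames) {
        List<String> selected = new ArrayList<>();
        for (String name : hblockNames) {
            if (isSelected(name)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "pagecounts-20140101", "pagecounts-20140119",
                "pagecounts-20140120", "pagecounts-20140131");
        // Only days 20 through 31 contain "2014012" or "2014013".
        System.out.println(select(names)); // [pagecounts-20140120, pagecounts-20140131]
    }
}
```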
Fix It!
Change the current aggregation code:
String language = line.get("language");
To handle case-sensitivity and use first part before a “.”:
String language = line.get("language")
    .split("\\.")[1]
    .toLowerCase();
Run It Again
Run the same aggregation for new versions:
user@host001> hblocks aggregate -source wikipedia
New results:
We made it
even worse!
Revert
Hurry, hide the bad data:
user@host001> hblocks update_versions -source wikipedia -regex '.*201401(2|3).*' -state 'HIDDEN'
Phew, back to where we started .. but what happened?
.split("\\.")[1]
Wrong! Should have been:
.split("\\.")[0]
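A standalone version of the corrected language handling, as a sketch. One extra caution if the aggregation code is Java: `String.split` takes a regular expression, so an unescaped "." matches every character and the resulting array is empty. The class and method names here are hypothetical.

```java
import java.util.Locale;

final class LanguageNormalizer {
    // Take the part before the first literal dot, then lower-case it.
    // The limit of 2 keeps an element at index 0 even for odd inputs such as ".".
    static String normalize(String rawLanguage) {
        return rawLanguage.split("\\.", 2)[0].toLowerCase(Locale.ROOT);
    }

    // What an unescaped dot does: every character is a delimiter, nothing is left.
    static int partsWithUnescapedDot(String rawLanguage) {
        return rawLanguage.split(".").length;
    }

    public static void main(String[] args) {
        System.out.println(normalize("en"));                 // en
        System.out.println(normalize("En"));                 // en
        System.out.println(normalize("en.mw"));              // en
        System.out.println(partsWithUnescapedDot("en.mw"));  // 0
    }
}
```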
Fix It Again (carefully)
Rebuild aggregations in ‘beta’ state this time:
user@host001> hblocks rebuild -source wikipedia -regex '.*201401(2|3).*' -state 'beta'
After another hblocks aggregate, only developers see:
Looks good!
Finishing Up
Make the new data available for ALL users:
user@host001> hblocks update_versions -source wikipedia -regex '.*201401(2|3).*' -state 'STABLE'
Final state:
user@host001> hblocks list -source wikipedia
+-----------+---------------------+-----------+-----------+-----------+-----------+
| hblock_id | hblock_name         | source    | version:1 | version:2 | version:3 |
+-----------+---------------------+-----------+-----------+-----------+-----------+
| 2935      | pagecounts-20140101 | wikipedia | STABLE    |           |           |
| 2935      | pagecounts-20140102 | wikipedia | STABLE    |           |           |
...
| 2936      | pagecounts-20140120 | wikipedia | HIDDEN    | HIDDEN    | STABLE    |
| 2936      | pagecounts-20140121 | wikipedia | HIDDEN    | HIDDEN    | STABLE    |
...
| 3678      | pagecounts-20140131 | wikipedia | HIDDEN    | HIDDEN    | STABLE    |
+-----------+---------------------+-----------+-----------+-----------+-----------+
HBase Schema
Keys:       Primary Dimensions | Time 0
Columns:    Secondary Dimensions | Time 1 | HBlock Id | HBlockVersion Id
Values:     Time 2.0 | Value 0 | ... | Time 2.N | Value N
Timestamps: Schema # | Insertion Time (secs) | Value Data Type
HBase Keys/Columns
Keys:    Primary Dimensions | Time 0
Columns: Secondary Dimensions | Time 1 | HBlock Id | HBlockVersion Id
• Concatenated string ids
- artists, tracks & metrics
• Times split into offsets
- limits row width
• Queried in bulk
- demographics & zip codes
• HBlocks metadata
- determines record “state”
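"Times split into offsets" can be pictured as integer division: a coarse bucket goes into the row key and the remainder goes into the column, so no single row grows without bound. The sketch below is an assumption-heavy illustration. The bucket width `UNITS_PER_ROW`, the ":" separator, and all names are hypothetical, and the slides do not give the real sizes.

```java
final class TimeSplit {
    // Hypothetical bucket width: how many time units a single row covers.
    static final long UNITS_PER_ROW = 4096;

    // Coarse part of the time, stored in the row key.
    static long rowBucket(long time) {
        return Math.floorDiv(time, UNITS_PER_ROW);
    }

    // Fine part of the time, stored in the column.
    static long columnOffset(long time) {
        return Math.floorMod(time, UNITS_PER_ROW);
    }

    // Inverse of the split, used when reading a record back.
    static long recombine(long bucket, long offset) {
        return bucket * UNITS_PER_ROW + offset;
    }

    // Hypothetical row key: concatenated primary dimension ids plus the time bucket.
    static String rowKey(String primaryDimensions, long time) {
        return primaryDimensions + ":" + rowBucket(time);
    }

    public static void main(String[] args) {
        long time = 16071; // e.g. a day number; the unit is left open in this sketch
        System.out.println(rowBucket(time));     // 3
        System.out.println(columnOffset(time));  // 3783
        System.out.println(recombine(rowBucket(time), columnOffset(time))); // 16071
    }
}
```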
HBase Values
Values: Time 2.0 | Value 0 | ... | Time 2.N | Value N
• Time offsets in values too
- fixed width (single byte)
• Values stored as VarInts
- can be any width
• Many values per cell keeps key count
lower, reducing MemStore size
* difficult without an append-only model like ours
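A sketch of what "many values per cell" could look like on the wire: each entry is a one-byte time offset followed by a variable-width integer. The slides do not say which VarInt format is used; this example assumes an unsigned base-128 encoding (7 data bits per byte, high bit as a continuation flag) and non-negative counts. Class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

final class PackedCell {
    // Append one (offset, value) entry: a single offset byte, then the value as a base-128 varint.
    static void append(ByteArrayOutputStream cell, int offset, long value) {
        if (offset < 0 || offset > 255) {
            throw new IllegalArgumentException("offset must fit in one byte");
        }
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative counts only");
        }
        cell.write(offset);
        while ((value & ~0x7FL) != 0) {
            cell.write((int) ((value & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            value >>>= 7;
        }
        cell.write((int) value); // final byte, continuation bit clear
    }

    // Pack parallel arrays of offsets and values into one cell.
    static byte[] encode(int[] offsets, long[] values) {
        ByteArrayOutputStream cell = new ByteArrayOutputStream();
        for (int i = 0; i < offsets.length; i++) {
            append(cell, offsets[i], values[i]);
        }
        return cell.toByteArray();
    }

    // Walk the cell and rebuild offset -> value. Assumption: a later entry for the same offset wins.
    static Map<Integer, Long> decode(byte[] cell) {
        Map<Integer, Long> entries = new LinkedHashMap<>();
        int i = 0;
        while (i < cell.length) {
            int offset = cell[i++] & 0xFF;
            long value = 0;
            int shift = 0;
            byte b;
            do {
                b = cell[i++];
                value |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            entries.put(offset, value);
        }
        return entries;
    }

    public static void main(String[] args) {
        byte[] cell = encode(new int[] {0, 30}, new long[] {283L, 2L});
        System.out.println(cell.length);   // 5 bytes: (1 + 2) for 283 and (1 + 1) for 2
        System.out.println(decode(cell));  // {0=283, 30=2}
    }
}
```

Small counts cost two bytes per entry here, and new entries are simply appended to the end of the cell, which fits the append-only note on this slide.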
Links
Alec Zopf
alec@nextbigsound.com
Eric Czech
eric@nextbigsound.com
Architecture @ NBS - highscalability.com
HBlocks White Paper
Jobs @ NBS

DataWorks Summit
 
Hadoop uk user group meeting final
Hadoop uk user group meeting finalHadoop uk user group meeting final
Hadoop uk user group meeting final
Skills Matter
 
DataCore At VMworld 2016
DataCore At VMworld 2016DataCore At VMworld 2016
DataCore At VMworld 2016
DataCore Software
 


Data Evolution in HBase

  • 1. Building a Data “Development” Platform: Data Evolution In HBase. Eric Czech & Alec Zopf, Next Big Sound. HBaseCon, Case Studies Track, May 5, 2014. eric@nextbigsound.com © 2009 - 2014 Next Big Sound, Inc.
  • 2. Intro • Eric Czech - Chief Architect. Previously worked for the infrastructure team at a quantitative hedge fund. • Alec Zopf - Senior Data Engineer. Previously worked on an algorithmic futures and options trading platform.
  • 3. Agenda • Data & Architecture • Data Aggregation - Why no tools help us • Data Development (HBlocks) - Our platform for making it happen • A Practical Example
  • 4. Data Sources. Next Big Sound marries billions of public social data points with customers’ internal transactional data. Public sources include up to 3+ years of historical and competitive data for hundreds of thousands of artists and millions of songs. Categories: Sales, Streaming & Social, Misc. Sources: iTunes, Physical Sales, Amazon, Sitecatalyst, Facebook, Facebook Insights, Last.fm, Pandora, Rdio, ReverbNation, SoundCloud, Tumblr, Google Analytics, Wikipedia, Tunesat, Mediabase, Spotify, Twitter, Vevo, Vimeo, YouTube, YouTube Analytics, Deezer, Instagram.
  • 7. Charts Licensed to Billboard. In Billboard’s 118-year history, they’ve licensed data from two providers: Nielsen in 1991 and Next Big Sound in 2010.
  • 8. Architecture & Stats • Data collected from 60+ sources • 1M artists, 10M tracks • 10s of billions of records • CDH 4.3.0 • 48-node Hadoop cluster for 35TB dataset • No licensing costs • Giant counting machine!
  • 9. Data Aggregation • HDFS: Stores raw fact tables and copies of dimension tables from MySQL • Oozie/Pig: Runs incremental joins of fact and dimension tables • HBase: Stores timeseries aggregations for random access (NOT using counters)
  • 10. Raw Fact Data (HDFS); Cube/Rollup Operations (Pig); Aggregate Tables (HBase) (and many more...)
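To make the cube/rollup step concrete, here is a minimal in-memory sketch in Python. It is not the authors' Pig code: the fact rows, the dimension names, and both function names are assumptions made for illustration. A cube sums the metric over every subset of the dimensions; a rollup sums it over each left-to-right prefix only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical raw fact rows: (artist, country, date, plays).
FACTS = [
    ("artist_a", "US", "2014-01-01", 10),
    ("artist_a", "US", "2014-01-02", 5),
    ("artist_a", "DE", "2014-01-01", 3),
    ("artist_b", "US", "2014-01-01", 7),
]
DIMENSIONS = ("artist", "country", "date")


def cube(facts, dimensions=DIMENSIONS):
    """Sum the metric for every subset of the dimensions (a CUBE)."""
    totals = defaultdict(int)
    for *dims, value in facts:
        row = dict(zip(dimensions, dims))
        for size in range(len(dimensions) + 1):
            for subset in combinations(dimensions, size):
                key = tuple((name, row[name]) for name in subset)
                totals[key] += value
    return dict(totals)


def rollup(facts, dimensions=DIMENSIONS):
    """Sum the metric for each left-to-right prefix of the dimensions (a ROLLUP)."""
    totals = defaultdict(int)
    for *dims, value in facts:
        for size in range(len(dimensions) + 1):
            key = tuple(zip(dimensions[:size], dims[:size]))
            totals[key] += value
    return dict(totals)
```

Each key in the result would correspond to one row of an aggregate table. The empty key `()` holds the grand total.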
  • 11. Other Solutions. Are there better ways to just count things? Yes! Lots: • OpenTSDB • Summingbird (Twitter) • DataFu Hourglass (LinkedIn) • Blueflood (Rackspace) • Oozie Coordinators • Apache Accumulo • Hadoop + Voldemort • MongoDB Incremental MapReduce • TempoDB & InfluxDB (hosted services) • KairosDB (originally built on Cassandra) • Amazon EMR/Redshift • Cassandra/Redis/Riak/HBase Counters
  • 12. Considerations • Scalability • Cost • Performance • Client Libraries • I/O Characteristics • Optimal Hardware • Config Overhead • Language • Community • Data Model • Monitoring/Alerting • Documentation • Support • Learning Curve
  • 13. One More Thing... What about mistakes?! Data “bugs” are nearly impossible to predict and can screw you in unimaginable ways.
  • 14. Data Bugs • Why are fan counts in Schenectady, NY 1000% higher than everywhere else? The data source uses ZIP code 12345 as the default for new users’ locations. • Why are radio station play numbers recently all multiples of 2 or 3? Data was delivered several times and we had no idea. • Why is the number of songs sold 3% too high? We didn't account for returns. • Why are all the page view spikes 8 hours after they should be? We assumed UTC timestamps instead of PST. • Hundreds of these! ... that we know of
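The timestamp bug on this slide comes down to eight hours of offset between PST and UTC. The sketch below is illustrative only: it assumes the source reports naive timestamps in PST with a fixed UTC-8 offset and no daylight saving, and the function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the source reports naive timestamps in PST (fixed UTC-8).
PST = timezone(timedelta(hours=-8), name="PST")


def misread_as_utc(naive: datetime) -> datetime:
    """The bug: treat a naive PST timestamp as if it were already UTC."""
    return naive.replace(tzinfo=timezone.utc)


def pst_to_utc(naive: datetime) -> datetime:
    """The fix: attach the real source zone, then convert to UTC."""
    return naive.replace(tzinfo=PST).astimezone(timezone.utc)


spike = datetime(2014, 1, 15, 12, 0)  # noon, as reported by the source
shift = pst_to_utc(spike) - misread_as_utc(spike)
print(shift)  # the two interpretations of the same reading
```

Every stored value is internally consistent under either reading, which is why an error like this tends to surface only when the series is compared against an independent signal.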
  • 15. Minor Data Bugs: Georgia != Georgia
  • 16. Or maybe not... • Can we just fix the code and re-aggregate? NO, there’s no guarantee that the bad data is overwritten. • Can we do the aggregations “on-the-fly”? NO, we’re not using a relational model for good reason. • Can we rebuild everything in new tables? NO, we’d need 2x storage to fix < .0001% of the data.
  • 17. Learning the Hard Way. Fixing data bugs online is terrifying. “Ad-hoc” updates to production datasets are: • Dangerous and complicated • Difficult to generalize • Time-consuming to test • A huge database I/O burden
  • 18. Back To Solutions. What if each dataset had multiple versions? ... and we can focus on small pieces ... with alpha/beta/stable tags ... where users only see what they should. Feels familiar
  • 19. HBlocks: Our solution for large-scale revision control • Spans HDFS, Hive, Pig, and HBase • Arbitrary versioning of data subsets • Incremental processing, full-scale re-processing, and everything in between • Append-only model (deletes in background)
  • 20. The Basics • Each raw file has an ID (e.g. “block_1”) • Each ID has versions (ID & version are stored in HBase) • Version state is used to filter results
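One way to picture these three points is a table whose cells each carry the block ID and version that produced them, plus a lookup from (block ID, version) to state. The sketch below is an assumption-laden illustration in Python, not the HBlocks schema: the `Cell` type, its field names, and `visible_cells` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One aggregated value, tagged with the block and version that wrote it."""
    block_id: int
    version: int
    key: str
    value: int


def visible_cells(cells, version_states, allowed_states):
    """Keep only cells whose (block_id, version) is in an allowed state."""
    return [
        cell
        for cell in cells
        if version_states.get((cell.block_id, cell.version)) in allowed_states
    ]


cells = [
    Cell(2935, 1, "en/Apache_Hadoop/20140101", 283),
    Cell(2935, 2, "en/Apache_Hadoop/20140101", 290),  # hypothetical reprocessed value
]
version_states = {(2935, 1): "STABLE", (2935, 2): "PENDING"}
```

Calling `visible_cells(cells, version_states, {"STABLE"})` returns only the version 1 cell, so readers never see data from a version that is still being built.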
  • 21. Data Development. Version “States” control the data lifecycle, from birth to death: • PENDING: New data for ETL pipeline • PROCESSING: Data currently being processed • ALPHA: Developers only • BETA: Privileged users • STABLE: Everybody • HIDDEN: Ignored (but still in HBase) • DELETED: Removed permanently
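The lifecycle above can be sketched as an enum plus a per-audience visibility rule. Two things here are assumptions rather than anything the slides state: that developers and privileged users also see the more mature states, and that a reader is served the highest visible version of a block. The names `VISIBLE_TO` and `served_version` are hypothetical.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"        # new data for the ETL pipeline
    PROCESSING = "PROCESSING"  # data currently being processed
    ALPHA = "ALPHA"            # developers only
    BETA = "BETA"              # privileged users
    STABLE = "STABLE"          # everybody
    HIDDEN = "HIDDEN"          # ignored, but still stored
    DELETED = "DELETED"        # removed permanently


# Assumption: each audience also sees every more mature state.
VISIBLE_TO = {
    "developer": {State.ALPHA, State.BETA, State.STABLE},
    "privileged": {State.BETA, State.STABLE},
    "everybody": {State.STABLE},
}


def served_version(versions, audience):
    """Return the highest version number the audience may see, or None."""
    visible = [v for v, state in versions.items() if state in VISIBLE_TO[audience]]
    return max(visible, default=None)
```

With `{1: State.STABLE, 2: State.ALPHA}`, a developer would be served version 2 while everybody else still gets version 1. Setting version 2 to `HIDDEN` would send developers back to version 1 without touching stored data.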
  • 22. A Practical Example. Tracking the number of English Language Wikipedia page views for Hadoop. So we’ll track this site: http://en.wikipedia.org/wiki/Apache_Hadoop Using this data: http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-01/
  • 23. The Dataset. http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-01/ contains ~100MB compressed files for each hour. All pageviews for Jan 1, 2014: pagecounts-20140101-*.gz
  • 24. File Uploads. Files downloaded anywhere ...
        user@host001> ls wikipedia
        pagecounts-20140101.gz
        pagecounts-20140102.gz
        ...
        pagecounts-20140131.gz
      ... and uploaded to HDFS:
        user@host001> for file in `ls wikipedia`
        do
          hblocks upload -file $file -source wikipedia
        done
  • 25. File Metadata. HDFS files registered in HBlocks metadata:
        user@host001> hblocks list -source wikipedia
        +---------------------------------------------------------+
        | hblock_id | hblock_name         | source    | version:1 |
        +---------------------------------------------------------+
        | 2935      | pagecounts-20140101 | wikipedia | PENDING   |
        | 2936      | pagecounts-20140102 | wikipedia | PENDING   |
        ...
        | 3678      | pagecounts-20140131 | wikipedia | PENDING   |
        +---------------------------------------------------------+
        Table contains 31 row(s)
      “PENDING” state indicates availability for Pig scripts
  • 26. Run It! Now, let's do some aggregating:
        user@host001> hblocks aggregate -source wikipedia
      Pig script writes results to HBase:
        user@host001> hblocks query -table page_views
        +-------------------------------------------------------------------+
        | hblock_id | version | language | page          | date     | value |
        +-------------------------------------------------------------------+
        | 2935      | 1       | en       | Apache_Hadoop | 20140101 | 283   |
        ...
        | 2935      | 1       | En       | Apache_Hadoop | 20140131 | 2     |
        | 2935      | 1       | en.mw    | Apache_Hadoop | 20140131 | 3     |
      Wtf is this!? (the “En” and “en.mw” rows)
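The stray rows are what a straightforward group-by on the raw language code would produce. The Python sketch below is not the Pig script from the talk. It assumes the pagecounts line layout is four space-separated fields (project code, page title, view count, bytes transferred), and the sample lines and `aggregate_views` are invented for illustration.

```python
from collections import Counter

# Hypothetical sample lines in the assumed "project title views bytes" layout.
SAMPLE_LINES = [
    "en Apache_Hadoop 283 4096",
    "En Apache_Hadoop 2 512",
    "en.mw Apache_Hadoop 3 512",
    "de Apache_Hadoop 40 2048",
]


def aggregate_views(lines, page):
    """Sum views per raw language code for one page, with no normalization."""
    totals = Counter()
    for line in lines:
        language, title, views, _size = line.split(" ")
        if title == page:
            totals[language] += int(views)
    return dict(totals)


totals = aggregate_views(SAMPLE_LINES, "Apache_Hadoop")
```

Because the key is the raw code, `en`, `En`, and `en.mw` each get their own total, and a query for `en` alone silently undercounts.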
  • 27. What Happened? On January 20th*:
      - "Sub" languages (e.g. 'en.mw') were introduced
      - Capitalized languages (e.g. 'En') were also added
      - The aggregation script started ignoring a small percentage of records
      * These problems are fictitious: the language values are real but were not introduced in January.
  • 28. Effects Over Time: the aggregation process misses the new languages, causing a slight drop in values.
  • 29. Fix It!
      Create new versions for each file affected:
          user@host001> hblocks rebuild -source wikipedia -regex '.*201401(2|3).*'
      Old versions "STABLE", new versions "PENDING":
          user@host001> hblocks list -source wikipedia
          +-----------+---------------------+-----------+-----------+-----------+
          | hblock_id | hblock_name         | source    | version:1 | version:2 |
          +-----------+---------------------+-----------+-----------+-----------+
          | 2935      | pagecounts-20140101 | wikipedia | STABLE    |           |
          | 2935      | pagecounts-20140102 | wikipedia | STABLE    |           |
          ...
          | 2936      | pagecounts-20140120 | wikipedia | STABLE    | PENDING   |
          | 2936      | pagecounts-20140121 | wikipedia | STABLE    | PENDING   |
          ...
          | 3678      | pagecounts-20140131 | wikipedia | STABLE    | PENDING   |
          +-----------+---------------------+-----------+-----------+-----------+
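The `-regex` argument on slide 29 selects which files get a new version. As a rough illustration of what that pattern matches, here is a minimal Java sketch. It assumes the pattern is applied to the `hblock_name` as a full match, which the slides do not state; `RebuildRegexDemo` and `isAffected` are hypothetical names, not part of HBlocks.

```java
import java.util.regex.Pattern;

final class RebuildRegexDemo {
    // Pattern copied from the slide's -regex argument.
    static final Pattern AFFECTED = Pattern.compile(".*201401(2|3).*");

    // Assumption: the tool matches the pattern against the whole hblock_name.
    static boolean isAffected(String hblockName) {
        return AFFECTED.matcher(hblockName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "pagecounts-20140101", "pagecounts-20140119",
            "pagecounts-20140120", "pagecounts-20140131"
        };
        for (String name : names) {
            System.out.println(name + " -> " + isAffected(name));
        }
    }
}
```

The pattern requires "201401" followed by a 2 or a 3, so it covers January 20th through 31st and leaves the 1st through 19th alone.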
  • 30. Fix It!
      Change the current aggregation code:
          String language = line.get("language");
      To handle case-sensitivity and use the first part before a ".":
          String language = line.get("language")
              .split(".")[1]
              .toLowerCase();
  • 31. Run It Again
      Run the same aggregation for new versions:
          user@host001> hblocks aggregate -source wikipedia
      New results: we made it even worse!
  • 32. Revert
      Hurry, hide the bad data:
          user@host001> hblocks update_versions -source wikipedia -regex '.*201401(2|3).*' -state 'HIDDEN'
      Phew, back to where we started ... but what happened?
          .split(".")[1]    Wrong! Should have been:    .split(".")[0]
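The correction on slide 32 can be sketched as a small helper. This assumes the aggregation snippet is Java, where `String.split` takes a regular expression: an unescaped "." matches every character, so the dot has to be escaped for either index to behave as intended. `LanguageNormalizer` and `normalize` are hypothetical names used for illustration only.

```java
import java.util.Locale;

final class LanguageNormalizer {
    // Hypothetical helper, not part of HBlocks: keeps the part of the
    // language code before the first "." and lower-cases it.
    static String normalize(String rawLanguage) {
        // The dot is escaped because split() expects a regex. The limit of 2
        // keeps trailing empty strings, so index 0 always exists.
        String[] parts = rawLanguage.split("\\.", 2);
        return parts[0].toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"en", "En", "en.mw"}) {
            System.out.println(raw + " -> " + normalize(raw));
        }
        // With an unescaped dot every character is a delimiter, so nothing is left.
        System.out.println("en.mw".split(".").length);
    }
}
```

With this normalization the 'En' and 'en.mw' rows from slide 26 both fold into 'en'.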
  • 33. Fix It Again (carefully)
      Rebuild aggregations in 'beta' state this time:
          user@host001> hblocks rebuild -source wikipedia -regex '.*201401(2|3).*' -state 'beta'
      After another hblocks aggregate, only developers see the new results: looks good!
  • 34. Finishing Up
      Make the new data available for ALL users:
          user@host001> hblocks update_versions -source wikipedia -regex '.*201401(2|3).*' -state 'ACTIVE'
      Final state:
          user@host001> hblocks list -source wikipedia
          +-----------+---------------------+-----------+-----------+-----------+-----------+
          | hblock_id | hblock_name         | source    | version:1 | version:2 | version:3 |
          +-----------+---------------------+-----------+-----------+-----------+-----------+
          | 2935      | pagecounts-20140101 | wikipedia | STABLE    |           |           |
          | 2935      | pagecounts-20140102 | wikipedia | STABLE    |           |           |
          ...
          | 2936      | pagecounts-20140120 | wikipedia | HIDDEN    | HIDDEN    | STABLE    |
          | 2936      | pagecounts-20140121 | wikipedia | HIDDEN    | HIDDEN    | STABLE    |
          ...
          | 3678      | pagecounts-20140131 | wikipedia | HIDDEN    | HIDDEN    | STABLE    |
          +-----------+---------------------+-----------+-----------+-----------+-----------+
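The walkthrough above implies that a reader sees at most one version of each file, chosen by version state. The slides do not give the selection rule, so the sketch below is an assumption: the highest-numbered visible version wins, ACTIVE and STABLE are visible to everyone, BETA only to developers, and PENDING and HIDDEN to no one. `VersionResolver` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.TreeMap;

final class VersionResolver {
    // State names taken from the slides; the visibility rules are assumed.
    enum State { PENDING, BETA, ACTIVE, STABLE, HIDDEN }

    static boolean isVisible(State state, boolean developer) {
        switch (state) {
            case ACTIVE:
            case STABLE:
                return true;
            case BETA:
                return developer;
            default:
                return false; // PENDING and HIDDEN are never served
        }
    }

    // Returns the highest version number the viewer is allowed to see, if any.
    static OptionalInt resolve(TreeMap<Integer, State> versions, boolean developer) {
        for (Map.Entry<Integer, State> e : versions.descendingMap().entrySet()) {
            if (isVisible(e.getValue(), developer)) {
                return OptionalInt.of(e.getKey());
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        TreeMap<Integer, State> versions = new TreeMap<>();
        versions.put(1, State.STABLE);
        versions.put(2, State.HIDDEN);
        versions.put(3, State.BETA);
        System.out.println("developer sees " + resolve(versions, true));
        System.out.println("user sees " + resolve(versions, false));
    }
}
```

Under these assumed rules, hiding version 2 falls back to version 1 for everyone, and a beta version 3 is served only to developers until it is promoted.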
  • 35. HBase Schema (diagram). Labels in slide order: Primary Dimensions, HBlock Id, Time 0, Secondary Dimensions, Time 1, HBlockVersion Id, Time 2.0, Value 0, Time 2.N, Value N; Keys, Columns, Values, Timestamps; Schema #, Insertion Time (secs), Value Data Type.
  • 36. HBase Keys/Columns (diagram). Labels in slide order: Primary Dimensions, HBlock Id, Time 0, Secondary Dimensions, Time 1, HBlockVersion Id; Keys, Columns.
      - Concatenated string ids: artists, tracks & metrics
      - Times split into offsets: limits row width
      - Queried in bulk: demographics & zip codes
      - HBlocks metadata determines record "state"
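"Times split into offsets" can be pictured as slicing one timestamp into a coarse part and progressively finer offsets. The slides do not give the actual granularity or widths, so everything in this sketch is assumed: one time unit is a day, Time 0 sits in the row key, Time 1 in the column, and Time 2 is a single-byte offset in the value. `TimeSplit` is a hypothetical name.

```java
final class TimeSplit {
    // Assumed granularity: one time unit is one day.
    static final long SECONDS_PER_DAY = 86_400L;

    final long time0; // coarse bucket (assumed to live in the row key)
    final int time1;  // offset within the bucket (assumed to live in the column)
    final int time2;  // single-byte offset (assumed to live in the value)

    private TimeSplit(long time0, int time1, int time2) {
        this.time0 = time0;
        this.time1 = time1;
        this.time2 = time2;
    }

    static TimeSplit fromEpochSeconds(long epochSeconds) {
        long day = Math.floorDiv(epochSeconds, SECONDS_PER_DAY);
        // Low 8 bits -> time2, next 8 bits -> time1, the rest -> time0.
        return new TimeSplit(day >> 16, (int) ((day >> 8) & 0xFF), (int) (day & 0xFF));
    }

    // Reassembles the day index from the three parts.
    long toEpochDay() {
        return (time0 << 16) | ((long) time1 << 8) | time2;
    }
}
```

With these widths a row holds at most 256 distinct Time 1 columns, which is one way a split like this could bound row width.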
  • 37. HBase Values (diagram: Time 2.0, Value 0, ..., Time 2.N, Value N)
      - Time offsets in values too: fixed width (single byte)
      - Values stored as VarInts: can be any width
      - Many values per cell*: keeps key count lower, reducing MemStore size
      * Difficult without an append-only model like ours.
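The cell layout on slide 37 (a single-byte time offset followed by a variable-width value, repeated many times per cell) can be sketched as follows. The slides do not specify the VarInt format, so this uses an unsigned LEB128-style encoding as a stand-in and assumes non-negative counts; `CellCodec` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class CellCodec {
    // Assumed varint format: 7 data bits per byte, high bit set on all but the last byte.
    static void writeVarLong(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Packs (offset, value) pairs into one cell: one byte of offset, then the varint value.
    static byte[] encode(int[] offsets, long[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < offsets.length; i++) {
            if (offsets[i] < 0 || offsets[i] > 255) {
                throw new IllegalArgumentException("offset must fit in one byte");
            }
            out.write(offsets[i]);
            writeVarLong(out, values[i]);
        }
        return out.toByteArray();
    }

    // Unpacks a cell into {offset, value} pairs, in the order they were written.
    static List<long[]> decode(byte[] cell) {
        List<long[]> pairs = new ArrayList<>();
        int pos = 0;
        while (pos < cell.length) {
            int offset = cell[pos++] & 0xFF;
            long value = 0;
            int shift = 0;
            byte b;
            do {
                b = cell[pos++];
                value |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            pairs.add(new long[] {offset, value});
        }
        return pairs;
    }
}
```

Because each pair is self-delimiting, adding a data point to a cell is a plain byte append, which fits the append-only model the slide mentions.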
  • 38. Alec Zopf alec@nextbigsound.com, Eric Czech eric@nextbigsound.com. Links: Architecture @ NBS - highscalability.com; HBlocks White Paper; Jobs @ NBS