HBaseCon 2013: 1500 JIRAs in 20 Minutes

Cloudera, Inc.
1500 JIRAs in 20 Minutes
The Evolution of HBase, 2012-2013
Ian Varley, Salesforce.com
@thefutureian
It's been a year since the
first HBaseCon.
What's changed?
(besides my beard length)
One lens on the evolution of
HBase is through JIRA
(issue tracking system).
HBase has a lot of activity.
Opened in last year: ~2500
Fixed in last year: 1638
Total JIRAs, all time: ~8700
resolved >= 2012-05-23
AND resolved <= 2013-05-24
AND resolution in (Fixed, Implemented)
So we're going to talk about
them all. One by one.
We need to narrow it down.
First, let's get rid of the non-functional changes:
Test: 307 ("test", "junit", etc.)
Build: 55 ("pom", "classpath", "mvn", "build", etc.)
Doc: 107 ("book", "[site]", "[refGuide]", "javadoc", etc.)
Ports: 62 ("backport", "forward port", etc.)
Total: 503 (some overlap)
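A rough sketch of how keyword bucketing like this could work, in Java. The keyword lists are the ones on the slide; the class name, method names, and sample summaries are made up for illustration, and the matching rules actually used for the talk are not shown, so treat the substring matching here as an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: buckets a JIRA summary by the keywords listed on the slide.
class NonFunctionalClassifier {
    // Category -> keywords, stored in lower case because summaries are lower-cased
    // before matching. The "etc." entries from the slide are not filled in.
    static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("Test", Arrays.asList("test", "junit"));
        KEYWORDS.put("Build", Arrays.asList("pom", "classpath", "mvn", "build"));
        KEYWORDS.put("Doc", Arrays.asList("book", "[site]", "[refguide]", "javadoc"));
        KEYWORDS.put("Ports", Arrays.asList("backport", "forward port"));
    }

    // Returns every category with a keyword that appears in the summary.
    // One summary can land in several buckets, which is where "some overlap" comes from.
    static List<String> categories(String summary) {
        String lower = summary.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : KEYWORDS.entrySet()) {
            for (String keyword : entry.getValue()) {
                if (lower.contains(keyword)) {
                    hits.add(entry.getKey());
                    break; // one hit per category is enough
                }
            }
        }
        return hits;
    }

    static boolean isNonFunctional(String summary) {
        return !categories(summary).isEmpty();
    }

    public static void main(String[] args) {
        // Made-up summaries, not real JIRA titles.
        System.out.println(categories("Fix flaky junit test in replication"));
        System.out.println(categories("Backport javadoc fixes"));
        System.out.println(categories("Speed up scans with filters"));
    }
}
```

Plain substring matching is crude (a summary containing "latest" would count as "test"), so a real pass would probably want word boundaries or a manual review of the edge cases.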
That leaves 1135 functional
changes to go over.
(In 18 minutes.)
Break what's left into 2 parts:
• Big Topics (20+ JIRAs on same issue)
• Indie Hits (Cool for some other reason)
Top 10 "big topics":
Snapshots: 82
Replication: 58
Compaction: 54
Metrics: 53
Assignment: 44
Hadoop 2: 37
Protobufs: 34
Security: 28
Bulk Loading: 23
Modularization: 21
Total: 416 (some overlap)
(305 functional, 111 non-functional)
Let's dive into the top 3.
Snapshots
The gist: Take advantage of the fact that files in HDFS are already immutable
to get fast "snapshots" of tables that you can roll back to. This is pretty tricky
when you consider HBase is a distributed system and you want a point in time.
Main JIRAs:
• HBASE-6055 - Offline Snapshots: Take a snapshot after first disabling
the table
• HBASE-7290 - Online Snapshots: Take a snapshot of a live, running
table by splitting the memstore.
• HBASE-7360 - Backport Snapshots to 0.94
Top contributors: Matteo B, Jonathan H, Ted Y, Jesse Y, Enis S
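To see why immutable files make snapshots cheap, here is a toy model in Java. It is not HBase code: `ToyTable`, its methods, and the file names are invented for illustration. A snapshot is just a copy of the list of file names, so no data is copied, and a later compaction that swaps files in the live table leaves the snapshot's list untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a "table" is a list of immutable file names.
class ToyTable {
    private List<String> liveFiles = new ArrayList<>();
    private final Map<String, List<String>> snapshots = new HashMap<>();

    void addFile(String name) {
        liveFiles.add(name);
    }

    // Taking a snapshot records which files exist right now; no file contents are copied.
    void snapshot(String snapshotName) {
        snapshots.put(snapshotName, Collections.unmodifiableList(new ArrayList<>(liveFiles)));
    }

    // A compaction replaces the live files with one new file. The old files are never
    // modified, so any snapshot that references them stays valid.
    void compact(String mergedName) {
        liveFiles = new ArrayList<>(Collections.singletonList(mergedName));
    }

    // Rolling back means pointing the table at the snapshot's file list again.
    void restore(String snapshotName) {
        List<String> files = snapshots.get(snapshotName);
        if (files == null) {
            throw new IllegalArgumentException("No such snapshot: " + snapshotName);
        }
        liveFiles = new ArrayList<>(files);
    }

    List<String> files() {
        return Collections.unmodifiableList(liveFiles);
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.addFile("hfile-1");
        table.addFile("hfile-2");
        table.snapshot("before");
        table.compact("hfile-3");
        System.out.println(table.files());
        table.restore("before");
        System.out.println(table.files());
    }
}
```

The toy skips the tricky parts the slide mentions: getting a consistent point in time across many servers, dealing with edits still in memory, and making sure files referenced by a snapshot are not cleaned up.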
Replication
The gist: use asynchronous WAL shipping to replay all edits on a different
(possibly remote) cluster, for Disaster Recovery or other operational purposes.
Main JIRAs:
• HBASE-1295 - Multi-data-center replication: Top level issue. Real meat
was actually implemented in 0.90 (Jan 2010), so not a new feature.
• HBASE-8207 - Data loss when machine name contains "-". Doh.
• HBASE-2611 - Handle RS failure while processing failure of another:
This was an ugly issue that took a while to fix. Corner cases matter!
Top contributors: J-D Cryans, Himanshu V, Chris T, Devaraj D, Lars H
Plug: stick around next while Chris Trezzo tweets about Replication!!
Theme: corner cases! Corner Case!
Compaction
The gist: In an LSM store, if you don't compact the store files, you end up with
lots of 'em, which makes reads slower. Not a new feature, just improvements.
Main JIRAs:
• HBASE-7516 - Make compaction policy pluggable: allow users to
customize which files are included for compaction.
• HBASE-2231 - Compaction events should be written to HLog: deal with
the case when regions have been reassigned since compaction started.
Look for cool stuff to come in the next year with tiered (aka "leveled")
compaction policies, so you could do stuff like (e.g.) put "recent" data into
smaller files that'll be hit frequently, and the older "long tail" data into bigger
files that'll be hit less frequently.
Top contributors: Sergey S, Elliott C, Jimmy X, stack, Matteo B, Jesse Y
Corner Case!
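As a loose illustration of what a "pluggable" policy means, the sketch below defines a tiny interface that picks which store files to compact. None of these names come from HBase: `ToyCompactionPolicy`, `StoreFileInfo`, and both policies are hypothetical, and the size threshold is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a store file: just a name and a size in bytes.
final class StoreFileInfo {
    final String name;
    final long sizeBytes;

    StoreFileInfo(String name, long sizeBytes) {
        this.name = name;
        this.sizeBytes = sizeBytes;
    }
}

// The pluggable part: given the current files, choose the ones to compact.
interface ToyCompactionPolicy {
    List<StoreFileInfo> select(List<StoreFileInfo> candidates);
}

// Policy A: compact everything.
class CompactAllPolicy implements ToyCompactionPolicy {
    @Override
    public List<StoreFileInfo> select(List<StoreFileInfo> candidates) {
        return new ArrayList<>(candidates);
    }
}

// Policy B: only compact files at or below a size threshold, leaving big files alone.
class SmallFilesPolicy implements ToyCompactionPolicy {
    private final long maxSizeBytes;

    SmallFilesPolicy(long maxSizeBytes) {
        this.maxSizeBytes = maxSizeBytes;
    }

    @Override
    public List<StoreFileInfo> select(List<StoreFileInfo> candidates) {
        List<StoreFileInfo> picked = new ArrayList<>();
        for (StoreFileInfo file : candidates) {
            if (file.sizeBytes <= maxSizeBytes) {
                picked.add(file);
            }
        }
        return picked;
    }
}

class CompactionDemo {
    public static void main(String[] args) {
        List<StoreFileInfo> files = new ArrayList<>();
        files.add(new StoreFileInfo("old-big", 1_000_000L));
        files.add(new StoreFileInfo("new-small-1", 1_000L));
        files.add(new StoreFileInfo("new-small-2", 2_000L));

        // Swapping the policy object changes the selection without touching the caller.
        ToyCompactionPolicy policy = new SmallFilesPolicy(10_000L);
        for (StoreFileInfo file : policy.select(files)) {
            System.out.println("would compact: " + file.name);
        }
    }
}
```

A size-aware policy like the second one is the kind of hook that tiered schemes could build on, though how HBase actually structures its policy classes is not covered here.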
Top 10 "big topics":
Snapshots:
Replication:
Compaction:
Metrics: move to metrics2.
Assignment: it's tricky, yo.
Hadoop 2: support it for HA NN.
Protobufs: wire compatibility!
Security: kerberos, in the core.
Bulk Loading: pop in an HFile.
Modularization: break up the code.
Now on to the
"Indie Hits JIRAs".
What's left? About half.
1638 total - (503 Non-Functional + 305 Categorized Functional) = 830 Remaining
Blocker: 31
Critical: 88
Major: 455
Minor: 206
Trivial: 52
Let's cut out some of these, which takes 830 down to 573.
We can't cover 573 issues.
Let's just hit a few cool ones.
HBASE-5416
HBASE-4676
HBASE-7403
HBASE-1212
HBASE-7801
HBASE-4072
HBASE-3171
HBASE-6868
HBASE-5416
Interesting because: most commented JIRA (200+ human comments!)
Improve perf of scans with some kinds of filters
What? Avoid loading non-essential CFs until after filters run, big perf gain.
How?
+++ Filter.java:
+   abstract public boolean isFamilyEssential(byte[] name);
+++ HRegion.java:
    KeyValueScanner scanner = store.getScanner(scan, entry.getValue());
-   scanners.add(scanner);
+   if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
+       || this.filter.isFamilyEssential(entry.getKey())) {
+     scanners.add(scanner);
+   } else {
+     joinedScanners.add(scanner);
+   }
By: Max Lapan for original idea & patch, Sergey Shelukhin for final impl
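The idea in the diff can be modeled outside HBase. In this Java sketch, `ToyFilter`, `ScannerPlanner`, and the family names are all invented; only the method name `isFamilyEssential` and the essential/joined split mirror the patch. Families the filter needs go into the first list; the rest are deferred and would only be read for rows that pass the filter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter interface, mirroring only the method name from the patch.
interface ToyFilter {
    boolean isFamilyEssential(String family);
}

// Splits column families into those read up front and those loaded on demand.
class ScannerPlanner {
    final List<String> essential = new ArrayList<>();
    final List<String> joined = new ArrayList<>();

    ScannerPlanner(List<String> families, ToyFilter filter, boolean loadOnDemand) {
        for (String family : families) {
            // Same shape as the condition in the diff: with no filter, or with
            // on-demand loading switched off, every family is read up front.
            if (filter == null || !loadOnDemand || filter.isFamilyEssential(family)) {
                essential.add(family);
            } else {
                joined.add(family);
            }
        }
    }

    public static void main(String[] args) {
        List<String> families = new ArrayList<>();
        families.add("flags");   // small family the filter looks at
        families.add("payload"); // large family only needed for matching rows

        ToyFilter onlyFlags = family -> family.equals("flags");
        ScannerPlanner plan = new ScannerPlanner(families, onlyFlags, true);
        System.out.println("essential=" + plan.essential + " joined=" + plan.joined);
    }
}
```

The win comes when the deferred families are large and most rows get filtered out, since their data is never read for the rejected rows.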
200 comments? Srsly?
From whom?
To save you some time, allow
me to summarize.
Reenactment ...
Feb 2012:
• Max Lapan: Hey guys, here's a cool patch!
• Nicolas S: This should be an app detail, not in core.
• Ted Yu: I fixed your typos while you were asleep!
• Nick: Not enough utest coverage to put this in core.
• Max: Agree, but I can't find any other way to do this.
• Kannan: Why don't you try 2-phase w/ multiget?
• Max: OK, ok, I'll try it.
Reenactment ...
May 2012:
• Max: Ran in prod w/ 160-node 300TB cluster. Runs like
a champ, 20x the 2-phase approach. Boom.
• Ted: Holy guacamole that's a big patch.
July 2012:
• Max: Anybody there? Here's a perf test.
• Ted: Cool!
Oct 2012:
• Anoop: A coprocessor would make this faster.
• Max: We're on 0.90 and can't use CP.
• Stack: -1, FB guys are right about needing more tests.
Reenactment ...
Dec 2012:
• Sergey: I'm on it guys. Rebased on trunk, added the
ability to configure, and integration tests.
• Stack: Still not enough tests. Some new code even
when disabled? Who's reviewing? Go easy lads.
• Ram: I'm on it. Couple improvements, but looks good.
Reenactment ...
Dec 31st, 2012 (while everyone else is partying):
• Lars: Ooh, let's pull this into 0.94! I made a patch.
• Lars: ... hold the phone! This slows down a tight loop
case (even when disabled) by 10-20%.
• Ted: I optimized the disabled path.
• Lars: Sweet.
Reenactment ...
Jan, 2013:
• Ram: +1, let's commit.
• Ted: Committed to trunk
• Lars: Committed to 0.94.
And there was much rejoi....
Reenactment ...
Feb, 2013:
• Dave Latham: Stop the presses! This breaks rolling
upgrade for me b/c I directly implement Filter.
• All: Crapface.
• Stack: We should back this out. SOMA pride!!
Also, Dave is running world's biggest HBase
cluster, FYI.
• Lars: Filter is internal. Extend FilterBase maybe?
• Ted: If we take it OUT now, it's also a regression.
• Dave: Chill dudes, we can fix by changing our client.
• All: Uhh ... change it? Keep it? Change it?
Resolution: Change it (HBASE-7920)
Moral of the story?
• JIRA comments are a great way to learn.
• Do the work to keep new features from
destabilizing core code paths.
• Careful with changing interfaces.
HBASE-4676
Interesting because: most watched (42 watchers), and biggest patch.
Prefix Compression - Trie data block encoding
What? An optimization to compress what we store for key/value prefixes.
How? ~8000 new lines added! (Originally written in a git repo.)
At SFDC, James Taylor reported seeing 5-15x improvement in
Phoenix, with no degradation in scan performance. Woot!
By: Matt Corgan
HBASE-7403
Interesting because: It's a cool feature. And went through 33 revisions!
Online Merge
What? The ability to merge regions online and transactionally, just like we
do with splitting regions.
How? The master moves the regions together (on the same regionserver)
and sends a MERGE RPC to the regionserver. Merge happens in a transaction.
Example:
RegionMergeTransaction mt =
    new RegionMergeTransaction(conf, parent, midKey);
if (!mt.prepare(services)) return;
try {
  mt.execute(server, services);
} catch (IOException ioe) {
  try {
    mt.rollback(server, services);
    return;
  } catch (RuntimeException e) {
    myAbortable.abort("Failed merge, abort");
  }
}
By: Chunhui Shen
HBASE-1212
Interesting because: Oldest issue (Feb, 2009) resolved w/ patch this year.
Merge tool expects regions to have diff seq ids
What? With aggregated hfile format, sequence id is written into the file, not
alongside. In the rare case where two store files have the same sequence id and
we want to merge the regions, it wouldn't work.
How? In conjunction with HBASE-7287, removes the code that did this:
--- HRegion.java
  List<StoreFile> srcFiles = es.getValue();
- if (srcFiles.size() == 2) {
-   long seqA = srcFiles.get(0).getMaxSequenceId();
-   long seqB = srcFiles.get(1).getMaxSequenceId();
-   if (seqA == seqB) {
-     // Can't have same sequenceid since on open store, this is what
-     // distingushes the files (see the map of stores how its keyed by
-     // sequenceid).
-     throw new IOException("Files have same sequenceid: " + seqA);
-   }
- }
By: Jean-Marc Spaggiari
HBASE-7801
Interesting because: has durability implications worth blogging about.
Allow a deferred sync option per Mutation
What? Previously, you could only turn WAL writing off completely, per table
or edit. Now you can choose "none", "async", "sync" or "fsync".
How?
+++ Mutation.java
+ public void setDurability(Durability d) {
+   setAttribute(DURABILITY_ID_ATTR, Bytes.toBytes(d.ordinal()));
+   this.writeToWAL = d != Durability.SKIP_WAL;
+ }
+++ HRegion.java
+ private void syncOrDefer(long txid, Durability durability) {
+   switch(durability) { ...
+     case SKIP_WAL: // nothing to do
+       break;
+     case ASYNC_WAL: // defer the sync, unless we globally can't
+       if (this.deferredLogSyncDisabled) { this.log.sync(txid); }
+       break;
+     case SYNC_WAL:
+     case FSYNC_WAL:
+       // sync the WAL edit (SYNC and FSYNC treated the same for now)
+       this.log.sync(txid);
+       break;
+   }
By: Lars Hofhansl
Wha ... ?
Oh. See HADOOP-6313
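A stripped-down model of that switch, in Java. The enum constants match the ones in the diff; `ToyWal`, its counter, and the `deferredSyncDisabled` flag are stand-ins invented for this sketch, and the real `HRegion` logic has more to it.

```java
// Constant names follow the diff; everything else here is a toy.
enum Durability { SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

class ToyWal {
    int syncCalls = 0;                  // how many times a sync was forced
    final boolean deferredSyncDisabled; // stand-in for the global "can't defer" setting

    ToyWal(boolean deferredSyncDisabled) {
        this.deferredSyncDisabled = deferredSyncDisabled;
    }

    private void sync() {
        syncCalls++;
    }

    // Decide whether an edit forces a sync now, based on its durability.
    void syncOrDefer(Durability durability) {
        switch (durability) {
            case SKIP_WAL:
                break; // nothing was written to the log, so nothing to sync
            case ASYNC_WAL:
                if (deferredSyncDisabled) {
                    sync(); // deferring is not allowed globally, so sync anyway
                }
                break;
            case SYNC_WAL:
            case FSYNC_WAL:
                sync(); // treated the same in this sketch, as in the diff
                break;
        }
    }

    public static void main(String[] args) {
        ToyWal wal = new ToyWal(false);
        wal.syncOrDefer(Durability.SKIP_WAL);
        wal.syncOrDefer(Durability.ASYNC_WAL);
        wal.syncOrDefer(Durability.SYNC_WAL);
        wal.syncOrDefer(Durability.FSYNC_WAL);
        System.out.println("syncs forced: " + wal.syncCalls);
    }
}
```

Picking durability per mutation lets a client trade safety for latency edit by edit, rather than making one choice for the whole table.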
HBASE-4072
Interesting because: Biggest facepalm.
Disable reading zoo.cfg files
What? It used to be that if two systems both used ZK and one needed to override
values, the zoo.cfg values would always win. This caused a lot of goofy bugs in
HBase utils like import/export and in integration with other systems like Flume.
How? Put reading it behind a config that defaults to false.
+ if (conf.getBoolean(HBASE_CONFIG_READ_ZOOKEEPER_CONFIG, false)) {
+   LOG.warn(
+       "Parsing zoo.cfg is deprecated. Place all ZK related HBase " +
+       "configuration under the hbase-site.xml");
By: Harsh J
HBASE-3171
Interesting because: Only HBase JIRA with a Downfall parody.
Drop ROOT, store META location in ZooKeeper
What? The ROOT just tells you where the META table is. That's silly.
How? Pretty big patch (59 files changed, 580 insertions(+), 1749 deletions(-))
By: J-D Cryans
http://www.youtube.com/watch?v=tuM9MYDssvg
HBASE-6868
Interesting because: tiny fix, but marked as a blocker, and sunk 0.94.2 RC1.
Avoid double checksumming blocks
What? Since HBASE-5074 (checksums), sometimes we double checksum.
How? 3 line patch to default to skip checksum if not local fs.
+++ HFileSystem.java
    // Incorrect data is read and HFileBlocks won't be able to read
    // their header magic numbers. See HBASE-5885
    if (useHBaseChecksum && !(fs instanceof LocalFileSystem)) {
+     conf = new Configuration(conf);
+     conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", true);
      this.noChecksumFs = newInstanceFileSystem(conf);...
+++ HRegionServer.java
    // If hbase checksum verification enabled, automatically
    // switch off hdfs checksum verification.
    this.useHBaseChecksum = conf.getBoolean(
-     HConstants.HBASE_CHECKSUM_VERIFICATION, true);
+     HConstants.HBASE_CHECKSUM_VERIFICATION, false);
By: Lars Hofhansl
What's it all mean?
Active codebase. Good!
Complexity increasing. Bad!
credit: https://www.ohloh.net/p/hbase
One more interesting stat: "Good on you"s
[Chart: "Good on you"s, stack vs. everyone else]
Takeaways?
Busy community.
New features!
Fixing corner cases.
BTW: How did I do this?
JIRA API +
Phoenix on HBase +
http://github.com/ivarley/jirachi
Thanks!
@thefutureian
HBaseCon 2012 | HBase for the Worlds Libraries - OCLCHBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLC
Cloudera, Inc.3.9K views
HBaseCon 2013: Being Smarter Than the Smart Meter by Cloudera, Inc.
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart Meter
Cloudera, Inc.4.3K views
HBaseCon 2013: Apache HBase on Flash by Cloudera, Inc.
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on Flash
Cloudera, Inc.4.3K views
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data... by Cloudera, Inc.
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
Cloudera, Inc.3.5K views
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second... by Cloudera, Inc.
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
Cloudera, Inc.4.2K views
Cross-Site BigTable using HBase by HBaseCon
Cross-Site BigTable using HBaseCross-Site BigTable using HBase
Cross-Site BigTable using HBase
HBaseCon3.5K views
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase. by Cloudera, Inc.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
Cloudera, Inc.7.1K views
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon by Cloudera, Inc.
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUponHBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
Cloudera, Inc.3.4K views
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics by Cloudera, Inc.
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
Cloudera, Inc.4.8K views
HBaseCon 2013: Rebuilding for Scale on Apache HBase by Cloudera, Inc.
HBaseCon 2013: Rebuilding for Scale on Apache HBaseHBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBase
Cloudera, Inc.3.9K views
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb... by Cloudera, Inc.
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
Cloudera, Inc.3.2K views
HBaseCon 2012 | Building Mobile Infrastructure with HBase by Cloudera, Inc.
HBaseCon 2012 | Building Mobile Infrastructure with HBaseHBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBase
Cloudera, Inc.2.6K views
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera by Cloudera, Inc.
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
Cloudera, Inc.8.7K views
HBaseCon 2013: ETL for Apache HBase by Cloudera, Inc.
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBase
Cloudera, Inc.6.9K views
HBase: Extreme Makeover by HBaseCon
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme Makeover
HBaseCon3.3K views
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory... by Cloudera, Inc.
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
Cloudera, Inc.4.1K views

Similar to HBaseCon 2013: 1500 JIRAs in 20 Minutes

1500 JIRAs in 20 minutes - HBaseCon 2013 by
1500 JIRAs in 20 minutes - HBaseCon 20131500 JIRAs in 20 minutes - HBaseCon 2013
1500 JIRAs in 20 minutes - HBaseCon 2013Ian Varley
614 views111 slides
Ensuring Quality in Data Lakes (D&D Meetup Feb 22) by
Ensuring Quality in Data Lakes  (D&D Meetup Feb 22)Ensuring Quality in Data Lakes  (D&D Meetup Feb 22)
Ensuring Quality in Data Lakes (D&D Meetup Feb 22)lakeFS
108 views35 slides
Hadoop demo ppt by
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
40.6K views46 slides
Data Virtualization: Revolutionizing data cloning by
Data Virtualization: Revolutionizing data cloningData Virtualization: Revolutionizing data cloning
Data Virtualization: Revolutionizing data cloning Kyle Hailey
1.2K views99 slides
NameNode Analytics - Querying HDFS Namespace in Real Time by
NameNode Analytics - Querying HDFS Namespace in Real TimeNameNode Analytics - Querying HDFS Namespace in Real Time
NameNode Analytics - Querying HDFS Namespace in Real TimePlamen Jeliazkov
138 views34 slides
Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ... by
Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ...Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ...
Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ...Philip Stehlik
1.7K views34 slides

Similar to HBaseCon 2013: 1500 JIRAs in 20 Minutes(20)

1500 JIRAs in 20 minutes - HBaseCon 2013 by Ian Varley
1500 JIRAs in 20 minutes - HBaseCon 20131500 JIRAs in 20 minutes - HBaseCon 2013
1500 JIRAs in 20 minutes - HBaseCon 2013
Ian Varley614 views
Ensuring Quality in Data Lakes (D&D Meetup Feb 22) by lakeFS
Ensuring Quality in Data Lakes  (D&D Meetup Feb 22)Ensuring Quality in Data Lakes  (D&D Meetup Feb 22)
Ensuring Quality in Data Lakes (D&D Meetup Feb 22)
lakeFS108 views
Hadoop demo ppt by Phil Young
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
Phil Young40.6K views
Data Virtualization: Revolutionizing data cloning by Kyle Hailey
Data Virtualization: Revolutionizing data cloningData Virtualization: Revolutionizing data cloning
Data Virtualization: Revolutionizing data cloning
Kyle Hailey1.2K views
NameNode Analytics - Querying HDFS Namespace in Real Time by Plamen Jeliazkov
NameNode Analytics - Querying HDFS Namespace in Real TimeNameNode Analytics - Querying HDFS Namespace in Real Time
NameNode Analytics - Querying HDFS Namespace in Real Time
Plamen Jeliazkov138 views
Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ... by Philip Stehlik
Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ...Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ...
Agile Database Modeling with Grails - Preview of GORM 1.4 - SF Grails Meetup ...
Philip Stehlik1.7K views
Introduction to Big Data & Hadoop by Edureka!
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
Edureka!2.4K views
Introduction to Bigdata and HADOOP by vinoth kumar
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP
vinoth kumar1K views
Keynote: The Future of Apache HBase by HBaseCon
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBase
HBaseCon2.9K views
Hive @ Hadoop day seattle_2010 by nzhang
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
nzhang3K views
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse by DataWorks Summit
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
DataWorks Summit506 views
BGOUG "Agile Data: revolutionizing database cloning' by Kyle Hailey
BGOUG  "Agile Data: revolutionizing database cloning'BGOUG  "Agile Data: revolutionizing database cloning'
BGOUG "Agile Data: revolutionizing database cloning'
Kyle Hailey1.9K views
Mutable Data in Hive's Immutable World by DataWorks Summit
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable World
DataWorks Summit1.9K views
Mutable Data in Hive's Immutable World by Lester Martin
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable World
Lester Martin3.3K views

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx by
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
109 views55 slides
Cloudera Data Impact Awards 2021 - Finalists by
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
6.5K views34 slides
2020 Cloudera Data Impact Awards Finalists by
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
6.3K views43 slides
Edc event vienna presentation 1 oct 2019 by
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
4.5K views67 slides
Machine Learning with Limited Labeled Data 4/3/19 by
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
3.6K views36 slides
Data Driven With the Cloudera Modern Data Warehouse 3.19.19 by
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
2.5K views21 slides

More from Cloudera, Inc.(20)

Partner Briefing_January 25 (FINAL).pptx by Cloudera, Inc.
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.109 views
Cloudera Data Impact Awards 2021 - Finalists by Cloudera, Inc.
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.6.5K views
2020 Cloudera Data Impact Awards Finalists by Cloudera, Inc.
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.6.3K views
Edc event vienna presentation 1 oct 2019 by Cloudera, Inc.
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.4.5K views
Machine Learning with Limited Labeled Data 4/3/19 by Cloudera, Inc.
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.3.6K views
Data Driven With the Cloudera Modern Data Warehouse 3.19.19 by Cloudera, Inc.
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.2.5K views
Introducing Cloudera DataFlow (CDF) 2.13.19 by Cloudera, Inc.
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.4.9K views
Introducing Cloudera Data Science Workbench for HDP 2.12.19 by Cloudera, Inc.
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.2.7K views
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19 by Cloudera, Inc.
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.1.6K views
Leveraging the cloud for analytics and machine learning 1.29.19 by Cloudera, Inc.
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.1.6K views
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19 by Cloudera, Inc.
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.2.5K views
Leveraging the Cloud for Big Data Analytics 12.11.18 by Cloudera, Inc.
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.1.7K views
Modern Data Warehouse Fundamentals Part 3 by Cloudera, Inc.
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.1.3K views
Modern Data Warehouse Fundamentals Part 2 by Cloudera, Inc.
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.2.3K views
Modern Data Warehouse Fundamentals Part 1 by Cloudera, Inc.
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.1.5K views
Extending Cloudera SDX beyond the Platform by Cloudera, Inc.
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.966 views
Federated Learning: ML with Privacy on the Edge 11.15.18 by Cloudera, Inc.
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.2.2K views
Analyst Webinar: Doing a 180 on Customer 360 by Cloudera, Inc.
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.1.4K views
Build a modern platform for anti-money laundering 9.19.18 by Cloudera, Inc.
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.1K views
Introducing the data science sandbox as a service 8.30.18 by Cloudera, Inc.
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.1.2K views

Recently uploaded

NTGapps NTG LowCode Platform by
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform Mustafa Kuğu
474 views30 slides
Measurecamp Brussels - Synthetic data.pdf by
Measurecamp Brussels - Synthetic data.pdfMeasurecamp Brussels - Synthetic data.pdf
Measurecamp Brussels - Synthetic data.pdfHuman37
27 views14 slides
"Node.js Development in 2024: trends and tools", Nikita Galkin by
"Node.js Development in 2024: trends and tools", Nikita Galkin "Node.js Development in 2024: trends and tools", Nikita Galkin
"Node.js Development in 2024: trends and tools", Nikita Galkin Fwdays
37 views38 slides
AIM102-S_Cognizant_CognizantCognitive by
AIM102-S_Cognizant_CognizantCognitiveAIM102-S_Cognizant_CognizantCognitive
AIM102-S_Cognizant_CognizantCognitivePhilipBasford
23 views36 slides
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdf by
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdfBronack Skills - Risk Management and SRE v1.0 12-3-2023.pdf
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdfThomasBronack
31 views31 slides
Discover Aura Workshop (12.5.23).pdf by
Discover Aura Workshop (12.5.23).pdfDiscover Aura Workshop (12.5.23).pdf
Discover Aura Workshop (12.5.23).pdfNeo4j
20 views55 slides

Recently uploaded(20)

NTGapps NTG LowCode Platform by Mustafa Kuğu
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
Mustafa Kuğu474 views
Measurecamp Brussels - Synthetic data.pdf by Human37
Measurecamp Brussels - Synthetic data.pdfMeasurecamp Brussels - Synthetic data.pdf
Measurecamp Brussels - Synthetic data.pdf
Human37 27 views
"Node.js Development in 2024: trends and tools", Nikita Galkin by Fwdays
"Node.js Development in 2024: trends and tools", Nikita Galkin "Node.js Development in 2024: trends and tools", Nikita Galkin
"Node.js Development in 2024: trends and tools", Nikita Galkin
Fwdays37 views
AIM102-S_Cognizant_CognizantCognitive by PhilipBasford
AIM102-S_Cognizant_CognizantCognitiveAIM102-S_Cognizant_CognizantCognitive
AIM102-S_Cognizant_CognizantCognitive
PhilipBasford23 views
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdf by ThomasBronack
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdfBronack Skills - Risk Management and SRE v1.0 12-3-2023.pdf
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdf
ThomasBronack31 views
Discover Aura Workshop (12.5.23).pdf by Neo4j
Discover Aura Workshop (12.5.23).pdfDiscover Aura Workshop (12.5.23).pdf
Discover Aura Workshop (12.5.23).pdf
Neo4j20 views
Cocktail of Environments. How to Mix Test and Development Environments and St... by Aleksandr Tarasov
Cocktail of Environments. How to Mix Test and Development Environments and St...Cocktail of Environments. How to Mix Test and Development Environments and St...
Cocktail of Environments. How to Mix Test and Development Environments and St...
Transcript: Redefining the book supply chain: A glimpse into the future - Tec... by BookNet Canada
Transcript: Redefining the book supply chain: A glimpse into the future - Tec...Transcript: Redefining the book supply chain: A glimpse into the future - Tec...
Transcript: Redefining the book supply chain: A glimpse into the future - Tec...
BookNet Canada43 views
The Role of Patterns in the Era of Large Language Models by Yunyao Li
The Role of Patterns in the Era of Large Language ModelsThe Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language Models
Yunyao Li104 views
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And... by ShapeBlue
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...
ShapeBlue120 views
LLMs in Production: Tooling, Process, and Team Structure by Aggregage
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team Structure
Aggregage65 views
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」 by PC Cluster Consortium
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」
Initiating and Advancing Your Strategic GIS Governance Strategy by Safe Software
Initiating and Advancing Your Strategic GIS Governance StrategyInitiating and Advancing Your Strategic GIS Governance Strategy
Initiating and Advancing Your Strategic GIS Governance Strategy
Safe Software198 views
What is Authentication Active Directory_.pptx by HeenaMehta35
What is Authentication Active Directory_.pptxWhat is Authentication Active Directory_.pptx
What is Authentication Active Directory_.pptx
HeenaMehta3515 views
Optimizing Communication to Optimize Human Behavior - LCBM by Yaman Kumar
Optimizing Communication to Optimize Human Behavior - LCBMOptimizing Communication to Optimize Human Behavior - LCBM
Optimizing Communication to Optimize Human Behavior - LCBM
Yaman Kumar39 views
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE85 views

HBaseCon 2013: 1500 JIRAs in 20 Minutes

  • 1. 1500 JIRAs in 20 Minutes The Evolution of HBase, 2012-2013 Ian Varley, Salesforce.com @thefutureian
  • 3. It's been a year since the first HBaseCon. What's changed? (besides my beard length)
  • 4. One lens on the evolution of HBase is through JIRA (issue tracking system).
  • 10. HBase has a lot of activity.
      Total JIRAs, all time: ~8700
      Opened in last year: ~2500
      Fixed in last year: 1638
      resolved >= 2012-05-23 AND resolved <= 2013-05-24 AND resolution in (Fixed, Implemented)
  • 11. So we're going to talk about them all. One by one.
  • 13. We need to narrow it down.
  • 20. First, let's get rid of the non-functional changes:
      Test: 307 ("test", "junit", etc.)
      Build: 55 ("pom", "classpath", "mvn", "build", etc.)
      Doc: 107 ("book", "[site]", "[refGuide]", "javadoc", etc.)
      Ports: 62 ("backport", "forward port", etc.)
      Total: 503 (some overlap)
  • 21. That leaves 1135 functional changes to go over. (In 18 minutes.)
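The keyword-based triage on slide 20 can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual tooling: the class name `NonFunctionalFilter`, the plain substring matching, and the sample summaries are all assumptions, and only the keywords quoted on the slide are used (the "etc." entries are unknown).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class NonFunctionalFilter {
    // Category -> keywords, as quoted on the slide (lowercased for matching).
    static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("Test", Arrays.asList("test", "junit"));
        KEYWORDS.put("Build", Arrays.asList("pom", "classpath", "mvn", "build"));
        KEYWORDS.put("Doc", Arrays.asList("book", "[site]", "[refguide]", "javadoc"));
        KEYWORDS.put("Ports", Arrays.asList("backport", "forward port"));
    }

    // Returns every category whose keywords appear in the summary.
    // One issue can land in several categories, hence "some overlap".
    static Set<String> categorize(String summary) {
        String s = summary.toLowerCase(Locale.ROOT);
        Set<String> hits = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> e : KEYWORDS.entrySet()) {
            for (String kw : e.getValue()) {
                if (s.contains(kw)) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented summaries, for illustration only (not real HBase issues).
        String[] samples = {"Fix flaky junit test in build", "Speed up region open"};
        for (String s : samples) {
            System.out.println(s + " -> " + categorize(s));
        }
        // Arithmetic from the slides: 1638 fixed minus 503 non-functional.
        System.out.println("Functional changes left: " + (1638 - 503));
    }
}
```

Plain substring matching is crude (for example, "latest" contains "test"), so a real pass would likely need word boundaries or manual review.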
  • 22. Break what's left into 2 parts:
      • Big Topics (20+ JIRAs on same issue)
      • Indie Hits (Cool for some other reason)
  • 36. Top 10 "big topics":
      Snapshots: 82
      Replication: 58
      Compaction: 54
      Metrics: 53
      Assignment: 44
      Hadoop 2: 37
      Protobufs: 34
      Security: 28
      Bulk Loading: 23
      Modularization: 21
      Total: 416 (some overlap) (305 functional, 111 non-functional)
      Let's dive in to the top 3.
  • 37. Snapshots
      The gist: Take advantage of the fact that files in HDFS are already immutable to get fast "snapshots" of tables that you can roll back to. This is pretty tricky when you consider HBase is a distributed system and you want a point in time.
      Main JIRAs:
      • HBASE-6055 - Offline Snapshots: Take a snapshot after first disabling the table.
      • HBASE-7290 - Online Snapshots: Take a snapshot of a live, running table by splitting the memstore.
      • HBASE-7360 - Backport Snapshots to 0.94
      Top contributors: Matteo B, Jonathan H, Ted Y, Jesse Y, Enis S
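To make the "files are already immutable" idea concrete, here is a toy sketch, assuming a table is nothing more than a set of immutable file names. `ToyTable` and its methods are hypothetical and do not mirror HBase's snapshot API; the point is only that a snapshot can be a cheap list of references rather than a copy of data.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SnapshotSketch {
    // Toy table: a set of immutable store-file names. Files are never edited
    // in place; the set only gains or loses names.
    static final class ToyTable {
        private final Set<String> liveFiles = new TreeSet<>();

        void addFile(String name) {
            liveFiles.add(name);
        }

        // Compaction replaces several input files with one new output file.
        void compact(Collection<String> inputs, String output) {
            liveFiles.removeAll(inputs);
            liveFiles.add(output);
        }

        // A snapshot is just the list of file names live at this moment.
        List<String> snapshot() {
            return Collections.unmodifiableList(new ArrayList<>(liveFiles));
        }

        // Rolling back swaps the live set for the snapshotted names.
        // Assumption: the referenced files have been retained somewhere.
        void restore(List<String> snap) {
            liveFiles.clear();
            liveFiles.addAll(snap);
        }

        Set<String> files() {
            return Collections.unmodifiableSet(liveFiles);
        }
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.addFile("f1");
        table.addFile("f2");
        List<String> snap = table.snapshot();
        table.compact(snap, "f3");
        System.out.println("after compaction: " + table.files());
        table.restore(snap);
        System.out.println("after restore: " + table.files());
    }
}
```

The sketch leaves out the hard part the slide mentions: in a distributed system, getting every server to agree on a single point in time.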
  • 40. Replication
      The gist: use asynchronous WAL shipping to replay all edits on a different (possibly remote) cluster, for Disaster Recovery or other operational purposes.
      Main JIRAs:
      • HBASE-1295 - Multi-data-center replication: Top level issue. Real meat was actually implemented in 0.90 (Jan 2010), so not a new feature.
      • HBASE-8207 - Data loss when machine name contains "-". Doh.
      • HBASE-2611 - Handle RS failure while processing failure of another: This was an ugly issue that took a while to fix. Corner cases matter!
      Top contributors: J-D Cryans, Himanshu V, Chris T, Devaraj D, Lars H
      Theme: corner cases!
      Plug: stick around next while Chris Trezzo tweets about Replication!!
      Corner Case!
  • 43. Compaction
      The gist: In an LSM store, if you don't compact the store files, you end up with lots of 'em, which makes reads slower. Not a new feature, just improvements.
      Main JIRAs:
      • HBASE-7516 - Make compaction policy pluggable: allow users to customize which files are included for compaction.
      • HBASE-2231 - Compaction events should be written to HLog: deal with the case when regions have been reassigned since compaction started.
      Look for cool stuff to come in the next year with tiered (aka "leveled") compaction policies, so you could do stuff like (e.g.) put "recent" data into smaller files that'll be hit frequently, and the older "long tail" data into bigger files that'll be hit less frequently.
      Top contributors: Sergey S, Elliott C, Jimmy X, stack, Matteo B, Jesse Y
      Corner Case!
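What "pluggable compaction policy" means can be illustrated with a small sketch in which the choice of files to merge sits behind an interface. The names `ToyCompactionPolicy`, `CompactEverything`, and `SmallestFirst` are hypothetical and are not HBase classes; store files are reduced to their sizes for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CompactionPolicySketch {
    // The pluggable part: given the current file sizes, pick which to merge.
    interface ToyCompactionPolicy {
        List<Long> select(List<Long> fileSizes);
    }

    // Policy 1: merge everything into one file.
    static final class CompactEverything implements ToyCompactionPolicy {
        public List<Long> select(List<Long> fileSizes) {
            return new ArrayList<>(fileSizes);
        }
    }

    // Policy 2: merge only the N smallest files, leaving big ones alone.
    static final class SmallestFirst implements ToyCompactionPolicy {
        private final int maxFiles;

        SmallestFirst(int maxFiles) {
            this.maxFiles = maxFiles;
        }

        public List<Long> select(List<Long> fileSizes) {
            List<Long> sorted = new ArrayList<>(fileSizes);
            Collections.sort(sorted);
            return new ArrayList<>(sorted.subList(0, Math.min(maxFiles, sorted.size())));
        }
    }

    // Replaces the selected files with a single merged file appended at the end.
    static List<Long> compact(List<Long> fileSizes, ToyCompactionPolicy policy) {
        List<Long> chosen = policy.select(fileSizes);
        List<Long> remaining = new ArrayList<>(fileSizes);
        if (chosen.size() < 2) {
            return remaining; // nothing worth merging
        }
        long merged = 0;
        for (Long size : chosen) {
            remaining.remove(size); // removes one matching element (by value)
            merged += size;
        }
        remaining.add(merged);
        return remaining;
    }

    public static void main(String[] args) {
        List<Long> files = Arrays.asList(100L, 5L, 7L, 300L);
        System.out.println(compact(files, new SmallestFirst(2)));
        System.out.println(compact(files, new CompactEverything()));
    }
}
```

Fewer files means fewer places a read has to look, which is the read-speed benefit the slide describes; a tiered policy would be another implementation of the same interface.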
  • 51. Top 10 "big topics":
      Snapshots
      Replication
      Compaction
      Metrics: move to metrics2.
      Assignment: it's tricky, yo.
      Hadoop 2: support it for HA NN.
      Protobufs: wire compatibility!
      Security: kerberos, in the core.
      Bulk Loading: pop in an HFile.
      Modularization: break up the code.
  • 52. Now on to the "Indie Hits JIRAs".
  • 53. What's left? About half. 1638 total - (503 Non-Functional + 305 Categorized Functional) = 830 Remaining. By priority: Blocker: 31, Critical: 88, Major: 455, Minor: 206, Trivial: 52.
  • 54. What's left? About half. 1638 total - (503 Non-Functional + 305 Categorized Functional) = 830 Remaining. By priority: Blocker: 31, Critical: 88, Major: 455, Minor: 206, Trivial: 52. Let's cut some of these out: 830 down to 573.
  • 55. We can't cover 573 issues. Let's just hit a few cool ones.
  • 58. HBASE-5416 Interesting because: most commented JIRA (200+ human comments!) Improve perf of scans with some kinds of filters What? Avoid loading non-essential CFs until after filters run, big perf gain. How? +++ Filter.java: + abstract public boolean isFamilyEssential(byte[] name); +++ HRegion.java: KeyValueScanner scanner = store.getScanner(scan, entry.getValue()); - scanners.add(scanner); + if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand() + || this.filter.isFamilyEssential(entry.getKey())) { + scanners.add(scanner); + } else { + joinedScanners.add(scanner); + } By: Max Lapan for original idea & patch, Sergey Shelukhin for final impl
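The branching in that diff can be modeled in isolation. The sketch below uses hypothetical stand-ins (`FamilyFilter`, `ScannerPlan`, and family names as plain strings) rather than the real HBase `Filter` and `HRegion` classes. It only shows how column families get split into those read up front and those deferred until a row passes the filter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not the real HBase Filter or HRegion classes.
interface FamilyFilter {
    boolean isFamilyEssential(String family);
}

class ScannerPlan {
    final List<String> eager = new ArrayList<>();   // read up front; the filter runs on these
    final List<String> joined = new ArrayList<>();  // read only for rows that pass the filter

    // Mirrors the condition on the slide: no filter, on-demand loading disabled,
    // or an essential family all mean "scan it eagerly".
    static ScannerPlan plan(List<String> families, FamilyFilter filter, boolean loadOnDemand) {
        ScannerPlan p = new ScannerPlan();
        for (String family : families) {
            if (filter == null || !loadOnDemand || filter.isFamilyEssential(family)) {
                p.eager.add(family);
            } else {
                p.joined.add(family);
            }
        }
        return p;
    }
}
```

With a small "flags" family marked essential and a large "blob" family not, only "flags" is scanned for every row, which is where the perf gain comes from.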
  • 61. To save you some time, allow me to summarize.
  • 62. Reenactment ... Feb 2012: • Max Lapan: Hey guys, here's a cool patch!
  • 63. Reenactment ... Feb 2012: • Max Lapan: Hey guys, here's a cool patch! • Nicolas S: This should be an app detail, not in core.
  • 64. Reenactment ... Feb 2012: • Max Lapan: Hey guys, here's a cool patch! • Nicolas S: This should be an app detail, not in core. • Ted Yu: I fixed your typos while you were asleep!
  • 65. Reenactment ... Feb 2012: • Max Lapan: Hey guys, here's a cool patch! • Nicolas S: This should be an app detail, not in core. • Ted Yu: I fixed your typos while you were asleep! • Nick: Not enough utest coverage to put this in core. • Max: Agree, but I can't find any other way to do this.
  • 66. Reenactment ... Feb 2012: • Max Lapan: Hey guys, here's a cool patch! • Nicolas S: This should be an app detail, not in core. • Ted Yu: I fixed your typos while you were asleep! • Nick: Not enough utest coverage to put this in core. • Max: Agree, but I can't find any other way to do this. • Kannan: Why don't you try 2-phase w/ multiget? • Max: OK, ok, I'll try it.
  • 67. Reenactment ... May 2012: • Max: Ran in prod w/ 160-node 300TB cluster. Runs like a champ, 20x the 2-phase approach. Boom.
  • 69. Reenactment ... May 2012: • Max: Ran in prod w/ 160-node 300TB cluster. Runs like a champ, 20x the 2-phase approach. Boom. • Ted: Holy guacamole that's a big patch.
  • 70. Reenactment ... May 2012: • Max: Ran in prod w/ 160-node 300TB cluster. Runs like a champ, 20x the 2-phase approach. Boom. • Ted: Holy guacamole that's a big patch. July 2012: • Max: Anybody there? Here's a perf test. • Ted: Cool!
  • 71. Reenactment ... May 2012: • Max: Ran in prod w/ 160-node 300TB cluster. Runs like a champ, 20x the 2-phase approach. Boom. • Ted: Holy guacamole that's a big patch. July 2012: • Max: Anybody there? Here's a perf test. • Ted: Cool! Oct 2012: • Anoop: A coprocessor would make this faster. • Max: We're on 0.90 and can't use CP. • Stack: -1, FB guys are right about needing more tests.
  • 72. Reenactment ... Dec 2012: • Sergey: I'm on it guys. Rebased on trunk, added the ability to configure, and integration tests.
  • 73. Reenactment ... Dec 2012: • Sergey: I'm on it guys. Rebased on trunk, added the ability to configure, and integration tests. • Stack: Still not enough tests. Some new code even when disabled? Who's reviewing? Go easy lads.
  • 74. Reenactment ... Dec 2012: • Sergey: I'm on it guys. Rebased on trunk, added the ability to configure, and integration tests. • Stack: Still not enough tests. Some new code even when disabled? Who's reviewing? Go easy lads. • Ram: I'm on it. Couple improvements, but looks good.
  • 75. Reenactment ... Dec 31st, 2012 (while everyone else is partying): • Lars: Ooh, let's pull this into 0.94! I made a patch.
  • 76. Reenactment ... Dec 31st, 2012 (while everyone else is partying): • Lars: Ooh, let's pull this into 0.94! I made a patch. • Lars: ... hold the phone! This slows down a tight loop case (even when disabled) by 10-20%.
  • 77. Reenactment ... Dec 31st, 2012 (while everyone else is partying): • Lars: Ooh, let's pull this into 0.94! I made a patch. • Lars: ... hold the phone! This slows down a tight loop case (even when disabled) by 10-20%. • Ted: I optimized the disabled path. • Lars: Sweet.
  • 79. Reenactment ... Jan, 2013: • Ram: +1, let's commit. • Ted: Committed to trunk • Lars: Committed to 0.94.
  • 80. Reenactment ... Jan, 2013: • Ram: +1, let's commit. • Ted: Committed to trunk • Lars: Committed to 0.94. And there was much rejoi....
  • 81. Reenactment ... Feb, 2013: • Dave Latham: Stop the presses! This breaks rolling upgrade for me b/c I directly implement Filter.
  • 82. Reenactment ... Feb, 2013: • Dave Latham: Stop the presses! This breaks rolling upgrade for me b/c I directly implement Filter. • All: Crapface.
  • 83. Reenactment ... Feb, 2013: • Dave Latham: Stop the presses! This breaks rolling upgrade for me b/c I directly implement Filter. • All: Crapface. • Stack: We should back this out. SOMA pride!! Also, Dave is running world's biggest HBase cluster, FYI.
  • 84. Reenactment ... Feb, 2013: • Dave Latham: Stop the presses! This breaks rolling upgrade for me b/c I directly implement Filter. • All: Crapface. • Stack: We should back this out. SOMA pride!! Also, Dave is running world's biggest HBase cluster, FYI. • Lars: Filter is internal. Extend FilterBase maybe? • Ted: If we take it OUT now, it's also a regression.
  • 85. Reenactment ... Feb, 2013: • Dave Latham: Stop the presses! This breaks rolling upgrade for me b/c I directly implement Filter. • All: Crapface. • Stack: We should back this out. SOMA pride!! Also, Dave is running world's biggest HBase cluster, FYI. • Lars: Filter is internal. Extend FilterBase maybe? • Ted: If we take it OUT now, it's also a regression. • Dave: Chill dudes, we can fix by changing our client.
  • 86. Reenactment ... Feb, 2013: • Dave Latham: Stop the presses! This breaks rolling upgrade for me b/c I directly implement Filter. • All: Crapface. • Stack: We should back this out. SOMA pride!! Also, Dave is running world's biggest HBase cluster, FYI. • Lars: Filter is internal. Extend FilterBase maybe? • Ted: If we take it OUT now, it's also a regression. • Dave: Chill dudes, we can fix by changing our client. • All: Uhh ... change it? Keep it? Change it?
  • 87. Reenactment ... Feb, 2013: • Dave Latham: Stop the presses! This breaks rolling upgrade for me b/c I directly implement Filter. • All: Crapface. • Stack: We should back this out. SOMA pride!! Also, Dave is running world's biggest HBase cluster, FYI. • Lars: Filter is internal. Extend FilterBase maybe? • Ted: If we take it OUT now, it's also a regression. • Dave: Chill dudes, we can fix by changing our client. • All: Uhh ... change it? Keep it? Change it? Resolution: Change it (HBASE-7920)
  • 88. Moral of the story? • JIRA comments are a great way to learn. • Do the work to keep new features from destabilizing core code paths. • Careful with changing interfaces.
  • 90. HBASE-4676 Interesting because: most watched (42 watchers), and biggest patch. Prefix Compression - Trie data block encoding What? An optimization to compress what we store for key/value prefixes. How? ~8000 new lines added! (Originally written in git repo, here) At SFDC, James Taylor reported seeing 5-15x improvement in Phoenix, with no degradation in scan performance. Woot! By: Matt Corgan
  • 92. HBASE-7403 Interesting because: It's a cool feature. And went through 33 revisions! Online Merge What? The ability to merge regions online and transactionally, just like we do with splitting regions. How? The master moves the regions together (on the same regionserver) and sends a MERGE RPC to the regionserver. Merge happens in a transaction. Example: RegionMergeTransaction mt = new RegionMergeTransaction(conf, parent, midKey); if (!mt.prepare(services)) return; try { mt.execute(server, services); } catch (IOException ioe) { try { mt.rollback(server, services); return; } catch (RuntimeException e) { myAbortable.abort("Failed merge, abort"); } } By: Chunhui Shen
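The prepare / execute / rollback shape in that example can be tried out on a toy model. Everything below is hypothetical (`MergeSketch`, `Range`, the `failMidway` flag) and is not HBase's `RegionMergeTransaction`. A region is reduced to a half-open key range, and failure is simulated with a flag.

```java
// Hypothetical sketch of the prepare / execute / rollback shape; not HBase code.
class MergeSketch {
    // A region is modeled as a half-open key range [start, end).
    static final class Range {
        final String start;
        final String end;
        Range(String start, String end) { this.start = start; this.end = end; }
    }

    // prepare: only adjacent ranges (a ends where b starts) can be merged.
    static boolean prepare(Range a, Range b) {
        return a.end.equals(b.start);
    }

    // execute: returns the merged range, or throws to simulate a mid-merge failure.
    static Range execute(Range a, Range b, boolean failMidway) {
        if (failMidway) {
            throw new IllegalStateException("simulated failure");
        }
        return new Range(a.start, b.end);
    }

    // Returns the merged range, or null if the merge was refused or rolled back.
    static Range merge(Range a, Range b, boolean failMidway) {
        if (!prepare(a, b)) {
            return null;
        }
        try {
            return execute(a, b, failMidway);
        } catch (IllegalStateException e) {
            // rollback: this toy model never mutates its inputs, so a and b are still valid
            return null;
        }
    }
}
```

In the real feature the rollback step has actual state to undo; the sketch keeps only the control flow.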
  • 94. HBASE-1212 Interesting because: Oldest issue (Feb, 2009) resolved w/ patch this year. Merge tool expects regions to have diff seq ids What? With aggregated hfile format, sequence id is written into file, not alongside. In the rare case where two store files have the same sequence id and we want to merge the regions, it wouldn't work. How? In conjunction with HBASE-7287, removes the code that did this: --- HRegion.java List<StoreFile> srcFiles = es.getValue(); - if (srcFiles.size() == 2) { - long seqA = srcFiles.get(0).getMaxSequenceId(); - long seqB = srcFiles.get(1).getMaxSequenceId(); - if (seqA == seqB) { - // Can't have same sequenceid since on open store, this is what - // distingushes the files (see the map of stores how its keyed by - // sequenceid). - throw new IOException("Files have same sequenceid: " + seqA); - } - } By: Jean-Marc Spaggiari
  • 97. HBASE-7801 Interesting because: has durability implications worth blogging about. Allow a deferred sync option per Mutation What? Previously, you could only turn WAL writing off completely, per table or edit. Now you can choose "none", "async", "sync" or "fsync". How? +++ Mutation.java + public void setDurability(Durability d) { + setAttribute(DURABILITY_ID_ATTR, Bytes.toBytes(d.ordinal())); + this.writeToWAL = d != Durability.SKIP_WAL; + } +++ HRegion.java + private void syncOrDefer(long txid, Durability durability) { + switch(durability) { ... + case SKIP_WAL: // nothing to do + break; + case ASYNC_WAL: // defer the sync, unless we globally can't + if (this.deferredLogSyncDisabled) { this.log.sync(txid); } + break; + case SYNC_WAL: + case FSYNC_WAL: + // sync the WAL edit (SYNC and FSYNC treated the same for now) + this.log.sync(txid); + break; + } By: Lars Hofhansl
  • 98. HBASE-7801 Interesting because: has durability implications worth blogging about. Allow a deferred sync option per Mutation What? Previously, you could only turn WAL writing off completely, per table or edit. Now you can choose "none", "async", "sync" or "fsync". How? +++ Mutation.java + public void setDurability(Durability d) { + setAttribute(DURABILITY_ID_ATTR, Bytes.toBytes(d.ordinal())); + this.writeToWAL = d != Durability.SKIP_WAL; + } +++ HRegion.java + private void syncOrDefer(long txid, Durability durability) { + switch(durability) { ... + case SKIP_WAL: // nothing to do + break; + case ASYNC_WAL: // defer the sync, unless we globally can't + if (this.deferredLogSyncDisabled) { this.log.sync(txid); } + break; + case SYNC_WAL: + case FSYNC_WAL: + // sync the WAL edit (SYNC and FSYNC treated the same for now) + this.log.sync(txid); + break; + } By: Lars Hofhansl Wha ... ? Oh. See HADOOP-6313
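To see what each durability level does to the number of WAL syncs, here is a self-contained model of the switch on the slide. The enum constant names follow the slide; `DurabilitySketch` and `FakeLog` are hypothetical, and only the cases visible on the slide are modeled (the elided part of the real switch is not guessed at).

```java
// Self-contained model of the cases shown on the slide; not the HRegion code itself.
class DurabilitySketch {
    enum Durability { SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

    // Hypothetical stand-in for the WAL that just counts sync calls.
    static final class FakeLog {
        int syncCalls = 0;
        void sync(long txid) { syncCalls++; }
    }

    static void syncOrDefer(FakeLog log, long txid, Durability d, boolean deferredLogSyncDisabled) {
        switch (d) {
            case SKIP_WAL:
                break; // nothing to sync
            case ASYNC_WAL:
                // defer the sync, unless deferring is globally disabled
                if (deferredLogSyncDisabled) {
                    log.sync(txid);
                }
                break;
            case SYNC_WAL:
            case FSYNC_WAL:
                // treated the same, as on the slide
                log.sync(txid);
                break;
        }
    }
}
```

Running one edit at each level against a fresh `FakeLog` (with deferral allowed) yields two syncs, one each for `SYNC_WAL` and `FSYNC_WAL`.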
  • 100. HBASE-4072 Interesting because: Biggest facepalm. Disable reading zoo.cfg files What? Used to be, if two systems both used ZK and one needed to override values, the zoo.cfg values would always win. Caused a lot of goofy bugs in hbase utils like import/export, and in integration with other systems like flume. How? Put reading it behind a config that defaults to false. + if (conf.getBoolean(HBASE_CONFIG_READ_ZOOKEEPER_CONFIG, false)) { + LOG.warn( + "Parsing zoo.cfg is deprecated. Place all ZK related HBase " + + "configuration under the hbase-site.xml"); By: Harsh J
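The "config that defaults to false" pattern is easy to show with plain `java.util.Properties`. This is a sketch under assumptions: `ZkConfigSketch` is hypothetical, the key string is my assumption of what `HBASE_CONFIG_READ_ZOOKEEPER_CONFIG` resolves to, and both files are simplified to use the same property names.

```java
import java.util.Properties;

// Hypothetical sketch; not the HBase ZKConfig code.
class ZkConfigSketch {
    // Assumed key string for the flag named on the slide.
    static final String READ_ZOO_CFG = "hbase.config.read.zookeeper.config";

    // Returns the effective settings: site values win unless the flag opts in to zoo.cfg.
    static Properties effective(Properties site, Properties zooCfg) {
        Properties out = new Properties();
        out.putAll(site);
        boolean readZooCfg = Boolean.parseBoolean(site.getProperty(READ_ZOO_CFG, "false"));
        if (readZooCfg) {
            out.putAll(zooCfg); // old behavior: zoo.cfg values override everything
        }
        return out;
    }
}
```

With the flag absent, a stray zoo.cfg on the classpath can no longer silently change which ZooKeeper a tool talks to.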
  • 103. HBASE-3171 Interesting because: Only HBase JIRA with a Downfall parody. Drop ROOT, store META location in ZooKeeper What? The ROOT just tells you where the META table is. That's silly. How? Pretty big patch (59 files changed, 580 insertions(+), 1749 deletions(-)) By: J-D Cryans http://www.youtube.com/watch?v=tuM9MYDssvg
  • 105. HBASE-6868 Interesting because: tiny fix, but marked as a blocker, and sunk 0.94.2 RC1. Avoid double checksumming blocks What? Since HBASE-5074 (checksums), sometimes we double checksum. How? 3-line patch to default to skip checksum if not local fs. +++ HFileSystem.java // Incorrect data is read and HFileBlocks won't be able to read // their header magic numbers. See HBASE-5885 if (useHBaseChecksum && !(fs instanceof LocalFileSystem)) { + conf = new Configuration(conf); + conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", true); this.noChecksumFs = newInstanceFileSystem(conf);... +++ HRegionServer.java // If hbase checksum verification enabled, automatically // switch off hdfs checksum verification. this.useHBaseChecksum = conf.getBoolean( - HConstants.HBASE_CHECKSUM_VERIFICATION, true); + HConstants.HBASE_CHECKSUM_VERIFICATION, false); By: Lars Hofhansl
  • 106. What's it all mean? Active codebase. Good! Complexity increasing. Bad! credit: https://www.ohloh.net/p/hbase
  • 108. One more interesting stat: "Good on you"s
  • 109. One more interesting stat: "Good on you"s (chart: stack vs. everyone else)
  • 111. BTW: How did I do this? JIRA API + Phoenix on HBase + http://github.com/ivarley/jirachi
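For anyone wanting to pull similar numbers, the JQL from the opening slides can be turned into a JIRA REST search URL. This is a sketch only: `JqlUrlSketch` is a hypothetical helper, the base URL assumes the Apache JIRA instance and the standard `/rest/api/2/search` endpoint, and nothing here is taken from the jirachi code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; builds the URL but does not perform the HTTP request.
class JqlUrlSketch {
    // Assumed endpoint: the standard JIRA REST search resource on the Apache instance.
    static final String BASE = "https://issues.apache.org/jira/rest/api/2/search";

    static String searchUrl(String jql) {
        try {
            // URL-encode the query so spaces, comparison operators, and parentheses survive
            return BASE + "?jql=" + URLEncoder.encode(jql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }
}
```

Passing the three clauses from the talk (`resolved >= 2012-05-23 AND resolved <= 2013-05-24 AND resolution in (Fixed, Implemented)`) gives a URL that can be fetched with any HTTP client; the results would still need paging and loading into a store for analysis.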