Hosted by
Welcome
Michael Stack, Software Engineer, Cloudera & HBase PMC Chair
Goals of HBaseCon 2013
1. Bring the Apache HBase community together
2. Encourage contributions to the HBase ecosystem
3. Share challenges and solutions for HBase
HBaseCon 2013 Session Tracks
Operations, Internals, Ecosystem, Case Studies
Track 1, Track 2, Track 3, Track 4
HBaseCon 2013 Program Committee
Gary Helmling
Lars Hofhansl
Jonathan Hsieh
Doug Meil
Andrew Purtell
Enis Söztutar
Michael Stack – Chair
Liyin Tang
Architect
Engineer
Software Engineer
Chief Software Architect
Systems Architect
Member of Technical Staff
Software Engineer
Software Engineer
Thank You to Our Sponsors
Community Sponsor
Conference Sponsors
Media Sponsors
Visit Sponsors = Chance to Win
Conference Notes
• Please fill out the overall conference survey
• Reception is 5:40pm – 8:00pm in the Yerba Buena Foyer
• Connecting to the internet
• Wireless network = Marriott Conference
• Passcode = db075b
The Apache HBase Community: Best Ever and Getting Better
Amr Awadallah, CTO and Co-founder, Cloudera
@awadallah
The Apache HBase Community Has Never Been Healthier
JIRA Activity
Commits Activity
The Market for HBase Skills is Bigger than Ever
The HBase Ecosystem is Rich and Expanding
HBaseCon 2013 speakers from these companies this year
(logos below the dotted line are net-new from 2012!)
Top 5 Reasons Cloudera Loves HBase
1. Its vibrant community is a benchmark for the entire Apache Hadoop ecosystem.
2. It’s a first-class citizen inside the Hadoop stack.
3. It allows us to offer support services for which a lot of customers will pay good money.
4. It draws top-drawer engineering talent to Cloudera.
5. It gives us an excuse to host this tremendous conference and throw a big party for the community!
Thank You for Attending HBaseCon and Thank You for Contributing!
State of the Apache HBase Union
Michael Stack, Software Engineer, Cloudera & HBase PMC Chair, and Lars Hofhansl, Architect, Salesforce.com
We are your Release Managers!
• Mr. (0.94.x) Lars Hofhansl
• Michael Stack (0.95.x/0.96.x)
Introducing...
Your PMC...
Your Committers...
MVP
…and the award goes to…
Diverse Team*
*http://hbase.apache.org/team-list.html
Deploys
• Multitenant multifarious feature store
• a.k.a. dumping ground
• Stumbleupon, Y!, Salesforce
• Reconciliation store
• eBay
• Timeseries
• OpenTSDB, Salesforce, FB ODS
• Lots-o-entities store
• Flurry, Kiji, Genome
• Lots-o-entities BLOBs, FB Messages
OLTP & OLAP
Dev Rate
# of Commits
Total Files: 2021
Total Lines of Code: 832122
Total Commits: 6615
Authors: 39
JIRA: 2008-2013
JIRA: Adoption
JIRA: Opened vs Closed
New Committers (by First Commit) vs. Active Committers (One Commit/Month)
Commits/Month over Time (0.94/trunk)
• 419 jiras in 0.94.0
• 660 jiras in 0.94.1 – 0.94.8
• Frequent, small releases
• Train model
• 4–6 week cycles
• Wire compatible between releases
• Upgrade possible to any point release
• Test stability
• Focus on:
• Performance (FB, Salesforce)
• Stability
http://www.flickr.com/photos/sysli/3026288256/sizes/o/in/photostream/
>1000
So far...
http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/
http://www.flickr.com/photos/38595542@N02/3690830720/sizes/o/in/photostream/
• Hadoop HDFS Fixes
• Faster Recovery
• Detection
• Replay
• Assign
• System tables
• Filesystem
• Up in zookeeper
• Over the wire
RPC
• Implements protobuf service
• Specification!
• Data on the side
• Encoding
• Compression
PB DATA
Snapshots
• By table
• Snapshot, clone, restore, export
• Inexpensive
• Just metadata
• Good for...
• Backups
• Replication
• Offline processing
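To illustrate why a snapshot that is "just metadata" is inexpensive, here is a minimal toy model in Python. It is not the HBase API: the `Table` class and the `snapshot`, `clone`, and `restore` helpers are hypothetical names invented for this sketch. It relies only on the fact that store files are immutable once written, so a snapshot can record file references instead of copying data.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy table: just a name plus references to immutable store files."""
    name: str
    files: list = field(default_factory=list)

    def flush(self, file_name):
        # A flush adds a new immutable file; existing files never change.
        self.files.append(file_name)


def snapshot(table):
    # Record the current file references only; no data is copied.
    return tuple(table.files)


def clone(snap, new_name):
    # A clone is a new table that points at the same files.
    return Table(new_name, list(snap))


def restore(table, snap):
    # Restoring resets the table's file list to the snapshot's references.
    table.files = list(snap)


orders = Table("orders")
orders.flush("hfile-1")
orders.flush("hfile-2")
snap = snapshot(orders)              # cost is proportional to the file count
orders.flush("hfile-3")              # later writes do not affect the snapshot
orders_copy = clone(snap, "orders_copy")
restore(orders, snap)                # roll the original back
```

In this model every operation touches only a short list of names, which is the intuition behind the "Inexpensive" bullet.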
Compactions
• Pluggable
• Tiered
• Striped
• Trigger
Tests
• Cluster test module
• Standalone or cluster
• Sizeable
• x data
• x runtime
• “Borrows” test types from all over
• Netflix “ChaosMonkey”
• Apache Accumulo linked-list dataloss checker
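The idea behind a linked-list dataloss checker can be sketched in a few lines of Python. This is a simplified illustration, not the Accumulo or HBase test code: a plain dict stands in for the table, and `build_chain` and `verify` are hypothetical helper names. Each written row stores the key of the row written before it, so any key that is referenced but missing indicates lost data.

```python
import random


def build_chain(length, seed=0):
    """Write `length` rows; each row stores the key of the previous row."""
    rng = random.Random(seed)
    keys = rng.sample(range(10**9), length)  # distinct pseudo-random keys
    store = {}
    prev = None
    for key in keys:
        store[key] = prev
        prev = key
    return store


def verify(store):
    """Return (undefined, unreferenced) key sets.

    undefined: referenced by some row but absent from the store (data loss).
    unreferenced: present but not pointed to by any row (e.g. the list head).
    """
    present = set(store)
    referenced = {p for p in store.values() if p is not None}
    return referenced - present, present - referenced


store = build_chain(100)
undefined_before, _ = verify(store)

# Simulate losing one row from the middle of the chain.
lost_key = list(store)[50]
del store[lost_key]
undefined_after, _ = verify(store)
```

After the simulated loss, `undefined_after` contains exactly the deleted key, which is how the checker pinpoints missing rows.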
Miscellaneous
• Smarter load balancer
• Revamped
• Metrics
• UI
• Etc.
• Hadoop 1.x and 2.x
HBase Ecosystem
Chasm
kiji.org
• Entity-centric, simple model
• Types, complex, compound types
• Each cell is schema versioned
• Works across MR & REST, etc.
• Production users
• Open-source
SQL
• “A SQL skin over HBase”
• Coprocessors, custom filters, jdbc driver
• https://github.com/forcedotcom/phoenix
Phoenix
1.0?
Next
Related: QoS
Next
• Latency resilience/“Latency tolerance”*
• Bring home the outliers
* http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf
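One technique from the talk cited above is the backup (or hedged) request: if the first replica has not answered within a short delay, send the same request to a second replica and take whichever answer arrives first. The following Python sketch shows the pattern only. Whether and how HBase would adopt it is not stated in these slides, and `hedged_call`, `primary`, `backup`, and `hedge_after_s` are hypothetical names.

```python
import concurrent.futures as cf
import time


def hedged_call(primary, backup, hedge_after_s):
    """Call primary; if it is slow, also call backup and return the first result."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        futures = [pool.submit(primary)]
        done, _ = cf.wait(futures, timeout=hedge_after_s)
        if not done:
            # Primary is an outlier so far: issue the backup request.
            futures.append(pool.submit(backup))
            done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Do not block on the slower request; let it finish in the background.
        pool.shutdown(wait=False)


def slow_replica():
    time.sleep(0.5)  # simulated latency outlier
    return "primary"


def fast_replica():
    return "backup"


answer = hedged_call(slow_replica, fast_replica, hedge_after_s=0.05)
```

The trade-off is a small amount of extra load in exchange for a shorter latency tail, so the hedge delay is usually chosen so that only the slowest requests trigger a backup.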
Next
Faster scans (OLAP)
Next
• More databasey!!!
• Statistics
• 2ndary indexing
• Take 3
• Types
• Serialization
• Keep sort order
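HBase orders rows by comparing keys byte by byte, so a typed value only sorts correctly if its serialized form preserves the value's natural order. A minimal Python sketch of one well-known encoding for signed 64-bit integers follows: flip the sign bit and write big-endian. This illustrates the requirement and is not the serialization format proposed for HBase; `encode_int64` and `decode_int64` are hypothetical names.

```python
import struct

_SIGN_BIT = 1 << 63


def encode_int64(value):
    """Encode a signed 64-bit int so that byte order matches numeric order."""
    if not -_SIGN_BIT <= value < _SIGN_BIT:
        raise ValueError("value out of signed 64-bit range")
    # Adding 2**63 flips the sign bit, mapping the range onto 0 .. 2**64 - 1.
    return struct.pack(">Q", value + _SIGN_BIT)


def decode_int64(data):
    """Inverse of encode_int64."""
    return struct.unpack(">Q", data)[0] - _SIGN_BIT


values = [3, -5, 0, 2**40, -1, -2**63, 2**63 - 1]
encoded_sorted = sorted(encode_int64(v) for v in values)  # bytewise comparison
round_trip = [decode_int64(b) for b in encoded_sorted]
```

Here `round_trip` equals `sorted(values)`. A plain two's-complement encoding would instead sort negative numbers after positive ones.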
larsh@apache.org
stack@apache.org
The Apache HBase Ecosystem
Aaron Kimball, Chief Architect, WibiData
About me
In the beginning…
There was a search engine
… and when building indexes got too hard
Along came an elephant to help push:
The ASF is an ecosystem
And so is Hadoop
HBase: The new ecosystem
Phoenix
Now powering…
Big Data Applications
Customer Relations
Mobile
Web Applications
Hadoop & HBase
Storage
Analytics
Serving
Real-time model scoring
Investigative analytics
Major application targets
MapReduce, Cascading, Crunch…
Big Data Apps are hard to build
• Serialization & versioning
• Deployment
• Communication between teams
• Front end, back end, short request, batch, real time…
Every Java developer should be able to build Big Data Apps – today it’s too hard.
Kiji
Kiji is designed to help you build real-time
Big Data Applications on Apache HBase
100% Apache 2 licensed
Kiji architecture
Leading design decisions
• Store your data in HBase
• Encode it using Avro
• An entity-centric table design
• Manage a data dictionary around tables
• Distribute writes across the cluster
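One common way to distribute writes across an HBase cluster is to prefix each row key with a few bytes of a hash of the entity ID, so that sequential IDs do not all land in the same region. The Python sketch below illustrates that general idea under stated assumptions. It is not Kiji's actual row-key format, and `hashed_row_key` and `prefix_bytes` are hypothetical names.

```python
import hashlib


def hashed_row_key(entity_id, prefix_bytes=2):
    """Build a row key as <hash prefix><entity id>.

    The prefix is deterministic, so the same entity always maps to the same
    row, while neighboring IDs are scattered across the key space.
    """
    raw = entity_id.encode("utf-8")
    prefix = hashlib.md5(raw).digest()[:prefix_bytes]
    return prefix + raw


# Sequential user IDs end up under many different prefixes.
keys = [hashed_row_key(f"user-{i:06d}") for i in range(1000)]
distinct_prefixes = {k[:2] for k in keys}
```

The cost of this design is that range scans over consecutive entity IDs are no longer contiguous, which fits an entity-centric model where reads usually target one entity at a time.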
Key features
• Work with big data in rich types with schema evolution
• Guides users to successful application design
• Scala-based modeling language
• Integration with front-end systems
• Deployment of real-time model scoring
Kiji
• Go to kiji.org and download the BentoBox
– Zero-config Hadoop + HBase + Kiji instance
– “Batteries included”
• 15-minute quickstart guide and a tutorial with full source code
Come attend KijiCon!
Want a deep dive on Kiji? KijiCon is tomorrow!
A 1-day workshop of tutorials & hacking
Register @ kijicon.eventbrite.com
Conclusions
• Each month shows new peak interest in HBase
• The ecosystem is growing
• Open source technologists are working hand in hand to make HBase more accessible
• We’d love your help in the community!
aaron@wibidata.com
Build big data applications.
HBaseCon 2013: General Session