HBaseCon 2015 General Session: The Evolution of HBase @ Bloomberg
Learn the evolution and consolidation of Bloomberg's core infrastructure around fewer, faster, and simpler systems, and the role HBase plays within that effort. You'll also hear about HBase modifications to accommodate the "medium data" use case and get a preview of what's to come.
5.
HBASEATBLOOMBERG//
DATA SCIENCE FOR SOCIAL GOOD: GOVERNMENT INNOVATION, PUBLIC HEALTH, ENVIRONMENT, EDUCATION
September 28: Full Workshop at Bloomberg
September 30: Showcase at Strata Hadoop
Call for papers at: bloomberglabs.com/data-science
6.
DATA MANAGEMENT TODAY
• We have a “medium data” problem…
• Speed and availability are paramount
• Hundreds of thousands of users with expensive requests
We’ve built many systems to address
7.
DATA MANAGEMENT CHALLENGES
• Single security analytics on Big Iron
• Replication of Systems and Data
• Complexity kills
Top 500 Supercomputer list, 2013: >96% Linux. 100% of top 40.
8.
DATA MANAGEMENT TOMORROW
• Simplicity and performance
• Benefit from external developments
• Retain our independence
• Details matter
9.
THE PREMISE
• We can apply big data techniques to our medium data problem by addressing gaps in existing open systems
• HBase is a good bet
• Part of a broader whole
• The biggest community wins
10.
CHALLENGES
Our requirements from HBase are:
• Read performance – fast with low variability
• High availability
• Operational simplicity
• Efficient use of good hardware
• Expressive power
Bloomberg has been investing in all these aspects of HBase
16.
FURTHER BOLSTER RELIABILITY
Great strides such as HBASE-10070, but more to do
• Improved reconciliation of state between Master, META and ZK
• More determinism in Admin/Master operations
17.
BENEFIT FROM MODERN HARDWARE
• 32 cores, 256 GB RAM, SSD: untapped potential
• CPU load max 20%, inadequate throughput
• Multi-RS (multiple RegionServers per host) administratively painful
• Much better story with memory
18.
IMPROVE MULTI-TENANCY
• Mixed workloads challenging
– interactive vs batch
– read vs write
– different read access patterns
• Many solutions in progress
• Administrative simplicity is key
19.
SPARK INTEGRATION
• Analytical frameworks need a distributed database
• Columnar file format != column database
• Integrate with HBase to move towards the universal database
20.
ANALYTICS: EFFICIENCY
• Choice of row and columnar storage engines
• Expose primitives for efficiency:
– Column pruning
– Predicate pushdowns
– Data locality
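To make two of these primitives concrete, here is a minimal Python sketch of column pruning and predicate pushdown. It is not HBase or Spark code: the in-memory `TABLE`, its row keys, column names and values, and the `scan` function are all hypothetical and made up for illustration. The point is only that filtering and projection happen where the data lives, so the caller receives fewer rows and fewer cells. In HBase itself the closest counterparts would be restricting a `Scan` to specific columns and attaching a server-side `Filter`.

```python
from typing import Callable, Dict, Iterator, Optional, Sequence, Tuple

Row = Dict[str, object]

# Hypothetical toy table (made-up values): row key -> {column: value}.
TABLE: Dict[str, Row] = {
    "row1": {"price": 10.0, "volume": 100, "sector": "tech"},
    "row2": {"price": 20.0, "volume": 200, "sector": "energy"},
    "row3": {"price": 30.0, "volume": 300, "sector": "tech"},
}


def scan(
    table: Dict[str, Row],
    columns: Optional[Sequence[str]] = None,
    predicate: Optional[Callable[[Row], bool]] = None,
) -> Iterator[Tuple[str, Row]]:
    """Yield (row key, row) pairs in key order.

    predicate: evaluated on the full row inside the scan, so rejected
               rows are never returned to the caller (predicate pushdown).
    columns:   if given, only these columns are copied into the result
               (column pruning).
    """
    for key in sorted(table):
        row = table[key]
        if predicate is not None and not predicate(row):
            continue  # row filtered at the "storage" side
        if columns is None:
            yield key, dict(row)
        else:
            yield key, {c: row[c] for c in columns if c in row}


if __name__ == "__main__":
    # Only "tech" rows are returned, and only their "price" column.
    for key, row in scan(TABLE, columns=["price"],
                         predicate=lambda r: r["sector"] == "tech"):
        print(key, row)
```

An analytics engine that can hand its projection and filter to the store in this way avoids shipping the `volume` and `sector` cells, and the `row2` row, across the network at all. Data locality, the third primitive on the slide, is about scheduling the computation on the host that holds the rows and is not modeled here.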
21.
THE FUTURE IS BRIGHT
• The state of the “Hadoop Database” union is strong
– Increasing adoption
– Strong foundation
– Great community
• Prominent role in the data & analytics platform of the future
• Let’s go create the future