HBase Data Types 
Nick Dimiduk, Hortonworks 
@xefyr n10k.com
Agenda 
• Motivations 
• Progress thus far 
• Future work 
• Examples 
• More Examples 
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. 2014-11-18
Why introduce types? 
• Δ(SQL, byte[]): (╯°□°)╯︵ ┻━┻ 
• Rule of least surprise 
• Interoperability across tools 
• Distill best practices 
Considerations 
• Opt-in for current users 
• Easy transition for existing applications 
• Mostly client-side only
– Filters, Split policies, Coprocessors, Block encoding 
• Avoid POJO constraints 
– No required base-class/interface 
– No magic (avoid ASM, ORM) 
• Non-Java clients 
• HBASE-8089 
Inspiration 
• Orderly 
• PostgreSQL / PostGIS 
• HBASE-7221 
• HBASE-7692 
Features: Encoding 
• Order preservation 
• Override direction (ASC/DSC) 
• Fixed, variable-width 
• Nullable
• Self-identifying 
• Efficient 
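To make "order preservation" and "override direction" concrete, here is a minimal sketch in plain Java. It is an illustration of the general technique (flip the sign bit, write big-endian, invert bytes for descending), not the exact OrderedBytes wire format; the class and method names are hypothetical.

```java
class OrderPreservingInt {
    /** Encode a signed int so that unsigned byte-wise order matches numeric order. */
    static byte[] encodeAsc(int v) {
        int flipped = v ^ 0x80000000; // flip the sign bit so negatives sort before positives
        return new byte[] {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16),
            (byte) (flipped >>> 8), (byte) flipped
        };
    }

    /** Descending variant: invert every byte of the ascending encoding. */
    static byte[] encodeDesc(int v) {
        byte[] b = encodeAsc(v);
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) ~b[i];
        }
        return b;
    }

    /** Inverse of encodeAsc. */
    static int decodeAsc(byte[] b) {
        int flipped = ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
                    | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
        return flipped ^ 0x80000000;
    }

    /** Unsigned lexicographic comparison, the way a byte[]-keyed store compares rowkeys. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        int[] values = {-5, 0, 7};
        for (int i = 0; i + 1 < values.length; i++) {
            int asc = compareBytes(encodeAsc(values[i]), encodeAsc(values[i + 1]));
            int desc = compareBytes(encodeDesc(values[i]), encodeDesc(values[i + 1]));
            System.out.println(values[i] + " vs " + values[i + 1]
                + ": ascending sorts first = " + (asc < 0)
                + ", descending sorts last = " + (desc > 0));
        }
    }
}
```

A plain two's-complement big-endian encoding would place -5 after 7 under unsigned byte comparison, which is the kind of surprise an order-preserving encoding avoids.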
Features: API 
• Complex type encoding 
– Compound rowkey pattern 
– Order preservation 
– Nullable fields 
• Runtime metadata 
• User-extensible 
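The compound rowkey pattern can be sketched without any HBase classes. The example below assumes a two-field key (a user name and a timestamp), a 0x00 terminator for the variable-length text field, and user names that never contain a 0x00 byte. All names are hypothetical and the layout is not the Struct wire format; nullable fields are not covered.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class CompoundKeySketch {
    /** Key layout (an assumption): UTF-8 user bytes, 0x00 terminator, 8-byte sortable long. */
    static byte[] encodeKey(String user, long ts) {
        byte[] text = user.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(text.length + 1 + Long.BYTES);
        buf.put(text);
        buf.put((byte) 0x00);             // terminator keeps "al" ahead of "alice"
        buf.putLong(ts ^ Long.MIN_VALUE); // sign-bit flip keeps negatives ahead of positives
        return buf.array();
    }

    /** Decode a key back into a readable "user@ts" string. */
    static String describe(byte[] key) {
        int end = 0;
        while (key[end] != 0x00) end++;
        String user = new String(key, 0, end, StandardCharsets.UTF_8);
        long ts = ByteBuffer.wrap(key, end + 1, Long.BYTES).getLong() ^ Long.MIN_VALUE;
        return user + "@" + ts;
    }

    /** Unsigned lexicographic comparison of two keys. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        keys.add(encodeKey("bob", 5L));
        keys.add(encodeKey("al", 9L));
        keys.add(encodeKey("bob", -3L));
        keys.add(encodeKey("alice", 1L));
        keys.sort(CompoundKeySketch::compareBytes);
        for (byte[] key : keys) {
            System.out.println(describe(key)); // al@9, alice@1, bob@-3, bob@5
        }
    }
}
```

Because each field is itself order-preserving and self-delimiting, sorting the concatenated bytes gives the same result as sorting by user and then by timestamp.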
Implementation
HBASE-8089
Implementation: Encoding 
o.a.h.h.util.OrderedBytes 
• null 
• numeric, +/-Inf, NaN 
• int8, int16, int32, int64 
• float32, float64 
• variable-length text 
• variable-length blob 
o.a.h.h.util.Bytes 
• numeric 
• boolean 
• int16, int32, int64 
• float32, float64 
• variable-length text 
Implementation: API 
interface DataType<T> 
• decode() 
• encode() 
• encodedClass() 
• encodedLength() 
• getOrder() 
• isNullable() 
• isOrderPreserving() 
• isSkippable() 
• skip() 
implements DataType 
• OrderedXXX 
• RawXXX 
• Struct 
– StructBuilder 
– StructIterator 
– TerminatedWrapper 
– FixedLengthWrapper 
• Union{2,3,4} 
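To show how the methods listed above fit together, here is a simplified stand-in written in plain Java. It is not the HBase `DataType<T>` interface: `MiniType`, `SortableLong`, and the use of `java.nio.ByteBuffer` in place of `PositionedByteRange` are assumptions made so the sketch runs without HBase on the classpath.

```java
import java.nio.ByteBuffer;

class MiniTypeSketch {
    /** Hypothetical, reduced contract loosely modelled on the method names on the slide. */
    interface MiniType<T> {
        int encode(ByteBuffer dst, T val); // writes val at dst's position, returns bytes written
        T decode(ByteBuffer src);          // reads a value and advances src past it
        int skip(ByteBuffer src);          // advances src past a value without materializing it
        int encodedLength(T val);
        boolean isOrderPreserving();
        boolean isNullable();
    }

    /** Fixed-width 8-byte long: big-endian with the sign bit flipped so bytes sort numerically. */
    static final class SortableLong implements MiniType<Long> {
        public int encode(ByteBuffer dst, Long val) {
            dst.putLong(val ^ Long.MIN_VALUE);
            return Long.BYTES;
        }
        public Long decode(ByteBuffer src) {
            return src.getLong() ^ Long.MIN_VALUE;
        }
        public int skip(ByteBuffer src) {
            src.position(src.position() + Long.BYTES);
            return Long.BYTES;
        }
        public int encodedLength(Long val) { return Long.BYTES; }
        public boolean isOrderPreserving() { return true; }
        public boolean isNullable() { return false; }
    }

    /** Encode two values back to back, skip the first, decode the second. */
    static long secondOfTwo(long a, long b) {
        SortableLong type = new SortableLong();
        ByteBuffer buf = ByteBuffer.allocate(2 * Long.BYTES);
        type.encode(buf, a);
        type.encode(buf, b);
        buf.flip();     // switch the buffer from writing to reading
        type.skip(buf); // jump over the first value without decoding it
        return type.decode(buf);
    }

    public static void main(String[] args) {
        System.out.println(secondOfTwo(-1L, 42L)); // prints 42
    }
}
```

Having `skip()` in the contract is what lets a reader pull one field out of a compound value without paying to decode the fields before it.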
Up Next 
• “Default” types 
• More complex types 
– Arrays/Lists 
– Maps/Dicts 
• Tool integration 
– Apache Phoenix 
– Cloudera Kite 
• Performance audit, HBASE-8694 
• Improved metadata, HBASE-8863
– isCastableTo 
– isCoercableTo 
– isComparableTo 
• TypedTable, HBASE-7941 
• Beyond Java, HBASE-10091 
– REST 
– Thrift 
– Shell 
• ImportTsv, HBASE-8593 
• User documentation 
• Coprocessors? 
• Filters? 
• CAS? 
• DataBlockEncoders? 
Examples
A case for TypedTable 
Put p = new Put(Bytes.toBytes(u.user)); 
p.add(INFO_FAM, USER_COL, Bytes.toBytes(u.user)); 
p.add(INFO_FAM, NAME_COL, Bytes.toBytes(u.name)); 
p.add(INFO_FAM, EMAIL_COL, Bytes.toBytes(u.email)); 
p.add(INFO_FAM, PASS_COL, Bytes.toBytes(u.password)); 
A case for TypedTable
static final RawString ENC_STR = new RawString();
static final RawLong ENC_LONG = new RawLong();
--

// one scratch buffer, reused for every value
SimplePositionedByteRange pbr =
    new SimplePositionedByteRange(100);
ENC_STR.encode(pbr, u.user);
Put p = new Put(Bytes.copy(pbr.getBytes(), pbr.getOffset(),
    pbr.getPosition()));
p.add(INFO_FAM, USER_COL, Bytes.copy(pbr.getBytes(), ...));
// rewind before encoding the next value
pbr.setPosition(0);
ENC_STR.encode(pbr, u.name);
p.add(INFO_FAM, NAME_COL, Bytes.copy(pbr.getBytes(), ...));
...
Structs: writing
Struct struct = new StructBuilder()
    .add(OrderedNumeric.ASCENDING)
    .add(OrderedString.ASCENDING)
    .toStruct();
PositionedByteRange buf1 =
    new SimplePositionedByteRange(7);
struct.encode(buf1,
    new Object[] { BigDecimal.ONE, "foo" });
Structs: reading
buf1.setPosition(0);
StructIterator it = struct.iterator(buf1);
while (it.hasNext()) {
  System.out.print(it.next() + ", ");
}

> BigDecimal.ONE, foo
Structs: schema migration 
Struct addedFields = new StructBuilder()
    .add(OrderedNumeric.ASCENDING)
    .add(OrderedString.ASCENDING)
    .add(OrderedString.ASCENDING)
    .add(OrderedNumeric.ASCENDING)
    .toStruct();

// read the two-field value written earlier with the four-field struct
buf1.setPosition(0);
StructIterator it = addedFields.iterator(buf1);
while (it.hasNext()) {
  System.out.print(it.next() + ", ");
}
> BigDecimal.ONE, foo, null, null
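The behaviour shown on this slide, where fields appended to the schema read back as null from older, shorter values, can be imitated in a few lines of plain Java. This is a sketch under stated assumptions: every field is a 0x00-terminated UTF-8 string, the names are hypothetical, and it does not reproduce how StructIterator is actually implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class TrailingNullSketch {
    /** Encode each field as UTF-8 bytes followed by a 0x00 terminator (assumed layout). */
    static byte[] encodeFields(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            out.write(0x00);
        }
        return out.toByteArray();
    }

    /** Read fieldCount fields; once the bytes run out, remaining fields come back as null. */
    static List<String> readFields(byte[] encoded, int fieldCount) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int i = 0; i < fieldCount; i++) {
            if (pos >= encoded.length) {
                fields.add(null); // value was written before this field existed
                continue;
            }
            int end = pos;
            while (encoded[end] != 0x00) end++;
            fields.add(new String(encoded, pos, end - pos, StandardCharsets.UTF_8));
            pos = end + 1; // step over the terminator
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] oldValue = encodeFields("1", "foo");     // written with the two-field schema
        System.out.println(readFields(oldValue, 4));    // prints [1, foo, null, null]
    }
}
```

The design point is that new fields are only ever appended, so existing values stay readable without a rewrite.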
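Protobuf (HBASE-11161)

class PBKeyValue extends PBType<CellProtos.KeyValue> {

  @Override
  public int encode(PositionedByteRange dst, CellProtos.KeyValue val) {
    CodedOutputStream os = outputStreamFromByteRange(dst);
    try {
      int before = os.spaceLeft(), after, written;
      val.writeTo(os);
      after = os.spaceLeft();
      written = before - after;
      // advance the range past the bytes protobuf just wrote
      dst.setPosition(dst.getPosition() + written);
      return written;
    } catch (IOException e) {
      // writeTo declares IOException, which encode() cannot rethrow as-is
      throw new RuntimeException("Error while encoding type.", e);
    }
  }
  // remaining methods are not shown on the slide
}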
More Examples
https://gist.github.com/ndimiduk/bcf33f09cc7e4408f684
Thanks! 
HBase in Action (Manning), by Nick Dimiduk and Amandeep Khurana, foreword by Michael Stack
hbaseinaction.com
Nick Dimiduk 
github.com/ndimiduk 
@xefyr 
n10k.com 
http://s.apache.org/bGN 

More Related Content

Viewers also liked

The inherent complexity of stream processing
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processingnathanmarz
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systemsnathanmarz
 
Data Engineering Quick Guide
Data Engineering Quick GuideData Engineering Quick Guide
Data Engineering Quick GuideAsim Jalis
 
Apache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseNick Dimiduk
 
11 Hard to Ignore Data Analytics Quotes
11 Hard to Ignore Data Analytics Quotes11 Hard to Ignore Data Analytics Quotes
11 Hard to Ignore Data Analytics QuotesCloudlytics
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineeringnathanmarz
 
Big Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business NeedsBig Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business NeedsBernard Marr
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBernard Marr
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBernard Marr
 

Viewers also liked (12)

Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
The inherent complexity of stream processing
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processing
 
Big data road map
Big data road mapBig data road map
Big data road map
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systems
 
Data Engineering Quick Guide
Data Engineering Quick GuideData Engineering Quick Guide
Data Engineering Quick Guide
 
Apache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBase
 
11 Hard to Ignore Data Analytics Quotes
11 Hard to Ignore Data Analytics Quotes11 Hard to Ignore Data Analytics Quotes
11 Hard to Ignore Data Analytics Quotes
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineering
 
Big Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business NeedsBig Data: The 6 Key Skills Every Business Needs
Big Data: The 6 Key Skills Every Business Needs
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must Know
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
 

Similar to HBase Data Types

OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoNathaniel Braun
 
OpenStack Swift的性能调优
OpenStack Swift的性能调优OpenStack Swift的性能调优
OpenStack Swift的性能调优Hardway Hou
 
Native Cloud-Native: Building Agile Microservices with the Micronaut Framework
Native Cloud-Native: Building Agile Microservices with the Micronaut FrameworkNative Cloud-Native: Building Agile Microservices with the Micronaut Framework
Native Cloud-Native: Building Agile Microservices with the Micronaut FrameworkZachary Klein
 
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...TelecomValley
 
Postcards from the post xss world- content exfiltration null
Postcards from the post xss world- content exfiltration nullPostcards from the post xss world- content exfiltration null
Postcards from the post xss world- content exfiltration nullPiyush Pattanayak
 
2.28.17 Introducing DSpace 7 Webinar Slides
2.28.17 Introducing DSpace 7 Webinar Slides2.28.17 Introducing DSpace 7 Webinar Slides
2.28.17 Introducing DSpace 7 Webinar SlidesDuraSpace
 
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...InfluxData
 
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...Amazon Web Services
 
Meetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on KubernetesMeetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on Kubernetesdtoledo67
 
Developing applications with Hyperledger Fabric SDK
Developing applications with Hyperledger Fabric SDKDeveloping applications with Hyperledger Fabric SDK
Developing applications with Hyperledger Fabric SDKHorea Porutiu
 
Arcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls BeginnersArcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls Beginnersarcomem
 
Three Years of Lessons Running Potentially Malicious Code Inside Containers
Three Years of Lessons Running Potentially Malicious Code Inside ContainersThree Years of Lessons Running Potentially Malicious Code Inside Containers
Three Years of Lessons Running Potentially Malicious Code Inside ContainersBen Hall
 
FIWARE Primer - Learn FIWARE in 60 Minutes
FIWARE Primer - Learn FIWARE in 60 MinutesFIWARE Primer - Learn FIWARE in 60 Minutes
FIWARE Primer - Learn FIWARE in 60 MinutesFederico Michele Facca
 
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 Minutes
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 MinutesFederico Michele Facca - FIWARE Primer - Learn FIWARE in 60 Minutes
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 MinutesCodemotion
 
CCi Technology Infrastructure 2006
CCi Technology Infrastructure 2006CCi Technology Infrastructure 2006
CCi Technology Infrastructure 2006Mike Linksvayer
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformAntonio Peric-Mazar
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...Amazon Web Services
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformAntonio Peric-Mazar
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack ArchitectureMirantis
 

Similar to HBase Data Types (20)

OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
OpenStack Swift的性能调优
OpenStack Swift的性能调优OpenStack Swift的性能调优
OpenStack Swift的性能调优
 
Native Cloud-Native: Building Agile Microservices with the Micronaut Framework
Native Cloud-Native: Building Agile Microservices with the Micronaut FrameworkNative Cloud-Native: Building Agile Microservices with the Micronaut Framework
Native Cloud-Native: Building Agile Microservices with the Micronaut Framework
 
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...
SophiaConf2010 Présentation des Retours d'expériences de la Conférence du 08 ...
 
Postcards from the post xss world- content exfiltration null
Postcards from the post xss world- content exfiltration nullPostcards from the post xss world- content exfiltration null
Postcards from the post xss world- content exfiltration null
 
2.28.17 Introducing DSpace 7 Webinar Slides
2.28.17 Introducing DSpace 7 Webinar Slides2.28.17 Introducing DSpace 7 Webinar Slides
2.28.17 Introducing DSpace 7 Webinar Slides
 
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...
Introduction to InfluxDB 2.0 & Your First Flux Query by Sonia Gupta, Develope...
 
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...
Deep Dive on Accelerating Content, APIs, and Applications with Amazon CloudFr...
 
Meetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on KubernetesMeetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on Kubernetes
 
Developing applications with Hyperledger Fabric SDK
Developing applications with Hyperledger Fabric SDKDeveloping applications with Hyperledger Fabric SDK
Developing applications with Hyperledger Fabric SDK
 
Building Client-Side Attacks with HTML5 Features
Building Client-Side Attacks with HTML5 FeaturesBuilding Client-Side Attacks with HTML5 Features
Building Client-Side Attacks with HTML5 Features
 
Arcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls BeginnersArcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls Beginners
 
Three Years of Lessons Running Potentially Malicious Code Inside Containers
Three Years of Lessons Running Potentially Malicious Code Inside ContainersThree Years of Lessons Running Potentially Malicious Code Inside Containers
Three Years of Lessons Running Potentially Malicious Code Inside Containers
 
FIWARE Primer - Learn FIWARE in 60 Minutes
FIWARE Primer - Learn FIWARE in 60 MinutesFIWARE Primer - Learn FIWARE in 60 Minutes
FIWARE Primer - Learn FIWARE in 60 Minutes
 
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 Minutes
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 MinutesFederico Michele Facca - FIWARE Primer - Learn FIWARE in 60 Minutes
Federico Michele Facca - FIWARE Primer - Learn FIWARE in 60 Minutes
 
CCi Technology Infrastructure 2006
CCi Technology Infrastructure 2006CCi Technology Infrastructure 2006
CCi Technology Infrastructure 2006
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API Platform
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API Platform
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack Architecture
 

More from Nick Dimiduk

Apache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixApache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixNick Dimiduk
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 ReleaseNick Dimiduk
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014Nick Dimiduk
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101Nick Dimiduk
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for ArchitectsNick Dimiduk
 
HBase Data Types (WIP)
HBase Data Types (WIP)HBase Data Types (WIP)
HBase Data Types (WIP)Nick Dimiduk
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the CloudNick Dimiduk
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)Nick Dimiduk
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop EasyNick Dimiduk
 
Introduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLIntroduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLNick Dimiduk
 

More from Nick Dimiduk (12)

Apache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixApache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - Phoenix
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
 
HBase Data Types (WIP)
HBase Data Types (WIP)HBase Data Types (WIP)
HBase Data Types (WIP)
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the Cloud
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
 
Introduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLIntroduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQL
 

Recently uploaded

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

HBase Data Types

  • 1. HBase Data Types Nick Dimiduk, Hortonworks @xefyr n10k.com
  • 2. Agenda • Motivations • Progress thus far • Future work • Examples • More Examples Licensed under a Crea3ve Commons A8ribu3on-­‐ShareAlike 3.0 Unported License. 2014-­‐11-­‐18 2
  • 3. Why introduce types? • Δ(SQL, byte[]): (╯°□°)╯︵ ┻━┻ • Rule of least surprise • Interoperability across tools • Distill best practices Licensed under a Crea3ve Commons A8ribu3on-­‐ShareAlike 3.0 Unported License. 2014-­‐11-­‐18 3
  • 4. Considerations
      • Opt-in for current users
      • Easy transition for existing applications
      • Client-side only mostly
          – Filters, Split policies, Coprocessors, Block encoding
      • Avoid POJO constraints
          – No required base-class/interface
          – No magic (avoid ASM, ORM)
      • Non-Java clients
      • HBASE-8089
  • 6. Inspiration • Orderly • PostgreSQL / PostGIS • HBASE-7221 • HBASE-7692
  • 7. Features: Encoding • Order preservation • Override direction (ASC/DSC) • Fixed, variable-width • Null-able • Self-identifying • Efficient
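The first two features on this slide, order preservation and direction override, can be illustrated with a small stdlib-only sketch. It assumes a 4-byte big-endian layout with the sign bit flipped, and inverts every byte for descending order. The class and method names are hypothetical, and this is not claimed to be the byte layout HBase itself writes.

```java
import java.util.Arrays;

public class OrderPreservingInt {
    /** Encodes v so that unsigned lexicographic byte order matches numeric order. */
    public static byte[] encode(int v, boolean ascending) {
        int flipped = v ^ 0x80000000; // flip the sign bit so negatives sort before positives
        byte[] out = {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16),
            (byte) (flipped >>> 8), (byte) flipped
        };
        if (!ascending) {
            // Inverting every byte reverses the sort order.
            for (int i = 0; i < out.length; i++) {
                out[i] = (byte) ~out[i];
            }
        }
        return out;
    }

    /** Reverses encode(); the caller must pass the same direction used when encoding. */
    public static int decode(byte[] b, boolean ascending) {
        int x = 0;
        for (byte value : b) {
            int octet = (ascending ? value : ~value) & 0xff;
            x = (x << 8) | octet;
        }
        return x ^ 0x80000000;
    }

    public static void main(String[] args) {
        // Ascending: -5 sorts before 7 when compared as unsigned bytes.
        System.out.println(Arrays.compareUnsigned(encode(-5, true), encode(7, true)) < 0);
        // Descending: the same pair sorts the other way round.
        System.out.println(Arrays.compareUnsigned(encode(-5, false), encode(7, false)) > 0);
    }
}
```

Comparing with `Arrays.compareUnsigned` mirrors how a store that sorts keys as raw bytes would see the encoded values.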
  • 8. Features: API
      • Complex type encoding
          – Compound rowkey pattern
          – Order preservation
          – Nullable fields
      • Runtime metadata
      • User-extensible
  • 10. Implementation: Encoding
      o.a.h.h.util.OrderedBytes
          • null
          • numeric, +/-Inf, NaN
          • int8, int16, int32, int64
          • float32, float64
          • variable-length text
          • variable-length blob
      o.a.h.h.util.Bytes
          • numeric
          • boolean
          • int16, int32, int64
          • float32, float64
          • variable-length text
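To make "null" and "variable-length text" in the OrderedBytes column concrete, here is a minimal sketch of a nullable, self-identifying text encoding: a leading tag byte says what follows, and a 0x00 terminator marks the end of the text. The class name and the tag values are placeholders chosen for this sketch, and it assumes the text contains no NUL character; it should not be read as the exact OrderedBytes format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TaggedText {
    // Placeholder tag values for this sketch; NULL_TAG < TEXT_TAG so nulls sort first.
    static final byte NULL_TAG = 0x05;
    static final byte TEXT_TAG = 0x34;
    static final byte TERMINATOR = 0x00;

    /** Encodes a possibly-null string as tag + UTF-8 bytes + terminator. */
    public static byte[] encode(String s) {
        if (s == null) {
            return new byte[] { NULL_TAG };
        }
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            if (b == TERMINATOR) {
                throw new IllegalArgumentException("text must not contain NUL");
            }
        }
        byte[] out = new byte[utf8.length + 2];
        out[0] = TEXT_TAG;
        System.arraycopy(utf8, 0, out, 1, utf8.length);
        out[out.length - 1] = TERMINATOR;
        return out;
    }

    /** Decodes a value produced by encode(); the tag byte identifies the type. */
    public static String decode(byte[] b) {
        if (b[0] == NULL_TAG) {
            return null;
        }
        if (b[0] != TEXT_TAG) {
            throw new IllegalArgumentException("unknown tag: " + b[0]);
        }
        return new String(b, 1, b.length - 2, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A prefix sorts before its extensions because the terminator is the lowest byte.
        System.out.println(Arrays.compareUnsigned(encode("a"), encode("ab")) < 0);
        System.out.println(decode(encode(null)));
    }
}
```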
  • 11. Implementation: API
      interface DataType<T>
          • decode()
          • encode()
          • encodedClass()
          • encodedLength()
          • getOrder()
          • isNullable()
          • isOrderPreserving()
          • isSkippable()
          • skip()
      implements DataType
          • OrderedXXX
          • RawXXX
          • Struct
              – StructBuilder
              – StructIterator
              – TerminatedWrapper
              – FixedLengthWrapper
          • Union{2,3,4}
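The shape of the contract listed on this slide can be sketched with the Java standard library alone. The interface below is a simplified stand-in: it borrows most of the method names from the slide (getOrder() is left out), uses java.nio.ByteBuffer where the slides use a positioned byte range, and its signatures are assumptions for illustration rather than the HBase API. SketchDataType and SketchRawLong are hypothetical names.

```java
import java.nio.ByteBuffer;

public class DataTypeSketch {
    /** Simplified stand-in for a DataType-style contract (signatures are assumptions). */
    public interface SketchDataType<T> {
        boolean isOrderPreserving();
        boolean isNullable();
        boolean isSkippable();
        int encodedLength(T val);
        Class<T> encodedClass();
        int skip(ByteBuffer src);          // advance past one value, return bytes skipped
        T decode(ByteBuffer src);          // read one value at the current position
        int encode(ByteBuffer dst, T val); // write one value, return bytes written
    }

    /** Fixed-width long stored as 8 big-endian two's complement bytes. */
    public static final class SketchRawLong implements SketchDataType<Long> {
        // Raw two's complement bytes sort negatives after positives.
        public boolean isOrderPreserving() { return false; }
        public boolean isNullable() { return false; }
        public boolean isSkippable() { return true; }
        public int encodedLength(Long val) { return Long.BYTES; }
        public Class<Long> encodedClass() { return Long.class; }
        public int skip(ByteBuffer src) {
            src.position(src.position() + Long.BYTES);
            return Long.BYTES;
        }
        public Long decode(ByteBuffer src) { return src.getLong(); }
        public int encode(ByteBuffer dst, Long val) {
            dst.putLong(val);
            return Long.BYTES;
        }
    }

    public static void main(String[] args) {
        SketchDataType<Long> type = new SketchRawLong();
        ByteBuffer buf = ByteBuffer.allocate(16);
        type.encode(buf, 42L);
        type.encode(buf, -1L);
        buf.flip();
        type.skip(buf);                       // step over the first value without decoding it
        System.out.println(type.decode(buf)); // prints -1
    }
}
```

Keeping encode, decode, and skip on one object is what lets a generic reader walk a buffer of mixed fields without knowing each field's width in advance.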
  • 12. Up Next
      • “Default” types
      • More complex types
          – Arrays/Lists
          – Maps/Dicts
      • Tool integration
          – Apache Phoenix
          – Cloudera Kite
      • Performance audit, HBASE-8694
      • Improved metadata, HBASE-8863
          – isCastableTo
          – isCoercableTo
          – isComparableTo
      • TypedTable, HBASE-7941
      • Beyond Java, HBASE-10091
          – REST
          – Thrift
          – Shell
      • ImportTsv, HBASE-8593
      • User documentation
      • Coprocessors?
      • Filters?
      • CAS?
      • DataBlockEncoders?
  • 14. A case for TypedTable
        Put p = new Put(Bytes.toBytes(u.user));
        p.add(INFO_FAM, USER_COL, Bytes.toBytes(u.user));
        p.add(INFO_FAM, NAME_COL, Bytes.toBytes(u.name));
        p.add(INFO_FAM, EMAIL_COL, Bytes.toBytes(u.email));
        p.add(INFO_FAM, PASS_COL, Bytes.toBytes(u.password));
  • 15. A case for TypedTable
        static final RawString ENC_STR = new RawString();
        static final RawLong ENC_LONG = new RawLong();
        --
        SimplePositionedByteRange pbr =
            new SimplePositionedByteRange(100);
        ENC_STR.encode(pbr, u.user);
        Put p = new Put(Bytes.copy(pbr.getBytes(), pbr.getOffset(), pbr.getPosition()));
        p.add(INFO_FAM, USER_COL, Bytes.copy(pbr.getBytes(), ...));
        pbr.setPosition(0);
        ENC_STR.encode(pbr, u.name);
        p.add(INFO_FAM, NAME_COL, Bytes.copy(pbr.getBytes(), ...));
        ...
  • 16. Structs: writing
        Struct struct = new StructBuilder()
            .add(OrderedNumeric.ASCENDING)
            .add(OrderedString.ASCENDING)
            .toStruct();
        PositionedByteRange buf1 =
            new SimplePositionedByteRange(7);
        struct.encode(buf1,
            new Object[] { BigDecimal.ONE, "foo" });
  • 17. Structs: reading
        buf1.setPosition(0);
        StructIterator it = struct.iterator(buf1);
        while (it.hasNext()) {
          System.out.print(it.next() + ", ");
        }

        > BigDecimal.ONE, foo
  • 18. Structs: schema migration
        Struct addedFields = new StructBuilder()
            .add(OrderedNumeric.ASCENDING)
            .add(OrderedString.ASCENDING)
            .add(OrderedString.ASCENDING)
            .add(OrderedNumeric.ASCENDING)
            .toStruct();

        buf1.setPosition(0);
        StructIterator it = addedFields.iterator(buf1);
        while (it.hasNext()) {
          System.out.print(it.next() + ", ");
        }

        > BigDecimal.ONE, foo, null, null
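The migration behaviour on this slide, where a longer schema reads an older, shorter value and reports the missing trailing fields as null, can be mimicked with a stdlib-only sketch. It assumes every field is a UTF-8 string followed by a 0x00 terminator and that fields contain no NUL character. StructSketch and its methods are hypothetical names, not the HBase Struct API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StructSketch {
    /** Concatenates the fields, each written as UTF-8 bytes plus a 0x00 terminator. */
    public static byte[] encode(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] utf8 = field.getBytes(StandardCharsets.UTF_8);
            out.write(utf8, 0, utf8.length);
            out.write(0);
        }
        return out.toByteArray();
    }

    /** Reads fieldCount fields; fields past the end of the buffer decode as null. */
    public static List<String> decode(byte[] buf, int fieldCount) {
        List<String> values = new ArrayList<>();
        int pos = 0;
        for (int i = 0; i < fieldCount; i++) {
            if (pos >= buf.length) {
                values.add(null); // the value was written with an older, shorter schema
                continue;
            }
            int end = pos;
            while (end < buf.length && buf[end] != 0) {
                end++;
            }
            values.add(new String(buf, pos, end - pos, StandardCharsets.UTF_8));
            pos = end + 1; // step over the terminator
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] oldValue = encode("1", "foo");     // written with a two-field schema
        System.out.println(decode(oldValue, 4));  // read with a four-field schema
    }
}
```

Because each field ends in the lowest possible byte, the concatenated bytes also sort field by field, which is the property a compound rowkey relies on.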
  • 19. Protobuf (HBASE-11161)
        class PBKeyValue extends PBType<CellProtos.KeyValue> {

          @Override
          public int encode(PositionedByteRange dst, KeyValue val) {
            CodedOutputStream os = outputStreamFromByteRange(dst);
            int before = os.spaceLeft(), after, written;
            val.writeTo(os);
            after = os.spaceLeft();
            written = before - after;
            dst.setPosition(dst.getPosition() + written);
            return written;
          }
  • 21. Thanks! MANNING: Nick Dimiduk, Amandeep Khurana; foreword by Michael Stack; hbaseinaction.com • Nick Dimiduk • github.com/ndimiduk • @xefyr • n10k.com • http://s.apache.org/bGN