Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Jesse Yates, Founder/CEO at Fineo.io
Secondary Indexing in Phoenix
Jesse Yates
HBase Committer
Software Engineer
LA HBase User Group – September 4, 2013
Agenda
• About
• Other Indexing Frameworks
• Immutable Indexes
• Mutable Indexes in Phoenix
• Mutable Indexing Internals
• Roadmap
2 LA HUG – Sept 2013
https://www.madison.k12.wi.us/calendars
About me
• Developer at Salesforce
– System of Record, Phoenix
• Open Source
– Phoenix
– HBase
– Accumulo
Phoenix
• Open Source
– https://github.com/forcedotcom/phoenix
• “SQL-skin” on HBase
– Everyone knows SQL!
• JDBC Driver
– Plug-and-play
• Faster than HBase
– in some cases
Why Index?
• HBase is only sorted on 1 “axis”
• Great for search via a single pattern
Example!
Example
Row fields: name, type, subtype, date, major, minor, quantity
Secondary Indexes
• Sort on ‘orthogonal’ axis
• Save full-table scan
• Expected database feature
• Hard in HBase b/c of ACID considerations
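As a rough illustration of the "orthogonal axis" idea, here is a minimal in-memory sketch. It is not Phoenix or HBase code: the class `SecondaryIndexSketch`, its methods, and the `name|rowKey` key layout are assumptions made for this example only. The primary map is sorted by row key, so a lookup by name has to visit every row; the index map is sorted by name, so the same lookup becomes a short range scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (not Phoenix code): a table sorted by row key plus an index sorted by a column value. */
final class SecondaryIndexSketch {
    // Primary "table": row key -> name, sorted by row key only, like an HBase table.
    private final TreeMap<String, String> table = new TreeMap<>();
    // Index "table": "name|rowKey" -> rowKey, sorted by name (the orthogonal axis).
    private final TreeMap<String, String> index = new TreeMap<>();

    void put(String rowKey, String name) {
        String old = table.put(rowKey, name);
        if (old != null) {
            index.remove(old + "|" + rowKey); // clean up the entry for the old row state
        }
        index.put(name + "|" + rowKey, rowKey);
    }

    /** Without an index: every row has to be examined. */
    List<String> findByNameFullScan(String name) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : table.entrySet()) {
            if (e.getValue().equals(name)) rows.add(e.getKey());
        }
        return rows;
    }

    /** With an index: a range scan over the keys that start with "name|". */
    List<String> findByNameViaIndex(String name) {
        String prefix = name + "|";
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // left the range for this name
            rows.add(e.getValue());
        }
        return rows;
    }
}
```

Both lookups return the same rows; the difference is how much of the data each one has to touch. Keeping the two maps consistent inside `put` is trivial here because everything lives in one process, which is exactly what HBase does not guarantee across tables.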
http://www.wired.com/wiredenterprise/2011/10/microsoft-and-hadoop/
Other (Major) Indexing Frameworks
• HBase SEP
– Side-Effects Processor
– Replication-based
– https://github.com/NGDATA/hbase-sep
• Huawei
– Server-local indexes
– Buddy regions
– https://github.com/Huawei-Hadoop/hindex
Immutable Indexes
• Immutable Rows
• Much easier to implement
• Client-managed
• Bulk-loadable
Bulk Loading
phoenix-hbase.blogspot.com
Index Bulk Loading
Identity Mapper → Custom Phoenix Reducer → HFile Output Format
Index Bulk Loading
PreparedStatement statement = conn.prepareStatement(dmlStatement);
statement.execute();
String upsertStmt = "upsert into core.entity_history(organization_id, key_prefix, entity_history_id, created_by, created_date)\n"
    + "values(?,?,?,?,?)";
statement = conn.prepareStatement(upsertStmt);
… // set values
Iterator<Pair<byte[], List<KeyValue>>> dataIterator =
    PhoenixRuntime.getUncommittedDataIterator(conn);
The “fun” stuff…
1.5 years
Mutable Indexes
• Global Index
• Change row state
– Common use-case
– “expected” implementation
• Covered Columns
Usage
• Just SQL!
• Baby name popularity
• Mock demo
Usage
• Selects the most popular name for a given year
SELECT name,occurrences FROM baby_names WHERE year=2012 LIMIT 1;
• Selects the total occurrences of a given name across all years
SELECT /*+ NO_INDEX */ name,sum(occurrences) FROM baby_names
WHERE name='Jesse' GROUP BY name;
• Selects the total occurrences of a given name across all years allowing an
index to be used
SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse'
GROUP BY NAME;
Usage
• Update rows due to census inaccuracy
– Will only work if the mutable indexing is working
UPSERT INTO baby_names SELECT year,occurrences+3000,sex,name
FROM baby_names WHERE name='Jesse';
• Selects the now updated data (from the index table)
SELECT name,sum(occurrences) FROM baby_names WHERE
name='Jesse' GROUP BY NAME;
• Index table still used in scans
EXPLAIN SELECT name,sum(occurrences) FROM baby_names WHERE
name='Jesse' GROUP BY NAME;
Internals
• Index Management
– Build index updates
– Ensures index is ‘cleaned up’
• Recovery Mechanism
– Ensures index updates are “ACID”
“There is no magic”
- Every programming hipster (chipster)
Mutable Indexing: Standard Write Path
Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore
Mutable Indexing
[Diagram: the write path RegionCoprocessorHost → WAL → RegionCoprocessorHost, extended with an Indexer coprocessor, its Builder and Codec, a WAL Updater ("Durable!"), and the Index Tables that receive the index updates]
Index Management
• Lives within a region coprocessor observer (RegionObserver)
• Access to the local HRegion
• Specifies the mutations to apply to the index tables
public interface IndexBuilder {
  public void setup(RegionCoprocessorEnvironment env);
  public Map<Mutation, String> getIndexUpdate(Put put);
  public Map<Mutation, String> getIndexUpdate(Delete delete);
}
Why not write my own?
• Managing Cleanup
– Efficient point-in-time correctness
– Performance tricks
• Abstract access to HRegion
– Minimal network hops
• Sorting correctness
– Phoenix typing ensures correct index sorting
Example: Managing Cleanup
• Updates can arrive out of order
– Client-managed timestamps
ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3
Example: Managing Cleanup
Index Table
ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13
Example: Managing Cleanup
ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3
Row1  Fam     Qual       11  val4
Example: Managing Cleanup
ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam     Qual       11  val4
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3
Example: Managing Cleanup
ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val4|Row1       Index   Fam:Qual              11
Val4|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13
Managing Cleanup
• History “roll up”
• Out-of-order Updates
• Point-in-time correctness
• Multiple Timestamps per Mutation
• Delete vs. DeleteColumn vs. DeleteFamily
Surprisingly hard!
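The out-of-order example can be replayed with a small sketch. This is an illustrative model only, not the Phoenix implementation: the class `CleanupSketch` and its methods are hypothetical, and the index key layout (`value|value|rowKey`) simply mirrors the example tables. For every timestamp at which the row changes, the sketch derives the index key from the newest value of each indexed column at or before that timestamp, then compares the index state before and after a late write to find the stale entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model (not Phoenix code) of point-in-time index entries for a single row. */
final class CleanupSketch {
    // column -> (timestamp -> value); all versions are kept so history can be re-read.
    private final Map<String, TreeMap<Long, String>> columns = new LinkedHashMap<>();
    private final String rowKey;
    private final String[] indexedColumns;

    CleanupSketch(String rowKey, String... indexedColumns) {
        this.rowKey = rowKey;
        this.indexedColumns = indexedColumns;
        for (String column : indexedColumns) columns.put(column, new TreeMap<>());
    }

    /** Client-managed timestamp: writes may arrive in any order. */
    void put(String column, long ts, String value) {
        columns.get(column).put(ts, value);
    }

    /** For each timestamp at which the row changed, the index row key valid from then on. */
    TreeMap<Long, String> indexEntries() {
        TreeSet<Long> stamps = new TreeSet<>();
        for (TreeMap<Long, String> versions : columns.values()) stamps.addAll(versions.keySet());
        TreeMap<Long, String> entries = new TreeMap<>();
        for (long ts : stamps) {
            StringBuilder key = new StringBuilder();
            for (String column : indexedColumns) {
                // Newest value of this column written at or before ts, if any.
                Map.Entry<Long, String> e = columns.get(column).floorEntry(ts);
                if (e != null) key.append(e.getValue()).append('|');
            }
            entries.put(ts, key.append(rowKey).toString());
        }
        return entries;
    }

    /** Entries present before a write that the write made wrong, so they need a delete. */
    static Map<Long, String> stale(TreeMap<Long, String> before, TreeMap<Long, String> after) {
        Map<Long, String> stale = new TreeMap<>();
        for (Map.Entry<Long, String> e : before.entrySet()) {
            if (!e.getValue().equals(after.get(e.getKey()))) stale.put(e.getKey(), e.getValue());
        }
        return stale;
    }
}
```

Feeding in the writes at timestamps 10, 12 and 13 and then the late write at 11 yields `val1|val2|Row1` at timestamp 12 as the stale entry, and `val4|Row1` (11) plus `val4|val2|Row1` (12) as the new ones. The sketch recomputes the whole history on each call; doing this efficiently, and with deletes in the mix, is the hard part the slide refers to.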
Phoenix Index Builder
• Much simpler than full index management
• Hides cleanup considerations
• Abstracted access to local state
public interface IndexCodec {
  public void initialize(RegionCoprocessorEnvironment env);
  public Iterable<IndexUpdate> getIndexDeletes(TableState state);
  public Iterable<IndexUpdate> getIndexUpserts(TableState state);
}
Phoenix Index Codec
Dude, where’s my data?
Ensuring Correctness
HBase ACID
• Does NOT give you:
– Cross-row consistency
– Cross-table consistency
• Does give you:
– Durable data on success
– Visibility on success without partial rows
Key Observation
“Secondary indexing is inherently an easier
problem than full transactions… secondary
index updates are idempotent.”
- Lars Hofhansl
Idempotent Index Updates
• Doesn’t need full transactions
• Replay as many times as needed
• Can tolerate a little lag
– As long as we get the order right
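Why replaying is safe can be shown with a few lines of code. The following is a sketch under stated assumptions, not Phoenix code: `IdempotentReplaySketch` and `IndexUpdate` are hypothetical names, and the index is modelled as a sorted map whose cell identity is the index row key plus the timestamp. Writing the same cell again overwrites it with the same content, so applying a batch once or several times leaves the index in the same state.

```java
import java.util.List;
import java.util.TreeMap;

/** Toy model (not Phoenix code): index updates keyed by (index row key, timestamp). */
final class IdempotentReplaySketch {
    /** One index cell to write; all fields are fixed when the update is built. */
    static final class IndexUpdate {
        final String indexRowKey;
        final long timestamp;
        final String primaryRowKey;

        IndexUpdate(String indexRowKey, long timestamp, String primaryRowKey) {
            this.indexRowKey = indexRowKey;
            this.timestamp = timestamp;
            this.primaryRowKey = primaryRowKey;
        }
    }

    /** Applies updates; re-writing an existing cell replaces it with identical content. */
    static void apply(TreeMap<String, String> index, List<IndexUpdate> updates) {
        for (IndexUpdate u : updates) {
            index.put(u.indexRowKey + "@" + u.timestamp, u.primaryRowKey);
        }
    }
}
```

The property depends on the timestamp being part of the update rather than being assigned when the update is applied; a replay that stamped cells with the current time would create new versions on every attempt.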
Failure Recovery
• Custom WALEditCodec
– Encodes index updates
– Supports compressed WAL
• Custom WAL Reader
– Replay index updates from WAL
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>o.a.h.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>o.a.h.hbase.regionserver.wal.IndexedHLogReader</value>
</property>
Failure Situations
• Any time before WAL, client replay
• Any time after WAL, HBase replay
• All-or-nothing
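The two failure cases can be sketched as a toy simulation. Everything here is an assumption made for illustration (the class `WalRecoverySketch`, the `FailAt` enum, and the in-memory "WAL" list); it does not reproduce HBase's actual write path or recovery code. The point it shows is that the data edit and its index update share one WAL entry, so a failure either leaves nothing behind (the client retries) or leaves enough to rebuild both by replay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model (not HBase code) of the before-WAL and after-WAL failure cases. */
final class WalRecoverySketch {
    enum FailAt { NONE, BEFORE_WAL, AFTER_WAL }

    // "Durable" log: each entry holds {rowKey, value, indexKey}, data and index update together.
    final List<String[]> wal = new ArrayList<>();
    // In-memory primary data; lost when the server crashes.
    final TreeMap<String, String> memstore = new TreeMap<>();
    // Index table: index key -> primary row key.
    final TreeMap<String, String> indexTable = new TreeMap<>();

    /** Returns true only if the write was acknowledged to the client. */
    boolean write(String rowKey, String value, FailAt failAt) {
        String indexKey = value + "|" + rowKey;
        if (failAt == FailAt.BEFORE_WAL) {
            return false; // nothing was stored; the client retries the whole update
        }
        wal.add(new String[] {rowKey, value, indexKey});
        if (failAt == FailAt.AFTER_WAL) {
            memstore.clear(); // simulated crash: memory is gone, the WAL survives
            return false;
        }
        memstore.put(rowKey, value);
        indexTable.put(indexKey, rowKey);
        return true;
    }

    /** Recovery: re-apply every WAL entry; safe to repeat because the updates are idempotent. */
    void replayWal() {
        for (String[] entry : wal) {
            memstore.put(entry[0], entry[1]);
            indexTable.put(entry[2], entry[0]);
        }
    }
}
```

A write that fails before the WAL leaves all three structures untouched; a write that fails after it is completed by `replayWal()`, which restores the primary row and its index entry together.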
Failure #1: Before WAL
Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore
No problem! No data is stored in the WAL, client just retries entire update.
Failure #2: After WAL
Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore
WAL replayed via usual replay mechanisms
Agenda
• About
• Other Indexing Frameworks
• Immutable Indexes
• Mutable Indexes
• Roadmap
Roadmap
• Next release of Phoenix
• Performance testing
• Increased adoption
• Adding to HBase (?)
Open Source!
• Main: https://github.com/forcedotcom/phoenix
• Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si
(obligatory hiring slide)
We’re Hiring!
Questions? Comments?
jyates@salesforce.com
@jesse_yates

Editor's Notes

  1. Ok, not this elephant…
  2. e.g. stats, historical data
  3. Actual implementation coming in example blog post
  4. Actual implementation coming in example blog post
  5. And don’t forget to cleanup the old row state!
  6. 8pt font, <200 lines, including comments