Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

An in-depth look at secondary indexing in Phoenix.


  1. Secondary Indexing in Phoenix
     Jesse Yates
     HBase Committer
     Software Engineer
     LA HBase User Group – September 4, 2013
  2. Agenda
     • About
     • Other Indexing Frameworks
     • Immutable Indexes
     • Mutable Indexes in Phoenix
     • Mutable Indexing Internals
     • Roadmap
     LA HUG – Sept 2013
     https://www.madison.k12.wi.us/calendars
  3. About me
     • Developer at Salesforce
       – System of Record, Phoenix
     • Open Source
       – Phoenix
       – HBase
       – Accumulo
  4. Phoenix
     • Open Source
       – https://github.com/forcedotcom/phoenix
     • “SQL-skin” on HBase
       – Everyone knows SQL!
     • JDBC Driver
       – Plug-and-play
     • Faster than HBase
       – in some cases
  5. Why Index?
     • HBase is only sorted on 1 “axis”
     • Great for search via a single pattern
     Example!
  6. Example
     name:
     type:
     subtype:
     date:
     major:
     minor:
     quantity:
  7. Secondary Indexes
     • Sort on ‘orthogonal’ axis
     • Save full-table scan
     • Expected database feature
     • Hard in HBase b/c of ACID considerations
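
To make the single sort axis concrete, here is a minimal Java sketch. It is not Phoenix or HBase code: it models a data table as a map sorted by row key, and an index table whose keys are the indexed value followed by the row key. The class name, the `|` separator, and the assumption that values never contain `|` are illustrative choices only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model: each "table" is a map sorted by its key, as HBase rows are. */
final class SecondaryIndexSketch {
    // Data table: row key -> name (sorted by row key only).
    final TreeMap<String, String> dataTable = new TreeMap<>();
    // Index table: "name|rowKey" -> rowKey (sorted by the indexed value first).
    final TreeMap<String, String> indexTable = new TreeMap<>();

    void put(String rowKey, String name) {
        String old = dataTable.put(rowKey, name);
        if (old != null) {
            indexTable.remove(old + "|" + rowKey); // drop the stale index entry
        }
        indexTable.put(name + "|" + rowKey, rowKey);
    }

    /** Without an index: every row of the data table is examined. */
    List<String> findByNameFullScan(String name) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : dataTable.entrySet()) {
            if (e.getValue().equals(name)) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    /** With an index: a range scan over the keys that start with "name|". */
    List<String> findByNameIndexed(String name) {
        String start = name + "|";
        String stop = name + "}"; // '}' is the character immediately after '|'
        return new ArrayList<>(indexTable.subMap(start, true, stop, false).values());
    }
}
```

Both lookups return the same rows; the indexed one only touches the contiguous key range for the requested name, which is the full-table scan the slide says an index saves.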
  8. Agenda
     • About
     • Other Indexing Frameworks
     • Immutable Indexes
     • Mutable Indexes in Phoenix
     • Mutable Indexing Internals
     • Roadmap
  9. http://www.wired.com/wiredenterprise/2011/10/microsoft-and-hadoop/
  10. Other (Major) Indexing Frameworks
      • HBase SEP
        – Side-Effects Processor
        – Replication-based
        – https://github.com/NGDATA/hbase-sep
      • Huawei
        – Server-local indexes
        – Buddy regions
        – https://github.com/Huawei-Hadoop/hindex
  11. Agenda
      • About
      • Other Indexing Frameworks
      • Immutable Indexes
      • Mutable Indexes in Phoenix
      • Mutable Indexing Internals
      • Roadmap
  12. Immutable Indexes
      • Immutable Rows
      • Much easier to implement
      • Client-managed
      • Bulk-loadable
  13. Bulk Loading
      phoenix-hbase.blogspot.com
  14. Index Bulk Loading
      [Diagram labels: Identity Mapper, Custom Phoenix Reducer, HFile Output Format]
  15. Index Bulk Loading
      PreparedStatement statement = conn.prepareStatement(dmlStatement);
      statement.execute();
      String upsertStmt =
          "upsert into core.entity_history(organization_id,key_prefix,entity_history_id, created_by, created_date)\n"
          + "values(?,?,?,?,?)";
      statement = conn.prepareStatement(upsertStmt);
      … // set values
      // Read back the KeyValues Phoenix would write, without committing them
      Iterator<Pair<byte[], List<KeyValue>>> dataIterator =
          PhoenixRuntime.getUncommittedDataIterator(conn);
  16. Agenda
      • About
      • Other Indexing Frameworks
      • Immutable Indexes
      • Mutable Indexes in Phoenix
      • Mutable Indexing Internals
      • Roadmap
  17. The “fun” stuff…
  18. 1.5 years
  19. Mutable Indexes
      • Global Index
      • Change row state
        – Common use-case
        – “expected” implementation
      • Covered Columns
  20. Usage
      • Just SQL!
      • Baby name popularity
      • Mock demo
  21. Usage
      • Selects the most popular name for a given year
        SELECT name,occurrences FROM baby_names WHERE year=2012 LIMIT 1;
      • Selects the total occurrences of a given name across all years
        SELECT /*+ NO_INDEX */ name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY name;
      • Selects the total occurrences of a given name across all years allowing an index to be used
        SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
  22. Usage
      • Update rows due to census inaccuracy
        – Will only work if the mutable indexing is working
        UPSERT INTO baby_names SELECT year,occurrences+3000,sex,name FROM baby_names WHERE name='Jesse';
      • Selects the now updated data (from the index table)
        SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
      • Index table still used in scans
        EXPLAIN SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
  23. Agenda
      • About
      • Other Indexing Frameworks
      • Immutable Indexes
      • Mutable Indexes in Phoenix
      • Mutable Indexing Internals
      • Roadmap
  24. Internals
      • Index Management
        – Build index updates
        – Ensures index is ‘cleaned up’
      • Recovery Mechanism
        – Ensures index updates are “ACID”
  25. “There is no magic”
      - Every programming hipster (chipster)
  26. Mutable Indexing: Standard Write Path
      [Diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
  27. Mutable Indexing: Standard Write Path
      [Diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
  28. Mutable Indexing
      [Diagram labels: RegionCoprocessorHost, WAL, RegionCoprocessorHost, Indexer, Builder, WAL Updater, Durable!, Indexer, Index Table (×3), Codec]
  29. Index Management
      • Lives within a RegionCoprocessorObserver
      • Access to the local HRegion
      • Specifies the mutations to apply to the index tables
      public interface IndexBuilder {
        public void setup(RegionCoprocessorEnvironment env);
        public Map<Mutation, String> getIndexUpdate(Put put);
        public Map<Mutation, String> getIndexUpdate(Delete delete);
      }
  30. Why not write my own?
      • Managing Cleanup
        – Efficient point-in-time correctness
        – Performance tricks
      • Abstract access to HRegion
        – Minimal network hops
      • Sorting correctness
        – Phoenix typing ensures correct index sorting
  31. Example: Managing Cleanup
      • Updates can arrive out of order
        – Client-managed timestamps

      ROW   FAMILY  QUALIFIER  TS  VALUE
      Row1  Fam     Qual       10  val1
      Row1  Fam2    Qual2      12  val2
      Row1  Fam     Qual       13  val3
  32. Example: Managing Cleanup
      Index Table

      ROW             FAMILY  QUALIFIER            TS
      Val1|Row1       Index   Fam:Qual             10
      Val1|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
      Val3|Val2|Row1  Index   Fam:Qual Fam2:Qual2  13
  33. Example: Managing Cleanup

      ROW   FAMILY  QUALIFIER  TS  VALUE
      Row1  Fam     Qual       10  val1
      Row1  Fam2    Qual2      12  val2
      Row1  Fam     Qual       13  val3
      Row1  Fam     Qual       11  val4
  34. Example: Managing Cleanup

      ROW   FAMILY  QUALIFIER  TS  VALUE
      Row1  Fam     Qual       10  val1
      Row1  Fam     Qual       11  val4
      Row1  Fam2    Qual2      12  val2
      Row1  Fam     Qual       13  val3
  35. Example: Managing Cleanup

      ROW             FAMILY  QUALIFIER            TS
      Val1|Row1       Index   Fam:Qual             10
      Val4|Row1       Index   Fam:Qual             11
      Val4|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
      Val1|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
      Val3|Val2|Row1  Index   Fam:Qual Fam2:Qual2  13
  36. Example: Managing Cleanup

      ROW             FAMILY  QUALIFIER            TS
      Val1|Row1       Index   Fam:Qual             10
      Val4|Row1       Index   Fam:Qual             11
      Val4|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
      Val1|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
      Val3|Val2|Row1  Index   Fam:Qual Fam2:Qual2  13
  37. Managing Cleanup
      • History “roll up”
      • Out-of-order Updates
      • Point-in-time correctness
      • Multiple Timestamps per Mutation
      • Delete vs. DeleteColumn vs. DeleteFamily
      Surprisingly hard!
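
The cleanup example in slides 31 to 36 can be reproduced with a small Java sketch. This is a simplification and not how Phoenix does it: it recomputes every index entry from the full row history, whereas the slides stress doing this efficiently. The class and method names are hypothetical, deletes are ignored, and index entries are written as `indexRowKey@timestamp`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class IndexCleanupSketch {
    /** One versioned cell of the primary row: column, timestamp, value. */
    static final class Cell {
        final String column;
        final long ts;
        final String value;

        Cell(String column, long ts, String value) {
            this.column = column;
            this.ts = ts;
            this.value = value;
        }
    }

    /** Index entries ("indexRowKey@ts") implied by a row's history, in any arrival order. */
    static SortedSet<String> indexEntries(String rowKey, List<String> indexedColumns,
                                          List<Cell> history) {
        // Order by timestamp, not by arrival, to get point-in-time state.
        List<Cell> sorted = new ArrayList<>(history);
        sorted.sort(Comparator.comparingLong((Cell c) -> c.ts));

        Map<String, String> latest = new HashMap<>();      // column -> newest value so far
        TreeMap<Long, String> keyAtTs = new TreeMap<>();   // timestamp -> index row key
        for (Cell c : sorted) {
            latest.put(c.column, c.value);
            StringBuilder key = new StringBuilder();
            for (String col : indexedColumns) {
                String v = latest.get(col);
                if (v != null) {
                    key.append(v).append('|');
                }
            }
            keyAtTs.put(c.ts, key.append(rowKey).toString());
        }

        SortedSet<String> entries = new TreeSet<>();
        for (Map.Entry<Long, String> e : keyAtTs.entrySet()) {
            entries.add(e.getValue() + "@" + e.getKey());
        }
        return entries;
    }

    /** Elements of a that are not in b. */
    static SortedSet<String> minus(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> out = new TreeSet<>(a);
        out.removeAll(b);
        return out;
    }
}
```

Computing the entries before and after the late `val4` update at timestamp 11, `minus(before, after)` yields the one stale entry to delete (`val1|val2|Row1@12`) and `minus(after, before)` yields the two entries to add (`val4|Row1@11` and `val4|val2|Row1@12`).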
  38. Phoenix Index Builder
      • Much simpler than full index management
      • Hides cleanup considerations
      • Abstracted access to local state
      public interface IndexCodec {
        public void initialize(RegionCoprocessorEnvironment env);
        public Iterable<IndexUpdate> getIndexDeletes(TableState state);
        public Iterable<IndexUpdate> getIndexUpserts(TableState state);
      }
  39. Phoenix Index Codec
  40. Dude, where’s my data?
      Ensuring Correctness
  41. HBase ACID
      • Does NOT give you:
        – Cross-row consistency
        – Cross-table consistency
      • Does give you:
        – Durable data on success
        – Visibility on success without partial rows
  42. Key Observation
      “Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.”
      - Lars Hofhansl
  43. Idempotent Index Updates
      • Doesn’t need full transactions
      • Replay as many times as needed
      • Can tolerate a little lag
        – As long as we get the order right
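
Why replay is safe can be shown with a short Java sketch. It assumes, as an illustration only, that each index update carries an explicit timestamp and that the index table stores one cell per (row, timestamp), so writing the same update again overwrites an identical cell. The class and field names are hypothetical and nothing here is HBase or Phoenix API.

```java
import java.util.List;
import java.util.TreeMap;

final class IdempotentReplaySketch {
    /** An index write with an explicit timestamp. */
    static final class IndexPut {
        final String indexRow;
        final long ts;
        final String dataRow;

        IndexPut(String indexRow, long ts, String dataRow) {
            this.indexRow = indexRow;
            this.ts = ts;
            this.dataRow = dataRow;
        }
    }

    // Index table: index row -> (timestamp -> data row key).
    final TreeMap<String, TreeMap<Long, String>> indexTable = new TreeMap<>();

    /** Applying the same put twice leaves exactly the same cell in place. */
    void apply(IndexPut put) {
        indexTable.computeIfAbsent(put.indexRow, k -> new TreeMap<>())
                  .put(put.ts, put.dataRow);
    }

    void applyAll(List<IndexPut> puts) {
        for (IndexPut p : puts) {
            apply(p);
        }
    }
}
```

Applying a batch once or several times produces equal `indexTable` contents, which is the property that lets a recovery path replay index updates without a transaction.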
  44. Failure Recovery
      • Custom WALEditCodec
        – Encodes index updates
        – Supports compressed WAL
      • Custom WAL Reader
        – Replay index updates from WAL
      <property>
        <name>hbase.regionserver.wal.codec</name>
        <value>o.a.h.hbase.regionserver.wal.IndexedWALEditCodec</value>
      </property>
      <property>
        <name>hbase.regionserver.hlog.reader.impl</name>
        <value>o.a.h.hbase.regionserver.wal.IndexedHLogReader</value>
      </property>
  45. Failure Situations
      • Any time before WAL, client replay
      • Any time after WAL, HBase replay
      • All-or-nothing
  46. Failure #1: Before WAL
      [Diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
  47. Failure #1: Before WAL
      [Diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
      No problem! No data is stored in the WAL, client just retries entire update.
  48. Failure #2: After WAL
      [Diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
  49. Failure #2: After WAL
      [Diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
      WAL replayed via usual replay mechanisms
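
The two failure cases can be walked through with a toy Java model. It is a sketch under stated assumptions rather than the HBase write path: the WAL is a list that survives a crash, the MemStore is a map that does not, each WAL entry carries both the primary write and its index update, and cleanup of overwritten index entries is left out. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class WalRecoverySketch {
    enum CrashPoint { NONE, BEFORE_WAL, AFTER_WAL }

    final List<String[]> wal = new ArrayList<>();               // durable: {rowKey, value}
    final TreeMap<String, String> memStore = new TreeMap<>();   // volatile primary state
    final TreeMap<String, String> indexTable = new TreeMap<>(); // "value|rowKey" -> rowKey

    /** Returns true only if the write was acknowledged to the client. */
    boolean write(String rowKey, String value, CrashPoint crash) {
        if (crash == CrashPoint.BEFORE_WAL) {
            return false; // nothing is durable; the client retries the whole update
        }
        wal.add(new String[] {rowKey, value}); // primary and index update logged together
        if (crash == CrashPoint.AFTER_WAL) {
            return false; // durable but not yet applied; recovery will finish it
        }
        applyEntry(rowKey, value);
        return true;
    }

    /** After a crash: in-memory state is gone, so rebuild everything from the WAL. */
    void recover() {
        memStore.clear();
        for (String[] entry : wal) {
            applyEntry(entry[0], entry[1]); // idempotent, so replaying twice is harmless
        }
    }

    private void applyEntry(String rowKey, String value) {
        memStore.put(rowKey, value);
        indexTable.put(value + "|" + rowKey, rowKey);
    }
}
```

A write that fails before the WAL leaves no trace in either table, and a write that fails after the WAL appears in both tables once `recover()` runs. That matches the all-or-nothing behaviour described on slide 45.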
  50. Agenda
      • About
      • Other Indexing Frameworks
      • Immutable Indexes
      • Mutable Indexes
      • Roadmap
  51. Roadmap
      • Next release of Phoenix
      • Performance testing
      • Increased adoption
      • Adding to HBase (?)
  52. Open Source!
      • Main: https://github.com/forcedotcom/phoenix
      • Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si
  53. (obligatory hiring slide)
      We’re Hiring!
  54. Questions? Comments?
      jyates@salesforce.com
      @jesse_yates
