Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

In-depth look at secondary indexing for phoenix

  • Ok, not this elephant…
  • e.g. stats, historical data
  • Actual implementation coming in example blog post
  • And don’t forget to clean up the old row state!
  • 8pt font, <200 lines, including comments

Phoenix Secondary Indexing - LA HUG Sept 9th, 2013 Presentation Transcript

  • 1. Secondary Indexing in Phoenix • Jesse Yates, HBase Committer, Software Engineer • LA HBase User Group – September 4, 2013
  • 2. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap (LA HUG – Sept 2013)
  • 3. About me • Developer at Salesforce – System of Record, Phoenix • Open Source – Phoenix – HBase – Accumulo
  • 4. Phoenix • Open Source – https://github.com/forcedotcom/phoenix • “SQL-skin” on HBase – Everyone knows SQL! • JDBC Driver – Plug-and-play • Faster than HBase – in some cases
  • 5. Why Index? • HBase is sorted on only one “axis” (the row key) • Great for searches that follow that single pattern • Example!
  • 6. Example (figure): name, type, subtype, date, major, minor, quantity
  • 7. Secondary Indexes • Sort on an ‘orthogonal’ axis • Avoid a full-table scan • Expected database feature • Hard in HBase because of ACID considerations
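The “orthogonal axis” idea can be shown with a minimal in-memory sketch. Java `TreeMap`s stand in for HBase tables, since both keep their keys sorted. The `value|rowKey` index key layout is an illustrative assumption, not Phoenix's actual encoding, and names are assumed not to contain `|`. The sketch also ignores updates: overwriting a row's name would leave a stale index entry behind. That is exactly the mutable-index problem covered later in the deck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch: TreeMaps stand in for HBase tables, which are sorted only by row key.
public class SecondaryIndexSketch {
    // Data table: rowKey -> name, sorted by rowKey (the table's single "axis").
    private final TreeMap<String, String> data = new TreeMap<>();
    // Index table: "name|rowKey" -> rowKey, sorted on the orthogonal axis (name).
    private final TreeMap<String, String> index = new TreeMap<>();

    public void put(String rowKey, String name) {
        data.put(rowKey, name);
        // Appending the rowKey keeps index keys unique when names repeat.
        index.put(name + "|" + rowKey, rowKey);
    }

    // Without an index: every row is examined (a full-table scan).
    public List<String> findByNameScan(String name) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : data.entrySet()) {
            if (e.getValue().equals(name)) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    // With the index: a bounded range scan over keys that start with "name|".
    // '}' is the character right after '|', so [name|, name}) covers exactly that prefix.
    public List<String> findByNameIndex(String name) {
        return new ArrayList<>(index.subMap(name + "|", true, name + "}", false).values());
    }

    public static void main(String[] args) {
        SecondaryIndexSketch t = new SecondaryIndexSketch();
        t.put("r1", "Jesse");
        t.put("r2", "Anna");
        t.put("r3", "Jesse");
        System.out.println(t.findByNameIndex("Jesse")); // [r1, r3]
        System.out.println(t.findByNameScan("Jesse"));  // [r1, r3]
    }
}
```

Both lookups return the same rows. The indexed one touches only the matching key range, while the scan reads every row.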
  • 8. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
  • 9. http://www.wired.com/wiredenterprise/2011/10/microsoft-and-hadoop/
  • 10. Other (Major) Indexing Frameworks • HBase SEP – Side-Effects Processor – Replication-based – https://github.com/NGDATA/hbase-sep • Huawei – Server-local indexes – Buddy regions – https://github.com/Huawei-Hadoop/hindex
  • 11. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
  • 12. Immutable Indexes • Immutable Rows • Much easier to implement • Client-managed • Bulk-loadable
  • 13. Bulk Loading • phoenix-hbase.blogspot.com
  • 14. Index Bulk Loading • Identity Mapper → Custom Phoenix Reducer → HFile Output Format
  • 15. Index Bulk Loading
        PreparedStatement statement = conn.prepareStatement(dmlStatement);
        statement.execute();
        String upsertStmt = "upsert into core.entity_history(organization_id, key_prefix, entity_history_id, created_by, created_date)\n"
                + "values(?,?,?,?,?)";
        statement = conn.prepareStatement(upsertStmt);
        … // set values
        Iterator<Pair<byte[], List<KeyValue>>> dataIterator =
                PhoenixRuntime.getUncommittedDataIterator(conn);
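The bulk-load pipeline above can be approximated in memory. Rows are immutable, so each index row depends only on its new data row, and no earlier state has to be read or cleaned up. The hard requirement is that the output be sorted, because HFiles are written in ascending key order. The sketch below is a hypothetical stand-in for the custom reducer, not Phoenix's MapReduce code. The `DataRow` record and the `value|rowKey` layout are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of the reducer step for an immutable index: each data row maps
// to at most one index row, computed from that row alone (no prior state is read).
public class ImmutableIndexBulkLoadSketch {
    // Data row: row key plus column -> value (a stand-in for HBase KeyValues).
    record DataRow(String rowKey, Map<String, String> columns) {}

    // Builds index row keys "value|rowKey" for one indexed column and returns them
    // in ascending order, since HFiles must be written in sorted key order.
    static List<String> buildSortedIndexKeys(List<DataRow> rows, String indexedColumn) {
        TreeSet<String> sorted = new TreeSet<>();
        for (DataRow row : rows) {
            String value = row.columns().get(indexedColumn);
            if (value != null) { // rows without the indexed column get no index entry
                sorted.add(value + "|" + row.rowKey());
            }
        }
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        List<DataRow> rows = List.of(
            new DataRow("r1", Map.of("name", "Jesse")),
            new DataRow("r2", Map.of("name", "Anna")),
            new DataRow("r3", Map.of("year", "2012")));
        System.out.println(buildSortedIndexKeys(rows, "name")); // [Anna|r2, Jesse|r1]
    }
}
```

In a real job the sort comes from the MapReduce shuffle and partitioning rather than a `TreeSet`. The property the sketch shows, index rows derived from new rows alone, is what makes immutable indexes bulk-loadable.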
  • 16. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
  • 17. The “fun” stuff…
  • 18. 1.5 years
  • 19. Mutable Indexes • Global Index • Change row state – Common use case – “Expected” implementation • Covered Columns
  • 20. Usage • Just SQL! • Baby name popularity • Mock demo
  • 21. Usage • Selects the most popular name for a given year SELECT name,occurrences FROM baby_names WHERE year=2012 LIMIT 1; • Selects the total occurrences of a given name across all years SELECT /*+ NO_INDEX */ name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY name; • Selects the total occurrences of a given name across all years allowing an index to be used SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
  • 22. Usage • Update rows due to census inaccuracy – Will only work if the mutable indexing is working UPSERT INTO baby_names SELECT year,occurrences+3000,sex,name FROM baby_names WHERE name='Jesse'; • Selects the now updated data (from the index table) SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME; • Index table still used in scans EXPLAIN SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
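The `name`/`sum(occurrences)` queries above can be answered from the index alone when the index *covers* `occurrences`. This is a minimal sketch that assumes a hypothetical `baby_names` layout with row key `year|sex|name`; the real demo schema is not shown on the slides. The index key is `name|dataRowKey` and the index value is a copy of `occurrences`. The census-fix UPSERT must therefore rewrite that copy, which is why it depends on mutable indexing.

```java
import java.util.TreeMap;

// Minimal sketch of a covered index. Assumed (hypothetical) layout: data row key
// "year|sex|name" with a single non-key column, occurrences.
public class CoveredIndexSketch {
    // Data table: "year|sex|name" -> occurrences.
    private final TreeMap<String, Long> data = new TreeMap<>();
    // Index table: "name|dataRowKey" -> covered copy of occurrences.
    private final TreeMap<String, Long> index = new TreeMap<>();

    // UPSERT: writes the data row and keeps the covered copy in the index current.
    public void upsert(int year, String sex, String name, long occurrences) {
        String rowKey = year + "|" + sex + "|" + name;
        data.put(rowKey, occurrences);
        // name is part of the data row key, so the index key stays the same on update;
        // only the covered value is overwritten.
        index.put(name + "|" + rowKey, occurrences);
    }

    // SELECT sum(occurrences) ... WHERE name = ?, answered from the index alone via a
    // range scan over keys prefixed by "name|" ('}' sorts directly after '|').
    public long sumOccurrences(String name) {
        long total = 0;
        for (long v : index.subMap(name + "|", true, name + "}", false).values()) {
            total += v;
        }
        return total;
    }

    // Reads the data table directly, for comparison with the index.
    public Long occurrences(int year, String sex, String name) {
        return data.get(year + "|" + sex + "|" + name);
    }

    public static void main(String[] args) {
        CoveredIndexSketch t = new CoveredIndexSketch();
        t.upsert(2011, "M", "Jesse", 100);
        t.upsert(2012, "M", "Jesse", 200);
        t.upsert(2012, "F", "Anna", 300);
        System.out.println(t.sumOccurrences("Jesse")); // 300
        t.upsert(2012, "M", "Jesse", 200 + 3000); // census correction
        System.out.println(t.sumOccurrences("Jesse")); // 3300
    }
}
```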
  • 23. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
  • 24. Internals • Index Management – Build index updates – Ensures index is ‘cleaned up’ • Recovery Mechanism – Ensures index updates are “ACID”
  • 25. “There is no magic” – Every programming hipster (chipster)
  • 26. Mutable Indexing: Standard Write Path • Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore
  • 27. Mutable Indexing: Standard Write Path • Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore
  • 28. Mutable Indexing • Diagram: Region Coprocessor Host, WAL, Region Coprocessor Host; Indexer (Builder, Codec, WAL Updater; “Durable!”); Index Table, Index Table, Index Table
  • 29. Index Management • Lives within a RegionObserver coprocessor • Access to the local HRegion • Specifies the mutations to apply to the index tables
        public interface IndexBuilder {
            public void setup(RegionCoprocessorEnvironment env);
            // Each returned entry: index mutation -> name of the target index table
            public Map<Mutation, String> getIndexUpdate(Put put);
            public Map<Mutation, String> getIndexUpdate(Delete delete);
        }
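The shape of `getIndexUpdate(Put)` can be illustrated with a plain-Java sketch in which one data write fans out to several index tables. `Put`, `IndexDef` and the `value|dataRow` layout are hypothetical, simplified stand-ins, not the HBase client classes. The builder deliberately ignores cleanup of old index rows, which is the subject of the following slides.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: not org.apache.hadoop.hbase.client.Put, just enough
// structure to show how one data write maps to updates for several index tables.
public class IndexBuilderSketch {
    record Put(String row, long ts, Map<String, String> columns) {}
    record IndexDef(String tableName, String indexedColumn) {}

    private final List<IndexDef> indexes;

    public IndexBuilderSketch(List<IndexDef> indexes) {
        this.indexes = indexes;
    }

    // Same shape as getIndexUpdate(Put): index mutation -> name of its index table.
    // Cleanup of the row's previous index entries is intentionally not handled here.
    public Map<Put, String> getIndexUpdate(Put put) {
        Map<Put, String> updates = new LinkedHashMap<>();
        for (IndexDef def : indexes) {
            String value = put.columns().get(def.indexedColumn());
            if (value == null) {
                continue; // this put does not touch the indexed column
            }
            // Index row "value|dataRow", written at the data timestamp.
            updates.put(new Put(value + "|" + put.row(), put.ts(), Map.of()), def.tableName());
        }
        return updates;
    }

    public static void main(String[] args) {
        IndexBuilderSketch builder = new IndexBuilderSketch(List.of(
            new IndexDef("NAME_IDX", "name"), new IndexDef("YEAR_IDX", "year")));
        Put put = new Put("r1", 10L, Map.of("name", "Jesse", "year", "2012"));
        builder.getIndexUpdate(put).forEach((m, table) -> System.out.println(table + " <- " + m.row()));
    }
}
```

Reusing the data timestamp on the index write keeps the index and data in step for point-in-time reads.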
  • 30. Why not write my own? • Managing Cleanup – Efficient point-in-time correctness – Performance tricks • Abstract access to HRegion – Minimal network hops • Sorting correctness – Phoenix typing ensures correct index sorting
  • 31. Example: Managing Cleanup • Updates can arrive out of order – Client-managed timestamps
      ROW   FAMILY  QUALIFIER  TS  VALUE
      Row1  Fam     Qual       10  val1
      Row1  Fam2    Qual2      12  val2
      Row1  Fam     Qual       13  val3
  • 32. Example: Managing Cleanup • Index Table
      ROW             FAMILY  QUALIFIER             TS
      Val1|Row1       Index   Fam:Qual              10
      Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
      Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13
  • 33. Example: Managing Cleanup
      ROW   FAMILY  QUALIFIER  TS  VALUE
      Row1  Fam     Qual       10  val1
      Row1  Fam2    Qual2      12  val2
      Row1  Fam     Qual       13  val3
      Row1  Fam     Qual       11  val4
  • 34. Example: Managing Cleanup
      ROW   FAMILY  QUALIFIER  TS  VALUE
      Row1  Fam     Qual       10  val1
      Row1  Fam     Qual       11  val4
      Row1  Fam2    Qual2      12  val2
      Row1  Fam     Qual       13  val3
  • 35. Example: Managing Cleanup
      ROW             FAMILY  QUALIFIER             TS
      Val1|Row1       Index   Fam:Qual              10
      Val4|Row1       Index   Fam:Qual              11
      Val4|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
      Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
      Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13
  • 36. Example: Managing Cleanup
      ROW             FAMILY  QUALIFIER             TS
      Val1|Row1       Index   Fam:Qual              10
      Val4|Row1       Index   Fam:Qual              11
      Val4|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
      Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
      Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13
  • 37. Managing Cleanup • History “roll up” • Out-of-order Updates • Point-in-time correctness • Multiple Timestamps per Mutation • Delete vs. DeleteColumn vs. DeleteFamily • Surprisingly hard!
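The out-of-order example on slides 31–36 can be replayed with a small sketch. This is one straightforward way to derive the index changes shown there, not Phoenix's algorithm. It makes several assumptions for illustration only:

- The index row key is the present indexed values joined by `|`, followed by the data row key.
- There is one index entry per timestamp at which the row has any version.
- Only Puts are handled. The Delete/DeleteColumn/DeleteFamily cases listed above are left out.

When a late write arrives, every index entry at or after its timestamp is recomputed. Entries that changed are deleted and rewritten.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch of point-in-time index cleanup for a single data row (Puts only).
// Index row layout "v1|v2|...|rowKey" is an illustrative assumption.
public class OutOfOrderCleanupSketch {
    record IndexEntry(String indexRow, long ts) {}
    record Changes(List<IndexEntry> deletes, List<IndexEntry> upserts) {}

    private final String rowKey;
    private final List<String> indexedColumns;
    // column -> (timestamp -> value): every version of every cell in the row.
    private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();

    public OutOfOrderCleanupSketch(String rowKey, List<String> indexedColumns) {
        this.rowKey = rowKey;
        this.indexedColumns = indexedColumns;
    }

    // Index row for the row as it looked at time ts (latest version <= ts per column),
    // or null if no indexed column had a value yet.
    String indexRowAt(long ts) {
        StringBuilder sb = new StringBuilder();
        for (String column : indexedColumns) {
            TreeMap<Long, String> versions = cells.get(column);
            Map.Entry<Long, String> latest = versions == null ? null : versions.floorEntry(ts);
            if (latest != null) {
                sb.append(latest.getValue()).append('|');
            }
        }
        return sb.length() == 0 ? null : sb.append(rowKey).toString();
    }

    private TreeSet<Long> allTimestamps() {
        TreeSet<Long> all = new TreeSet<>();
        for (TreeMap<Long, String> versions : cells.values()) {
            all.addAll(versions.keySet());
        }
        return all;
    }

    // Applies a (possibly out-of-order) put and returns the index changes it implies.
    public Changes put(String column, long ts, String value) {
        TreeSet<Long> existing = allTimestamps();
        // The new timestamp plus every later version may now see a different row state.
        TreeSet<Long> affected = new TreeSet<>(existing.tailSet(ts, false));
        affected.add(ts);
        // Snapshot the index rows that currently exist at the affected timestamps.
        Map<Long, String> before = new HashMap<>();
        for (long t : affected) {
            if (existing.contains(t)) {
                before.put(t, indexRowAt(t));
            }
        }
        cells.computeIfAbsent(column, c -> new TreeMap<>()).put(ts, value);
        List<IndexEntry> deletes = new ArrayList<>();
        List<IndexEntry> upserts = new ArrayList<>();
        for (long t : affected) {
            String oldRow = before.get(t);
            String newRow = indexRowAt(t);
            if (oldRow != null && !oldRow.equals(newRow)) {
                deletes.add(new IndexEntry(oldRow, t)); // stale entry: clean it up
            }
            if (newRow != null && !newRow.equals(oldRow)) {
                upserts.add(new IndexEntry(newRow, t));
            }
        }
        return new Changes(deletes, upserts);
    }

    public static void main(String[] args) {
        OutOfOrderCleanupSketch row =
            new OutOfOrderCleanupSketch("Row1", List.of("Fam:Qual", "Fam2:Qual2"));
        row.put("Fam:Qual", 10, "val1");
        row.put("Fam2:Qual2", 12, "val2");
        row.put("Fam:Qual", 13, "val3");
        // The late update from the slides: Fam:Qual = val4 at ts 11.
        Changes late = row.put("Fam:Qual", 11, "val4");
        System.out.println("deletes: " + late.deletes()); // val1|val2|Row1 @ 12
        System.out.println("upserts: " + late.upserts()); // val4|Row1 @ 11, val4|val2|Row1 @ 12
    }
}
```

The entry at timestamp 13 is left alone: the value written there still shadows the late write, so `val3|val2|Row1` stays correct.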
  • 38. Phoenix Index Builder • Much simpler than full index management • Hides cleanup considerations • Abstracted access to local state
        public interface IndexCodec {
            public void initialize(RegionCoprocessorEnvironment env);
            public Iterable<IndexUpdate> getIndexDeletes(TableState state);
            public Iterable<IndexUpdate> getIndexUpserts(TableState state);
        }
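The division of labour behind the codec can be sketched as follows. The codec only says which index rows a given row state implies, and the surrounding framework diffs the old and new states to decide what to delete and what to upsert. Everything here is a hypothetical simplification: `SimpleIndexCodec`, this `TableState` record, this `IndexUpdate` record and the `NAME_IDX` table are not Phoenix classes. Index rows also carry no covered values, so a change that keeps the index key unchanged yields no update.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified analogue of the codec idea: the codec only maps a row
// state to the index rows it implies; computing deletes and upserts from the old and
// new states is the framework's job. None of these types are Phoenix classes.
public class IndexCodecSketch {
    record IndexUpdate(String table, String indexRow) {}
    record TableState(String rowKey, Map<String, String> columns) {}

    interface SimpleIndexCodec {
        Set<IndexUpdate> indexRowsFor(TableState state);
    }

    // Example codec: a single index table keyed on the "name" column.
    static final SimpleIndexCodec NAME_CODEC = state -> {
        String name = state.columns().get("name");
        if (name == null) {
            return Set.of();
        }
        return Set.of(new IndexUpdate("NAME_IDX", name + "|" + state.rowKey()));
    };

    // Index rows implied by the old state but not the new one must be deleted.
    static Set<IndexUpdate> getIndexDeletes(SimpleIndexCodec codec, TableState before, TableState after) {
        Set<IndexUpdate> deletes = new HashSet<>(codec.indexRowsFor(before));
        deletes.removeAll(codec.indexRowsFor(after));
        return deletes;
    }

    // Index rows implied by the new state but not the old one must be written.
    static Set<IndexUpdate> getIndexUpserts(SimpleIndexCodec codec, TableState before, TableState after) {
        Set<IndexUpdate> upserts = new HashSet<>(codec.indexRowsFor(after));
        upserts.removeAll(codec.indexRowsFor(before));
        return upserts;
    }

    public static void main(String[] args) {
        TableState before = new TableState("r1", Map.of("name", "Jesse"));
        TableState after = new TableState("r1", Map.of("name", "Jess"));
        System.out.println(getIndexDeletes(NAME_CODEC, before, after)); // the Jesse|r1 entry
        System.out.println(getIndexUpserts(NAME_CODEC, before, after)); // the Jess|r1 entry
    }
}
```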
  • 39. Phoenix Index Codec
  • 40. Dude, where’s my data? • Ensuring Correctness
  • 41. HBase ACID • Does NOT give you: – Cross-row consistency – Cross-table consistency • Does give you: – Durable data on success – Visibility on success without partial rows
  • 42. Key Observation “Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.” - Lars Hofhansl
  • 43. Idempotent Index Updates • Doesn’t need full transactions • Replay as many times as needed • Can tolerate a little lag – As long as we get the order right
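Idempotence can be checked with a small sketch. Each index mutation names an exact cell (row and timestamp). Replaying the same batch in the same order therefore rewrites identical values or removes cells that are already gone, and the final state is unchanged. The batch below encodes the index changes from the cleanup example on slides 31–36. `Cell` and `IndexMutation` are hypothetical in-memory types, not HBase classes. Order still matters: if a put and a delete targeted the same cell, replaying them in a different order would give a different result.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: each index mutation names an exact cell (row, timestamp), so re-applying
// the same batch in the same order leaves the final state unchanged.
// Hypothetical in-memory types, not HBase or Phoenix classes.
public class IdempotentReplaySketch {
    record Cell(String row, long ts) {}
    record IndexMutation(String row, long ts, String value, boolean delete) {}

    // Index updates implied by the late ts-11 write in the cleanup example.
    static final List<IndexMutation> BATCH = List.of(
        new IndexMutation("val1|val2|Row1", 12, null, true),
        new IndexMutation("val4|Row1", 11, "", false),
        new IndexMutation("val4|val2|Row1", 12, "", false));

    // Index state before the late write (cell values are unused, so they are empty).
    static Map<Cell, String> initialIndex() {
        Map<Cell, String> index = new HashMap<>();
        index.put(new Cell("val1|Row1", 10), "");
        index.put(new Cell("val1|val2|Row1", 12), "");
        index.put(new Cell("val3|val2|Row1", 13), "");
        return index;
    }

    static void apply(Map<Cell, String> index, List<IndexMutation> batch) {
        for (IndexMutation m : batch) {
            Cell cell = new Cell(m.row(), m.ts());
            if (m.delete()) {
                index.remove(cell); // removing an already-absent cell is a no-op
            } else {
                index.put(cell, m.value()); // same cell, same value: overwriting changes nothing
            }
        }
    }

    public static void main(String[] args) {
        Map<Cell, String> once = initialIndex();
        apply(once, BATCH);
        Map<Cell, String> replayed = initialIndex();
        for (int i = 0; i < 3; i++) {
            apply(replayed, BATCH); // e.g. the same updates replayed after repeated failures
        }
        System.out.println(once.equals(replayed)); // true
    }
}
```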
  • 44. Failure Recovery • Custom WALEditCodec – Encodes index updates – Supports compressed WAL • Custom WAL Reader – Replays index updates from the WAL
        <property>
          <name>hbase.regionserver.wal.codec</name>
          <value>o.a.h.hbase.regionserver.wal.IndexedWALEditCodec</value>
        </property>
        <property>
          <name>hbase.regionserver.hlog.reader.impl</name>
          <value>o.a.h.hbase.regionserver.wal.IndexedHLogReader</value>
        </property>
  • 45. Failure Situations • Any time before the WAL write: the client replays • Any time after the WAL write: HBase replays • All-or-nothing
  • 46. Failure #1: Before WAL • Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore
  • 47. Failure #1: Before WAL • Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore • No problem! No data is stored in the WAL; the client just retries the entire update.
  • 48. Failure #2: After WAL • Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore
  • 49. Failure #2: After WAL • Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore • The WAL is replayed via the usual replay mechanisms.
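The two failure cases can be modelled in a few lines. The model follows the simplified order drawn on the slides: index updates are built, then appended to the WAL together with the data edit, then applied to the index tables and the MemStore. This is a hypothetical in-memory toy, not HBase or Phoenix code. A crash before the WAL append leaves nothing durable, so the client retries. A crash after it is repaired by replaying the WAL, and because index updates are idempotent, replaying more than once is safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the two failure cases, following the simplified write order on the
// slides. Hypothetical in-memory model, not HBase or Phoenix code.
public class WalRecoverySketch {
    enum CrashPoint { NONE, BEFORE_WAL, AFTER_WAL }

    // One WAL entry carries the data edit together with its index update.
    record WalEntry(String row, String value, String indexRow) {}

    final List<WalEntry> wal = new ArrayList<>();   // durable: survives a crash
    Map<String, String> memStore = new HashMap<>(); // volatile: lost on a crash
    final Set<String> indexTable = new HashSet<>(); // stands in for the index table

    // Returns true only if the write completed and was acknowledged to the client.
    boolean write(String row, String value, CrashPoint crash) {
        WalEntry entry = new WalEntry(row, value, value + "|" + row); // index update built
        if (crash == CrashPoint.BEFORE_WAL) {
            return false; // nothing durable yet: the client simply retries
        }
        wal.add(entry);
        if (crash == CrashPoint.AFTER_WAL) {
            return false; // durable in the WAL; recovery will finish the job
        }
        indexTable.add(entry.indexRow());
        memStore.put(row, value);
        return true;
    }

    // Region restart: volatile state is dropped, then every WAL entry is replayed.
    void recover() {
        memStore = new HashMap<>();
        for (WalEntry e : wal) {
            memStore.put(e.row(), e.value());
            indexTable.add(e.indexRow()); // idempotent, so replaying twice is harmless
        }
    }

    public static void main(String[] args) {
        WalRecoverySketch region = new WalRecoverySketch();
        // Failure #1: crash before the WAL; the client retries the whole update.
        if (!region.write("r1", "Jesse", CrashPoint.BEFORE_WAL)) {
            region.write("r1", "Jesse", CrashPoint.NONE);
        }
        // Failure #2: crash after the WAL; replay restores both data and index.
        region.write("r2", "Anna", CrashPoint.AFTER_WAL);
        region.recover();
        System.out.println(region.memStore + " " + region.indexTable);
    }
}
```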
  • 50. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes • Roadmap
  • 51. Roadmap • Next release of Phoenix • Performance testing • Increased adoption • Adding to HBase (?)
  • 52. Open Source! • Main: https://github.com/forcedotcom/phoenix • Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si
  • 53. (obligatory hiring slide) We’re Hiring!
  • 54. Questions? Comments? jyates@salesforce.com @jesse_yates