Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

In-depth look at secondary indexing for Phoenix

Published in: Technology
Notes
  • Ok, not this elephant…
  • e.g. stats, historical data
  • Actual implementation coming in example blog post
  • Actual implementation coming in example blog post
  • And don’t forget to cleanup the old row state!
  • 8pt font, <200 lines, including comments
  • Transcript

    • 1. Secondary Indexing in Phoenix Jesse Yates HBase Committer Software Engineer LA HBase User Group – September 4, 2013
    • 2. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap 2 LA HUG – Sept 2013 https://www.madison.k12.wi.us/calendars
    • 3. About me • Developer at Salesforce – System of Record, Phoenix • Open Source – Phoenix – HBase – Accumulo
    • 4. Phoenix • Open Source – https://github.com/forcedotcom/phoenix • “SQL-skin” on HBase – Everyone knows SQL! • JDBC Driver – Plug-and-play • Faster than HBase – in some cases
    • 5. Why Index? • HBase is only sorted on 1 “axis” • Great for search via a single pattern Example!
    • 6. Example name: type: subtype: date: major: minor: quantity:
    • 7. Secondary Indexes • Sort on ‘orthogonal’ axis • Save full-table scan • Expected database feature • Hard in HBase b/c of ACID considerations
    • 8. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
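The "orthogonal axis" idea on slides 5 to 7 can be sketched with two in-memory sorted maps. This is only a toy model under stated assumptions: the class `SecondaryIndexSketch`, its method names, and the `name|rowKey` key layout are hypothetical and are not Phoenix or HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: two sorted maps standing in for a data table and its index table. */
final class SecondaryIndexSketch {
    // Data table: sorted by row key, the single "axis" the store sorts on.
    private final TreeMap<String, String> nameByRowKey = new TreeMap<>();
    // Index table: sorted by "name|rowKey", so equal names sit next to each other.
    private final TreeMap<String, String> rowKeyByIndexKey = new TreeMap<>();

    void put(String rowKey, String name) {
        String oldName = nameByRowKey.put(rowKey, name);
        if (oldName != null) {
            rowKeyByIndexKey.remove(oldName + "|" + rowKey); // drop the stale index row
        }
        rowKeyByIndexKey.put(name + "|" + rowKey, rowKey);
    }

    /** No index: every row in the table is examined. */
    List<String> findByNameFullScan(String name) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : nameByRowKey.entrySet()) {
            if (e.getValue().equals(name)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    /** With the index: a range scan over the keys that start with "name|". */
    List<String> findByNameIndexed(String name) {
        // '}' is the character immediately after '|', so [name|, name}) covers the prefix.
        return new ArrayList<>(rowKeyByIndexKey.subMap(name + "|", name + "}").values());
    }

    public static void main(String[] args) {
        SecondaryIndexSketch table = new SecondaryIndexSketch();
        table.put("row1", "Jesse");
        table.put("row2", "James");
        table.put("row3", "Jesse");
        System.out.println(table.findByNameFullScan("Jesse"));
        System.out.println(table.findByNameIndexed("Jesse"));
    }
}
```

Both calls print `[row1, row3]`; the difference is that the indexed lookup touches only the matching key range instead of every row. The sketch assumes names never contain `|`.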
    • 9. http://www.wired.com/wiredenterprise/2011/10/microsoft-and-hadoop/
    • 10. Other (Major) Indexing Frameworks • HBase SEP – Side-Effects Processor – Replication-based – https://github.com/NGDATA/hbase-sep • Huawei – Server-local indexes – Buddy regions – https://github.com/Huawei-Hadoop/hindex
    • 11. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
    • 12. Immutable Indexes • Immutable Rows • Much easier to implement • Client-managed • Bulk-loadable
    • 13. Bulk Loading phoenix-hbase.blogspot.com
    • 14. Index Bulk Loading (diagram labels: Identity Mapper, Custom Phoenix Reducer, HFile Output Format)
    • 15. Index Bulk Loading
        PreparedStatement statement = conn.prepareStatement(dmlStatement);
        statement.execute();
        String upsertStmt = "upsert into core.entity_history(organization_id,key_prefix,entity_history_id, created_by, created_date)\n" + "values(?,?,?,?,?)";
        statement = conn.prepareStatement(upsertStmt);
        … //set values
        Iterator<Pair<byte[],List<KeyValue>>> dataIterator = PhoenixRuntime.getUncommittedDataIterator(conn);
    • 16. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
    • 17. The “fun” stuff…
    • 18. 1.5 years
    • 19. Mutable Indexes • Global Index • Change row state – Common use-case – “expected” implementation • Covered Columns
    • 20. Usage • Just SQL! • Baby name popularity • Mock demo
    • 21. Usage
        • Selects the most popular name for a given year
            SELECT name,occurrences FROM baby_names WHERE year=2012 LIMIT 1;
        • Selects the total occurrences of a given name across all years
            SELECT /*+ NO_INDEX */ name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY name;
        • Selects the total occurrences of a given name across all years allowing an index to be used
            SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
    • 22. Usage
        • Update rows due to census inaccuracy – Will only work if the mutable indexing is working
            UPSERT INTO baby_names SELECT year,occurrences+3000,sex,name FROM baby_names WHERE name='Jesse';
        • Selects the now updated data (from the index table)
            SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
        • Index table still used in scans
            EXPLAIN SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
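Why the summed query can be answered "from the index table" may be easier to see in a small model. The sketch below assumes an index on `name` that also carries `occurrences` as a covered column; the slides do not show the index definition or the table's primary key, so the key layouts, the class `CoveredIndexSketch`, and the sample numbers are all illustrative assumptions rather than Phoenix behaviour.

```java
import java.util.TreeMap;

/** Toy model only: an index that carries a covered column next to the indexed one. */
final class CoveredIndexSketch {
    // Data table stand-in: "year|sex|name" -> occurrences (this key layout is an assumption).
    final TreeMap<String, Integer> babyNames = new TreeMap<>();
    // Index stand-in: "name|year|sex" -> occurrences, with occurrences as the covered column.
    final TreeMap<String, Integer> nameIndex = new TreeMap<>();

    /** Writes the data row and the matching index row together. */
    void upsert(int year, String sex, String name, int occurrences) {
        babyNames.put(year + "|" + sex + "|" + name, occurrences);
        nameIndex.put(name + "|" + year + "|" + sex, occurrences);
    }

    /** sum(occurrences) for one name, read from the index only. */
    int totalOccurrences(String name) {
        int sum = 0;
        // '}' sorts immediately after '|', so this range is exactly the "name|" prefix.
        for (int occurrences : nameIndex.subMap(name + "|", name + "}").values()) {
            sum += occurrences;
        }
        return sum;
    }

    public static void main(String[] args) {
        CoveredIndexSketch db = new CoveredIndexSketch();
        db.upsert(2011, "M", "Jesse", 100);   // made-up sample rows
        db.upsert(2012, "M", "Jesse", 200);
        db.upsert(2012, "F", "Emma", 50);
        System.out.println(db.totalOccurrences("Jesse")); // 300
        db.upsert(2012, "M", "Jesse", 200 + 3000);        // the census correction for one row
        System.out.println(db.totalOccurrences("Jesse")); // 3300
    }
}
```

Because the covered value lives in the index row, the query never goes back to the data table, which is also why the index has to be updated whenever the data row changes.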
    • 23. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes in Phoenix • Mutable Indexing Internals • Roadmap
    • 24. Internals • Index Management – Build index updates – Ensures index is ‘cleaned up’ • Recovery Mechanism – Ensures index updates are “ACID”
    • 25. “There is no magic” - Every programming hipster (chipster)
    • 26. Mutable Indexing: Standard Write Path (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
    • 27. Mutable Indexing: Standard Write Path (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
    • 28. Mutable Indexing (diagram labels: Region Coprocessor Host, WAL, Region Coprocessor Host, Indexer, Builder, WAL Updater, Durable!, Indexer, Index Table, Index Table, Index Table, Codec)
    • 29. Index Management • Lives within a RegionCoprocessorObserver • Access to the local HRegion • Specifies the mutations to apply to the index tables
        public interface IndexBuilder {
          public void setup(RegionCoprocessorEnvironment env);
          public Map<Mutation, String> getIndexUpdate(Put put);
          public Map<Mutation, String> getIndexUpdate(Delete delete);
        }
    • 30. Why not write my own? • Managing Cleanup – Efficient point-in-time correctness – Performance tricks • Abstract access to HRegion – Minimal network hops • Sorting correctness – Phoenix typing ensures correct index sorting
    • 31. Example: Managing Cleanup • Updates can arrive out of order – Client-managed timestamps
        ROW   FAMILY  QUALIFIER  TS  VALUE
        Row1  Fam     Qual       10  val1
        Row1  Fam2    Qual2      12  val2
        Row1  Fam     Qual       13  val3
    • 32. Example: Managing Cleanup • Index Table
        ROW             FAMILY  QUALIFIER            TS
        Val1|Row1       Index   Fam:Qual             10
        Val1|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
        Val3|Val2|Row1  Index   Fam:Qual Fam2:Qual2  13
    • 33. Example: Managing Cleanup
        ROW   FAMILY  QUALIFIER  TS  VALUE
        Row1  Fam     Qual       10  val1
        Row1  Fam2    Qual2      12  val2
        Row1  Fam     Qual       13  val3
        Row1  Fam     Qual       11  val4
    • 34. Example: Managing Cleanup
        ROW   FAMILY  QUALIFIER  TS  VALUE
        Row1  Fam     Qual       10  val1
        Row1  Fam     Qual       11  val4
        Row1  Fam2    Qual2      12  val2
        Row1  Fam     Qual       13  val3
    • 35. Example: Managing Cleanup
        ROW             FAMILY  QUALIFIER            TS
        Val1|Row1       Index   Fam:Qual             10
        Val4|Row1       Index   Fam:Qual             11
        Val4|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
        Val1|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
        Val3|Val2|Row1  Index   Fam:Qual Fam2:Qual2  13
    • 36. Example: Managing Cleanup
        ROW             FAMILY  QUALIFIER            TS
        Val1|Row1       Index   Fam:Qual             10
        Val4|Row1       Index   Fam:Qual             11
        Val4|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
        Val1|Val2|Row1  Index   Fam:Qual Fam2:Qual2  12
        Val3|Val2|Row1  Index   Fam:Qual Fam2:Qual2  13
    • 37. Managing Cleanup • History “roll up” • Out-of-order Updates • Point-in-time correctness • Multiple Timestamps per Mutation • Delete vs. DeleteColumn vs. DeleteFamily Surprisingly hard!
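The out-of-order example on slides 31 to 36 can be replayed with a brute-force sketch: recompute the index row implied at every timestamp before and after a put, then diff the two. This is an assumption-laden illustration, not how Phoenix does it (the slides stress efficiency, and this version rescans all history on each put). The class `IndexCleanupSketch` and the `value|value|row` key format are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: derives index deletes and upserts for one row with timestamped versions. */
final class IndexCleanupSketch {
    private final String row;
    // indexed column -> (timestamp -> value), for one data row
    private final Map<String, TreeMap<Long, String>> versionsByColumn = new LinkedHashMap<>();

    IndexCleanupSketch(String row, String... indexedColumns) {
        this.row = row;
        for (String column : indexedColumns) {
            versionsByColumn.put(column, new TreeMap<>());
        }
    }

    /** Index row key at each timestamp, derived from the row state as of that timestamp. */
    TreeMap<Long, String> indexEntries() {
        TreeSet<Long> timestamps = new TreeSet<>();
        for (TreeMap<Long, String> versions : versionsByColumn.values()) {
            timestamps.addAll(versions.keySet());
        }
        TreeMap<Long, String> entries = new TreeMap<>();
        for (Long ts : timestamps) {
            StringBuilder key = new StringBuilder();
            for (TreeMap<Long, String> versions : versionsByColumn.values()) {
                Map.Entry<Long, String> asOf = versions.floorEntry(ts); // newest value at or before ts
                if (asOf != null) {
                    key.append(asOf.getValue()).append('|');
                }
            }
            entries.put(ts, key.append(row).toString());
        }
        return entries;
    }

    /** Applies a put; returns the index deletes ("-key@ts") followed by the upserts ("+key@ts"). */
    List<String> put(String column, long ts, String value) {
        TreeMap<Long, String> before = indexEntries();
        versionsByColumn.get(column).put(ts, value);
        TreeMap<Long, String> after = indexEntries();
        List<String> changes = new ArrayList<>();
        for (Map.Entry<Long, String> e : before.entrySet()) {
            if (!e.getValue().equals(after.get(e.getKey()))) {
                changes.add("-" + e.getValue() + "@" + e.getKey());
            }
        }
        for (Map.Entry<Long, String> e : after.entrySet()) {
            if (!e.getValue().equals(before.get(e.getKey()))) {
                changes.add("+" + e.getValue() + "@" + e.getKey());
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        IndexCleanupSketch state = new IndexCleanupSketch("Row1", "Fam:Qual", "Fam2:Qual2");
        state.put("Fam:Qual", 10, "val1");
        state.put("Fam2:Qual2", 12, "val2");
        state.put("Fam:Qual", 13, "val3");
        // The late update from the slides: timestamp 11 arrives after 12 and 13.
        System.out.println(state.put("Fam:Qual", 11, "val4"));
    }
}
```

Running it prints `[-val1|val2|Row1@12, +val4|Row1@11, +val4|val2|Row1@12]`: the late write at timestamp 11 adds its own index row and also invalidates the index row that had been written for timestamp 12, which is the stale entry in the slide 35 table.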
    • 38. Phoenix Index Builder • Much simpler than full index management • Hides cleanup considerations • Abstracted access to local state
        public interface IndexCodec {
          public void initialize(RegionCoprocessorEnvironment env);
          public Iterable<IndexUpdate> getIndexDeletes(TableState state);
          public Iterable<IndexUpdate> getIndexUpserts(TableState state);
        }
    • 39. Phoenix Index Codec
    • 40. Dude, where’s my data? Ensuring Correctness
    • 41. HBase ACID • Does NOT give you: – Cross-row consistency – Cross-table consistency • Does give you: – Durable data on success – Visibility on success without partial rows
    • 42. Key Observation “Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.” - Lars Hofhansl
    • 43. Idempotent Index Updates • Doesn’t need full transactions • Replay as many times as needed • Can tolerate a little lag – As long as we get the order right
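A minimal sketch of what "replay as many times as needed" means, assuming each index update names its index row and an explicit timestamp so that re-applying it overwrites the same cell with the same content. `IdempotentReplaySketch` and `IndexUpdate` here are hypothetical stand-ins, not the Phoenix classes of the same flavour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Toy model only: shows that re-applying timestamped index updates leaves the same state. */
final class IdempotentReplaySketch {
    /** One index update: an upsert or a delete of an index row at an explicit timestamp. */
    static final class IndexUpdate {
        final String indexKey;
        final long ts;
        final boolean delete;

        IndexUpdate(String indexKey, long ts, boolean delete) {
            this.indexKey = indexKey;
            this.ts = ts;
            this.delete = delete;
        }
    }

    /** Index table modelled as "indexKey@ts" -> live (true) or deleted (false). */
    static void apply(TreeMap<String, Boolean> indexTable, List<IndexUpdate> updates) {
        for (IndexUpdate u : updates) {
            indexTable.put(u.indexKey + "@" + u.ts, !u.delete);
        }
    }

    public static void main(String[] args) {
        List<IndexUpdate> batch = Arrays.asList(
            new IndexUpdate("val1|val2|Row1", 12, true),
            new IndexUpdate("val4|Row1", 11, false),
            new IndexUpdate("val4|val2|Row1", 12, false));

        TreeMap<String, Boolean> appliedOnce = new TreeMap<>();
        apply(appliedOnce, batch);

        TreeMap<String, Boolean> replayed = new TreeMap<>();
        for (int i = 0; i < 3; i++) {
            apply(replayed, batch); // e.g. a client retry or a WAL replay delivering the batch again
        }
        System.out.println(appliedOnce.equals(replayed)); // true: same final state either way
    }
}
```

The design point is that nothing in an update depends on how many times it has already been applied, so a recovery path can err on the side of replaying too much.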
    • 44. Failure Recovery • Custom WALEditCodec – Encodes index updates – Supports compressed WAL • Custom WAL Reader – Replay index updates from WAL
        <property>
          <name>hbase.regionserver.wal.codec</name>
          <value>o.a.h.hbase.regionserver.wal.IndexedWALEditCodec</value>
        </property>
        <property>
          <name>hbase.regionserver.hlog.reader.impl</name>
          <value>o.a.h.hbase.regionserver.wal.IndexedHLogReader</value>
        </property>
    • 45. Failure Situations • Any time before WAL, client replay • Any time after WAL, HBase replay • All-or-nothing
    • 46. Failure #1: Before WAL (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
    • 47. Failure #1: Before WAL (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore) No problem! No data is stored in the WAL, client just retries entire update.
    • 48. Failure #2: After WAL (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
    • 49. Failure #2: After WAL (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore) WAL replayed via usual replay mechanisms
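The two failure cases on slides 45 to 49 can be modelled with a list standing in for the WAL. This is a sketch under stated assumptions: `WalRecoverySketch`, `FailAt`, and the three-field log entry are invented for illustration, the "crash" is just an exception, and cleanup of stale index rows is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model only: a write path where the data edit and its index update share one log entry. */
final class WalRecoverySketch {
    enum FailAt { NONE, BEFORE_WAL, AFTER_WAL }

    // Durable log; each entry is {rowKey, value, indexKey}.
    final List<String[]> wal = new ArrayList<>();
    final TreeMap<String, String> dataTable = new TreeMap<>();
    final TreeMap<String, String> indexTable = new TreeMap<>();

    /** Writes one row; throws to simulate a server crash at the given point. */
    void write(String rowKey, String value, FailAt failAt) {
        if (failAt == FailAt.BEFORE_WAL) {
            throw new IllegalStateException("crashed before the WAL append");
        }
        wal.add(new String[] {rowKey, value, value + "|" + rowKey});
        if (failAt == FailAt.AFTER_WAL) {
            throw new IllegalStateException("crashed after the WAL append");
        }
        applyEntry(wal.get(wal.size() - 1));
    }

    /** Recovery: re-applies every logged entry to both tables. */
    void replayWal() {
        for (String[] entry : wal) {
            applyEntry(entry);
        }
    }

    private void applyEntry(String[] entry) {
        dataTable.put(entry[0], entry[1]);
        indexTable.put(entry[2], entry[0]);
    }

    public static void main(String[] args) {
        WalRecoverySketch region = new WalRecoverySketch();
        try {
            region.write("Row1", "val1", FailAt.BEFORE_WAL);
        } catch (IllegalStateException e) {
            region.write("Row1", "val1", FailAt.NONE); // nothing was logged, so the client retries
        }
        try {
            region.write("Row2", "val2", FailAt.AFTER_WAL);
        } catch (IllegalStateException e) {
            region.replayWal(); // the edit is in the log, so replay restores data and index
        }
        System.out.println(region.dataTable + " " + region.indexTable);
    }
}
```

The program prints `{Row1=val1, Row2=val2} {val1|Row1=Row1, val2|Row2=Row2}`. In both cases the data row and its index row end up present together or not at all, and replaying `Row1` a second time during recovery is harmless because the update is idempotent.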
    • 50. Agenda • About • Other Indexing Frameworks • Immutable Indexes • Mutable Indexes • Roadmap
    • 51. Roadmap • Next release of Phoenix • Performance testing • Increased adoption • Adding to HBase (?)
    • 52. Open Source! • Main: https://github.com/forcedotcom/phoenix • Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si
    • 53. (obligatory hiring slide) We’re Hiring!
    • 54. Questions? Comments? jyates@salesforce.com @jesse_yates
