Brahe - Flexible Indexing At Scale
Ben Brown, Software Architect, Cerner Corporation

Brahe mass scale flexible indexing

Presented by Ben Brown, Software Architect, Cerner Corporation

Our team made its first foray into Solr while building Chart Search, an offering on top of Cerner's primary EMR that makes searching a patient's chart smarter and easier. After bringing on more than 100 client hospitals and indexing many tens of billions of clinical documents and discrete results, we've (thankfully) learned a couple of things.

The traditional approach, a hashed document ID spread over many shards with no easily accessible source of truth, doesn't make for a flexible index.
Learn the finer points of the strategy in which we shifted our source of truth to HBase: how we deploy new indexes with the click of a button, expand the number of shards of an existing index on the fly, and enable several other fancy features.
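
To make the resizing problem concrete, here is a minimal sketch of hash-modulo routing using the three example hash values from the deck's "Growth" slide (15, 8 and 7). The function and variable names (`shard_for`, `placement`, `DOC_HASHES`) are illustrative assumptions, not part of Chart Search or Brahe.

```python
# Hash values copied from the "Growth" slide; in a real system they would
# come from a hash function applied to the document ID.
DOC_HASHES = {"abc123": 15, "efg456": 8, "hij789": 7}


def shard_for(doc_hash: int, shard_count: int) -> int:
    """Hash-modulo routing: the shard index depends on the total shard count."""
    return doc_hash % shard_count


def placement(shard_count: int) -> dict:
    """Map every document ID to its shard for a given shard count."""
    return {doc: shard_for(h, shard_count) for doc, h in DOC_HASHES.items()}


three = placement(3)
four = placement(4)

# Documents whose shard changes when the index grows from 3 to 4 shards.
moved = sorted(doc for doc in DOC_HASHES if three[doc] != four[doc])

print("3 shards:", three)
print("4 shards:", four)
print("moved:", moved)
```

With these three documents, every one of them lands on a different shard after the shard count changes, which is why a resize under this scheme implies reindexing everything.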

  1. Brahe - Flexible Indexing At Scale
     Ben Brown, Software Architect, Cerner Corporation
  2. Who I Am
     • Ben Brown: Software Architect
     • Cerner: Healthcare IT Company
     • Semantic Solutions: Team of 10; Search Services; Fun Stuff (NLP, Medical Ontologies, ML)
  3. Chart Search: Taking This (Photo: http://bit.ly/Y7kTJt)
  4. Chart Search: Turning it into this
  5. Chart Search Does
     • Faceting
     • NLP
     • Semantic Concept Markup
     Makes for a heavy record (especially on Solr 1.4)
  6. Where We Started
     Started Major Engineering in 2009
     IBM Dev Works: http://ibm.co/14ZrtqX
  7. Where We Started (continued)
  8. Scale
     • Clusters partitioned by client
     • Raw and processed data in HDFS
     • All processing & indexing done through MapReduce
  9. Shard Size
     Limiting Factor: ~26 Million Discrete Results Per Shard
     Average of 35 Shards Per Client
     Range 5 to 140
  10. Query Touch Points
  11. Query Touch Points
      One User Action ~ 4 Queries
      35 Shards - 432 Touch Points
      140 Shards - 1692 Touch Points
      • Works, but not efficient
      • Chance for variance killing performance
      • Failure is a massive config headache
  12. Growth
      • Hashed ID does not play well with resizing
      • Deploy Again
      • Reindex Everything
      Document Hash modulo Shard Count
      Doc One: Hash(abc123) = 15
      Doc Two: Hash(efg456) = 8
      Doc Three: Hash(hij789) = 7
      3 Shards: Doc One -> Shard 0, Doc Two -> Shard 2, Doc Three -> Shard 1
      4 Shards: Doc One -> Shard 3, Doc Two -> Shard 0, Doc Three -> Shard 3
  13. We Have a Problem
      Painful Growth
      Lots of Deploys
      Variance Risk
      Image: http://bit.ly/Y7oBD6
  14. What Would Be Better?
      Load Balance at the Client
      Automated Failover
      Easy Deployments
      Simplified Splitting
      Minimized Touch Points
      Disconnected Stages
  15. Solution: Shift Master to HBase (Image: http://bit.ly/ZXO2na)
  16. Why HBase?
      Lexically organized keys
      Efficient key range scans
      Efficient time based scans
      We're pretty good at operating it
  17. Coordinate With ZooKeeper
      |-- Index name
      |-- Version
      |-- Solr Schema/Config
      |-- Table Name + Connection Info
      |-- Shard Number
      |-- Shard Boundary Info
      |-- Replica Number
      |-- Ephemeral Claim
      |-- Solr Connection Info
      |-- Ephemeral Online
  18. Custom Core Admin
      Works with ZooKeeper for the claim process
      Creates the Solr core after claims
      Controls pulling data from HBase
  19-49. Claim Process (sequence of diagram slides; Image: http://bit.ly/Or317R)
  50. Coordinate With ZooKeeper
      |-- Index name
      |-- Version
      |-- Solr Schema/Config
      |-- Table Name + Connection Info
      |-- Shard Number
      |-- Shard Boundary Info
      |-- Replica Number
      |-- Ephemeral Claim
      |-- Solr Connection Info
      |-- Ephemeral Online
  51. Queries
      • Client inspects ZooKeeper
      • Finds online nodes
        o Only for the keyspace it cares about
        o Issues distributed queries if necessary
      • Balances in the Client
      • Retries if queries fail
  52. End Thoughts
      • Keep things simple
      • Disconnect your stages
      • Keep your touchpoints at a minimum
      • Organize your data around your queries
      • Use what you're good at
  53. Contact
      Ben Brown: http://linkd.in/ZZIBK4, @b_brown
      Engineering Blog: https://engineering.cerner.com/
      We're Hiring! http://www.cerner.com/About_Cerner/Careers/
  54. Bonus Slides!
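
The "Queries" slide describes a client that finds the online nodes for the keyspace it cares about, balances load itself, and retries on failure. The sketch below illustrates that idea under loud assumptions: shard boundaries and online replicas are held in plain in-memory objects standing in for what the deck stores in ZooKeeper, and `Shard`, `shards_for_range`, `query_shard`, `query` and the `send` callback are hypothetical names. It is not Brahe's implementation and makes no real ZooKeeper or Solr calls.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Shard:
    start: str   # inclusive lower key bound ("" means unbounded)
    end: str     # exclusive upper key bound ("" means unbounded)
    online: list = field(default_factory=list)  # replicas currently online


def shards_for_range(shards, lo, hi):
    """Return the shards whose [start, end) range overlaps the keys [lo, hi)."""
    hits = []
    for shard in shards:
        starts_before_hi = hi == "" or shard.start < hi
        ends_after_lo = shard.end == "" or lo < shard.end
        if starts_before_hi and ends_after_lo:
            hits.append(shard)
    return hits


def query_shard(shard, send):
    """Try the shard's online replicas in random order until one answers."""
    replicas = list(shard.online)
    random.shuffle(replicas)          # client-side load balancing
    for replica in replicas:
        try:
            return send(replica)
        except ConnectionError:
            continue                  # retry on the next replica
    raise RuntimeError(f"no online replica answered for shard {shard.start!r}")


def query(shards, lo, hi, send):
    """Query only the shards that cover the requested key range."""
    return [query_shard(s, send) for s in shards_for_range(shards, lo, hi)]


if __name__ == "__main__":
    # Hypothetical two-shard index split at key "m".
    demo = [Shard("", "m", ["a1", "a2"]), Shard("m", "", ["b1"])]
    print(query(demo, "patient-001", "patient-002", lambda r: f"hit on {r}"))
```

Because keys are ordered lexically, a request for one patient's key range touches only the shard or shards that cover it, rather than fanning out to every shard in the index.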
