Yokozuna, Distributed Search You Don't Think About

Transcript

  • 1. Yokozuna: Distributed Search You Don’t Think About. Ryan Zezeski, May 14th 2013. Tuesday, May 21, 13
  • 2. Live Demo
  • 3. Live Demos
  • 4. PROBLEM?
  • 5. PROBLEM? SOLUTION!
  • 6. Solution Pre-made
  • 7. Piece At A Time
  • 8. Goals:
      • Don’t screw up
      • Show how Yokozuna doesn’t make you think (too hard)
      • Teach you about Search
      • Neat things you can do with Yokozuna
  • 9. PROBLEM: SEARCH FOR COMMITS ABOUT A SPECIFIC FEATURE/BUG; MAKE IT GOOGLE-LIKE
  • 10. SOLUTION: INDEX COMMITS IN YOKOZUNA, THE “COMMIT LOG SEARCHER” (CLS)
  • 11. Anatomy of a Commit Msg
  • 12. Primary Key
  • 13. Any Node Will Do
  • 14. Term Query
  • 15. Query Any Node
  • 16. Boolean (1): repo:riak_kv repo:riak_core
  • 17. Boolean (2): repo:riak_kv AND author:"Ryan Zezeski"
  • 18. Boolean (3): commit_author:"Ryan Zezeski" OR commit_author:"Joseph Blomstedt" NOT commit_repo:riak_kv
  • 19. Range (1): commit_repo:riak_core AND commit_dt:[NOW-1YEAR TO NOW]
  • 20. Range (2): commit_repo:riak_core AND commit_dt:[NOW-1YEAR TO NOW] (I RAN THIS ON 2013-05-10; sort=dt asc)
  • 21. Wildcard (1): *:* (GET TOTAL COUNT FIRST)
  • 22. Wildcard (2): commit_repo:riak_* (NOTICE COUNT IS LOWER)
  • 23. WHAT ABOUT SEARCHING SUMMARY AND BODY?
  • 24. THE INVERTED INDEX
  • 25. AN INDEX, BUT INVERTED
  • 26. EVERYONE KNOWS WHAT IT IS
  • 27. EVEN NON-TECH PEOPLE
  • 28. YES... EVEN YOUR PARENTS
  • 29. What’s In A Book?
  • 30. What a book contains:
      • WORDS
      • PARAGRAPHS
      • SECTIONS
      • CHAPTERS
      • ETC.
  • 31. AND PAGE NUMBERS
  • 32. [image]
  • 33. PAGE NUMBERS ARE AN IMPLICIT INDEX
  • 34. PAGE NUMBER TO WORDS, INVERTED: WORD TO PAGE NUMBERS
  • 35. STOLEN FROM BLOG OF RICKY HO: http://horicky.blogspot.com/2013/02/text-processing-part-2-inverted-index.html
  • 36. HOW DO YOU GET THE WORDS IN THE FIRST PLACE?
  • 37. Analysis - The Iceberg That Sank The Titanic
  • 38. Phrase (1): subject:hinted OR subject:handoff OR body:hinted OR body:handoff
  • 39. Phrase (2): subject:"hinted handoff" OR body:"hinted handoff"
  • 40. Phrase (3): subject:"partition vnode" OR body:"partition vnode"
  • 41. Phrase (4): subject:"partition vnode"~4 OR body:"partition vnode"~4
  • 42. Exact Term: subject:behavior OR body:behavior
  • 43. Fuzzy Term: subject:behavior~1 OR body:behavior~1
  • 44. Ranking: ADD SCORE TO FL; SCORE IS ADDED TO EVERY RESULT
  • 45. RECALL, PRECISION, AND RELEVANCY, OH MY!
  • 46. RELEVANCY: FOR A GIVEN QUERY & DOC SET THERE IS AN IDEAL ANSWER OF ONLY RELEVANT DOCS
  • 47. RECALL = WHAT % OF THE IDEAL ANSWER SET WAS RETRIEVED
  • 48. PRECISION = WHAT % OF THE ANSWER IS RELEVANT
  • 49. RECALL vs. PRECISION: AS YOU INCREASE RECALL YOU DEGRADE PRECISION
  • 50. SOLR DETERMINES RELEVANCY VIA THE NOTION OF SIMILARITY
  • 51. SOLR USES TF-IDF: TERM FREQUENCY, INVERSE DOCUMENT FREQUENCY
  • 52. Dismax + Facets + Highlighting
  • 53. FACET: A TAXONOMY OF YOUR QUERY BASED ON FIELD VALUES
  • 54. FACETS ALLOW “DRILL DOWN”: THEY GUIDE THE USER
  • 55. HIGHLIGHTING GIVES YOUR RESULTS CONTEXT, ALLOWING QUICKER DETERMINATION OF RELEVANCY
  • 56. DISMAX (DISjunction MAX): A QUERY HANDLER MEANT FOR DIRECT USER INPUT
  • 57. All Nodes Up
  • 58. All Nodes Up - Query
  • 59. Node 4 Down
  • 60. Node 4 Down - Query
  • 61. Nodes 3 & 4 Down
  • 62. Nodes 3 & 4 Down - Query
  • 63. REPLICATION PROVIDES HIGH AVAILABILITY: START WITH 4 NODES (1, 2, 3, 4)
  • 64. Write 3 Replicas
  • 65. Take 2 Nodes Down: 1 REPLICA STILL AVAILABLE
  • 66. WHAT IF DATA IS WRITTEN WHILE NODES ARE DOWN?
  • 67. YZ Not Stored Yet
  • 68. Store YZ Log
  • 69. Query YZ - Nodes 1 & 2
  • 70. Set Xfer Limit To 0
  • 71. Start Nodes 3 & 4
  • 72. Query Solr Direct: WHEN MAKING THIS DEMO I WAS EXPECTING THIS TO BE 0, BUT I FORGOT ABOUT AAE, WHICH STARTED KICKING IN BEFORE HANDOFF. SELF HEALING FTW!
  • 73. Set Xfer Limit To 64
  • 74. Handoff Occurs
  • 75. 0 Pending Xfers
  • 76. Solr Direct (Again): NOTICE IT’S NOW 301, UP FROM 54, MORE PROOF THAT HANDOFF OCCURRED. NOTE THIS QUERY IS GOING DIRECT TO ONLY 1 SHARD
  • 77. Query Node 4 YZ: NOW HIT YOKOZUNA ON NODE 4 (NOTICE THE CHANGE IN PORT #). THIS WILL RUN A DISTRIBUTED SEARCH AND THUS RETURN THE CORRECT COUNT
  • 78. Data Ownership: A VNODE; THE RING
  • 79. Node Down
  • 80. Write Fallback
  • 81. Node Up: HINTED HANDOFF WILL MOVE THE REPLICA TO THE PRIMARY
  • 82. WHAT IF YOU RM -RF THE INDEX DIR?
  • 83. Kill The Data: RM -RF THE INDEX DIRECTORY, KILL THE SOLR PROC
  • 84. Auto Restart: YOKOZUNA NOTICES SOLR DIED AND AUTOMATICALLY RESTARTS IT
  • 85. Node 4 - 0 Results
  • 86. AAE Notices Missing Data
  • 87. Node 4 - 13 Results: DATA IS RE-INDEXED OVER TIME
  • 88. More AAE Repair
  • 89. Node 4 - 128 Results: MORE INDEXES ARE REPAIRED; THIS CONTINUES UNTIL AAE REPAIRS ALL INDEXES
  • 90. WHAT EVEN IS ACTIVE ANTI-ENTROPY?
  • 91. Mo Systems Mo Failure:
      • index update could get lost
      • files can become truncated/corrupted
      • accidental `rm -rf`
      • segfault at the right time
      • etc...
  • 92. A MYRIAD OF FAILURE SCENARIOS, FROM OBVIOUS TO NEARLY INVISIBLE
  • 93. ENTROPY IS DAMAGE; AAE IS SELF HEALING. STRIKER!!!! EHEM, I MEAN, ENTROPY!!!!
  • 94. REPAIR EFFICIENTLY, NOT STUPIDLY
  • 95. Learn You Some Merkle For A Great Good: BIG UPS TO @jtuple FOR THE AAE DIAGRAMS
  • 96. Segments: EACH SEGMENT IS A LIST OF KEY-HASH PAIRS
  • 97. Segment Hashes: HASH OF THE HASHES IN A SEGMENT
  • 98. Hash O’ Hashes
  • 99. WHAT HAPPENS DURING EXCHANGE?
  • 100. Start With 2 Trees
  • 101. Compare Top Hashes: TOP HASHES DON’T MATCH, SO SOMETHING IS DIFFERENT
  • 102. Compare Child Hashes: NARROW DOWN THE DIVERGENT SEGMENT
  • 103. Recur: NARROW DOWN THE DIVERGENT SEGMENT CONT...
  • 104. Iter Key-Hash Pairs: ITER THE FINAL LIST OF HASHES TO FIND DIVERGENT KEYS
  • 105. Repair Divergent Keys: REPAIR (RE-INDEX) KEYS THAT ARE DIVERGENT (RED)
  • 106. CODE FOR DETECTION AND REPAIR, NOT PREVENTION
  • 107. WHAT HAPPENS IF 3 NODES GO DOWN?
  • 108. Stop 3 Nodes
  • 109. Query
  • 110. CONSISTENCY vs. AVAILABILITY
  • 111. Uptime - Story of 9s: UPTIME = (MTBF - MTTR) / MTBF
  • 112. Uptime is Flawed: IF THE SYSTEM IS DOWN, BUT NO ONE MAKES A REQUEST, IS IT REALLY DOWN?
  • 113. Yield - Uptime of the People: YIELD = QUERIES COMPLETED / QUERIES OFFERED
  • 114. Harvest vs. Yield: HARVEST = DATA AVAILABLE / COMPLETE DATA. IN THE FACE OF FAILURE YOU CAN’T HAVE BOTH FOR A SINGLE REQUEST
  • 115. IN TIMES OF TROUBLE, YOKOZUNA CHOOSES HARVEST FOR QUERIES
  • 116. TECHNICALLY, YOKOZUNA IS ALWAYS < 100% HARVEST IN A NON-QUIESCENT CLUSTER
  • 117. YOKOZUNA FAVORS YIELD FOR WRITES
  • 118. ONCE RIAK 1.4 SHIPS, YOKOZUNA LANDS IN MASTER
  • 119. THANK YOU: HTTP://GITHUB.COM/BASHO/YOKOZUNA
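Slides 29 to 35 describe the inverted index as a book index turned around: instead of mapping page numbers to words, you map each word to the page numbers it appears on. The minimal Python sketch below shows that inversion. The function `invert` and the sample `book` are hypothetical, and the naive lowercase-and-split step is only a stand-in for the real text analysis Solr performs (slide 37).

```python
from collections import defaultdict


def invert(pages: dict[int, str]) -> dict[str, list[int]]:
    """Turn a page -> text mapping into a word -> sorted page numbers mapping."""
    index: dict[str, set[int]] = defaultdict(set)
    for page, text in pages.items():
        # Naive stand-in for analysis: lowercase, then split on whitespace.
        for word in text.lower().split():
            index[word].add(page)
    return {word: sorted(found) for word, found in index.items()}


# Hypothetical three-page "book".
book = {
    1: "Riak stores data",
    2: "Yokozuna indexes Riak data",
    3: "Solr searches indexes",
}
index = invert(book)
print(index["riak"])     # [1, 2]
print(index["indexes"])  # [2, 3]
```

A term query is then a single dictionary lookup rather than a scan of every page, which is why search engines build this structure at index time.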
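Slide 43's fuzzy query `behavior~1` matches terms within one edit of `behavior`. The sketch below illustrates that idea with plain Levenshtein distance over a hypothetical list of indexed terms. The helpers `levenshtein` and `fuzzy_match` are hypothetical; Lucene's own fuzzy matching is implemented differently (with Levenshtein automata, and by default it also counts a transposition of adjacent characters as one edit).

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_match(query_term: str, terms: list[str], max_edits: int = 1) -> list[str]:
    """Return the terms within max_edits edits of query_term, like term~N."""
    return [t for t in terms if levenshtein(query_term, t) <= max_edits]


# Hypothetical terms present in the index.
terms = ["behavior", "behaviour", "behave", "behaviors"]
print(fuzzy_match("behavior", terms))  # ['behavior', 'behaviour', 'behaviors']
```

Note how `behaviour` (British spelling) is caught by `~1` while `behave` (three edits away) is not; raising the edit budget increases recall at the cost of precision.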
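Slides 46 to 49 define recall as the fraction of the ideal answer set that was retrieved and precision as the fraction of the answer that is relevant. The sketch below computes both for two hypothetical result sets against a hypothetical ideal answer, to show the trade-off the slides describe; the function name and document IDs are made up for illustration.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = relevant share of what was returned; recall = share of relevant docs returned."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"c1", "c2", "c3", "c4"}                        # hypothetical ideal answer
narrow = {"c1", "c2"}                                      # strict query
broad = {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"}   # loose query

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

Broadening the query found every relevant commit but also dragged in as many irrelevant ones, which is the "increase recall, degrade precision" effect from slide 49.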
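Slide 51 names TF-IDF as the basis of Solr's notion of similarity. In its textbook form a term's weight in a document is tf * log(N / df): high when the term is frequent in this document but rare across the N documents of the corpus. The sketch below computes that textbook form over a hypothetical mini-corpus of commit messages. It is not Lucene's exact formula; Lucene's default similarity adds further factors (such as length normalization and differently damped tf and idf), so real scores will not match these numbers.

```python
import math
from collections import Counter


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Textbook TF-IDF: raw term count times log(N / document frequency)."""
    tf = Counter(doc)[term]                    # occurrences in this doc
    df = sum(1 for d in corpus if term in d)   # docs containing the term
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)           # rarer terms get a larger idf
    return tf * idf


# Hypothetical, already-analyzed commit messages.
corpus = [
    "fix hinted handoff bug".split(),
    "handoff handoff handoff retry".split(),
    "fix typo".split(),
    "fix build".split(),
]
print(round(tf_idf("handoff", corpus[1], corpus), 3))  # 2.079
print(round(tf_idf("fix", corpus[0], corpus), 3))      # 0.288
```

"fix" appears in three of four messages, so it says little about any one of them and scores low; "handoff" is both repeated and comparatively rare, so it dominates.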
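Slides 53 and 54 describe a facet as a taxonomy of your results built from a field's values, which lets the user drill down. The sketch below counts field values over a hypothetical result set and then narrows it, the way clicking a facet would. Solr computes these counts server-side (for example via its `facet.field` parameter); the helpers `facet` and `drill_down` and the sample documents here are hypothetical, reusing field names from slides 16 to 22.

```python
from collections import Counter

# Hypothetical results of some query; field names follow the CLS slides.
results = [
    {"commit_repo": "riak_kv", "commit_author": "Ryan Zezeski"},
    {"commit_repo": "riak_core", "commit_author": "Joseph Blomstedt"},
    {"commit_repo": "riak_kv", "commit_author": "Joseph Blomstedt"},
    {"commit_repo": "yokozuna", "commit_author": "Ryan Zezeski"},
    {"commit_repo": "riak_kv", "commit_author": "Ryan Zezeski"},
    {"commit_repo": "riak_core", "commit_author": "Ryan Zezeski"},
]


def facet(docs: list[dict], field: str) -> list[tuple[str, int]]:
    """Count how many docs carry each value of `field`, most common first."""
    return Counter(d[field] for d in docs if field in d).most_common()


def drill_down(docs: list[dict], field: str, value: str) -> list[dict]:
    """Keep only docs whose `field` equals `value`, like clicking a facet."""
    return [d for d in docs if d.get(field) == value]


print(facet(results, "commit_repo"))
# [('riak_kv', 3), ('riak_core', 2), ('yokozuna', 1)]
core = drill_down(results, "commit_repo", "riak_core")
print(facet(core, "commit_author"))
# [('Joseph Blomstedt', 1), ('Ryan Zezeski', 1)]
```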
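Slides 95 to 105 walk through how active anti-entropy compares two hash trees: key-hash pairs are grouped into segments, each segment gets a hash of its hashes, those are hashed upward into a single top hash, and an exchange descends only into mismatching branches until it reaches the divergent keys. The minimal, in-memory Python sketch below follows those steps. `build_tree`, `exchange`, `NUM_SEGMENTS`, `FANOUT`, and the sample data are all hypothetical; Riak's real AAE (written in Erlang, with trees persisted on disk and far more segments) differs in many details.

```python
import hashlib

NUM_SEGMENTS = 8  # hypothetical; this sketch requires a power of FANOUT
FANOUT = 2        # children per interior node


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def segment_of(key: str) -> int:
    return int.from_bytes(h(key.encode())[:4], "big") % NUM_SEGMENTS


def build_tree(objects: dict[str, bytes]):
    """Return (segments, levels); levels[0] holds segment hashes, levels[-1] = [top]."""
    segments: list[dict[str, bytes]] = [{} for _ in range(NUM_SEGMENTS)]
    for key, value in objects.items():
        segments[segment_of(key)][key] = h(value)  # key-hash pairs
    level = [h(b"".join(k.encode() + b"\0" + v for k, v in sorted(seg.items())))
             for seg in segments]                  # hash of hashes in each segment
    levels = [level]
    while len(level) > 1:                          # hash o' hashes, up to the top
        level = [h(b"".join(level[i:i + FANOUT])) for i in range(0, len(level), FANOUT)]
        levels.append(level)
    return segments, levels


def exchange(a, b) -> set[str]:
    """Compare two trees top-down and return the keys whose hashes diverge."""
    seg_a, lv_a = a
    seg_b, lv_b = b
    divergent: set[str] = set()

    def descend(depth: int, idx: int) -> None:
        if lv_a[depth][idx] == lv_b[depth][idx]:
            return  # identical subtree: skip it entirely
        if depth == 0:  # bottom: iterate the segment's key-hash pairs
            sa, sb = seg_a[idx], seg_b[idx]
            divergent.update(k for k in sa.keys() | sb.keys() if sa.get(k) != sb.get(k))
            return
        for child in range(idx * FANOUT, idx * FANOUT + FANOUT):
            descend(depth - 1, child)  # recur into child hashes

    descend(len(lv_a) - 1, 0)
    return divergent


# Hypothetical KV data and an index replica that lost one update and corrupted another.
kv_data = {f"commit-{n}": f"msg {n}".encode() for n in range(20)}
index_data = dict(kv_data)
del index_data["commit-3"]
index_data["commit-7"] = b"truncated"
print(sorted(exchange(build_tree(kv_data), build_tree(index_data))))  # ['commit-3', 'commit-7']
```

The returned keys are the ones to repair (re-index), per slide 105. Matching subtrees are skipped wholesale, so only segments under a mismatching path are ever scanned key by key, which is the "repair efficiently" point of slide 94.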
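Slides 111 to 114 define three availability measures. The sketch below simply encodes those formulas as written on the slides: uptime as (MTBF - MTTR) / MTBF, yield as queries completed over queries offered, and harvest as data available over complete data. The function names and sample numbers are hypothetical (`yield_` carries a trailing underscore because `yield` is a Python keyword).

```python
def uptime(mtbf: float, mttr: float) -> float:
    """Uptime as given on slide 111: (MTBF - MTTR) / MTBF."""
    return (mtbf - mttr) / mtbf


def yield_(completed: int, offered: int) -> float:
    """Yield (slide 113): fraction of offered queries that completed."""
    return completed / offered


def harvest(available: int, complete: int) -> float:
    """Harvest (slide 114): fraction of the complete data reflected in an answer."""
    return available / complete


print(uptime(mtbf=1000.0, mttr=1.0))         # 0.999
print(yield_(completed=995, offered=1000))   # 0.995
print(harvest(available=3, complete=4))      # 0.75
```

For instance, a query answered from 3 of 4 hypothetical index partitions completes (so it counts toward yield) yet delivers only 0.75 harvest; slide 114's point is that under failure one request cannot be guaranteed both.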
