realtime-twitter-search

Mogujie (蘑菇街) real-time search engine

Published in: Technology

  • realtime-twitter-search

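    1. Real-time microblog (Weibo) search. xuanxi@mogujie.com, Wang Yajun (汪亚军)
    2. Outline: introduction; near-real-time search; incremental segment storage and interruptible queries; concurrent programming; Jessica 2.0
    3. Introduction: about me; Mogujie search; real-time search scenarios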
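    4. Lucene segments: segment 1, segment 2, segment 3 (map) -> merge policy -> merged segment (reduce)
    5. Lucene near-real-time search: IndexWriter.setRAMBufferSizeMB() sets the RAM buffer size (LUCENE-843); IndexWriter.getReader() flushes the RAM buffer to a RamDir without committing (LUCENE-1516); LUCENE-843 vs. LUCENE-1516
    6. Zoie: developed and open-sourced by LinkedIn; a wrapper on top of Lucene; guarantees real-time behavior; fast bidirectional docId <=> uid mapping
    7. Zoie index states. Mem A + Disk: (1) Mem A is updated immediately as documents are added, (2) Disk stays in use until the next merge. Mem A + Mem B + Disk: (1) Mem B is updated immediately as documents are added, (2) Mem A is read-only and is being merged with Disk, Disk is unchanged while it is merged with Mem A. Mem A + new Disk: (1) Mem A is updated immediately as documents are added, (2) the new Disk produced by merging the earlier Mem A and Disk is used
    8. Zoie: the RAM IndexWriter is reopened on a timer, which works; the Disk index is reopened every 10 s, and whenever the cached docId <=> Object_id mapping no longer matches it must be rebuilt. For a range query such as fav:{ 100 to 1000} this is a disaster: querying tens of millions of records takes on the order of 100 ms (no cache)
    9. Conclusion: frequently flushing to create new segments for the sake of real-time search is not viable. In a real-time system the caches are invalidated often, so there is no way to traverse all docIds in a short time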
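    10. Solution: an incremental inverted index; an inverted index traversed from newest to oldest; interruptible queries
    11. Incremental inverted index: term dictionary, term frequency, text, PostingList
    12. Incremental inverted index: the term dictionary (terms a, b, c, ... x, y, z) is stored as parallel int arrays: textPointer, frequency, startPostingsPointer, endPostingsPointer
    13. Incremental inverted index: the term dictionary (terms a, b, c, d, ... x, y, z) holds textPointer, frequency, startPostingsPointer, endPostingsPointer; the postings pointers lead into the PostingListStore
    14. Incremental inverted index: the term dictionary (terms a, b, c, d, ... x, y, z) holds textPointer, frequency, startPostingsPointer, endPostingsPointer; the PostingListStore is an inverted index traversed from newest to oldest
    15. Lucene's posting-list compression algorithm
    16. Incremental inverted index: the term dictionary (terms a, b, c, d, ... x, y, z) holds textPointer, frequency, startPostingsPointer, endPostingsPointer; Lucene's compression algorithm cannot search documents from newest to oldest, whereas the PostingListStore is an inverted index traversed from newest to oldest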
    17. PostingListStore: docId and position (a tweet has positions 1-140, which is < 255)
    18. PostingListStore: docId and position (a tweet has positions 1-140, which is < 255), packed as (docId << 8) | (position & 0xFF)
    19. PostingListStore: no compression; storing the position and the docId together in a single int reduces memory overhead
    20. PostingListStore: int pool; auto-expanding; zero copy (unlike an array)
    21. PostingListStore
    22. PostingListStore: PostingListPointer
    23. PostingListStore: all slices of a single term's PostingList form a doubly linked list. Slice layout: prepointer, data ... data, nextpointer
    24. PostingListStore: all slices of a single term's PostingList form a doubly linked list. Slice layout: prepointer, data ... data, nextpointer. What are the benefits of a doubly linked list?
    25. Incremental inverted index
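    26. Interruptible queries: we care most about the top n most recently updated records; query from newest to oldest; return immediately on timeout (HashedWheelTimer)
    27. Smart binary search: search sequentially from newest to oldest; record the data found at each position during the previous binary search and use it to guide the current search; skipList search + smart search + binary search
    28. Summary: interruptible queries + a PostingStore searched from the newest entry = real-time search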
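    29. Concurrent programming: writer and reader are wait-free (JMM); reader data consistency is guaranteed (version numbers)
    30. Writer and reader: v = 100; thread a writes v = 101; thread b reads v => 100. Data consistency: Java does not guarantee that a write by a is immediately visible to b. Lock-free
    31. Java Memory Model: happens-before; volatile; memory barrier. volatile only guarantees that writes made before the volatile write have crossed the memory barrier*
    32. Memory barrier: once the PostingStore writes have crossed the memory barrier => search from the start pointer of the term's PostingList toward newer entries
    33. Memory barrier: all slices of a single term's PostingList form a doubly linked list. Slice layout: prepointer, data ... data, nextpointer. The benefits of a doubly linked list
    34. Reader data consistency
    35. Version numbers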
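    36. Jessica 2.0: a rewrite of Lucene; pure in-memory posting store; index rebuilt from the log system; interruptible queries; stores the raw data
    37. Jessica 2.0: jessica handles indexing; monica handles scoring and ranking; feeder handles tokenization (diagram: feeder, monica, and several jessica nodes)
    38. Thanks: Twitter, Michael Busch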
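
Slides 17 to 24 describe the PostingListStore only in outline, so the following is a minimal Java sketch of how the pieces might fit together, not the Jessica implementation. It packs each posting as (docId << 8) | (position & 0xFF), allocates fixed-size slices from an int pool that grows by adding blocks rather than copying, links each term's slices through prepointer and nextpointer cells, and walks a term's postings from newest to oldest. The class name PostingPoolSketch, the helper TermPointers, and the tiny slice and block sizes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an append-only posting store; names and sizes are assumptions. */
final class PostingPoolSketch {
    static final int SLICE_SIZE = 8;   // [prepointer, 6 data cells, nextpointer]
    static final int BLOCK_SIZE = 16;  // ints per pool block; a multiple of SLICE_SIZE
    static final int NIL = -1;

    /** Per-term pointers, loosely modelled on startPostingsPointer / endPostingsPointer. */
    static final class TermPointers {
        int start = NIL;   // offset of the term's first slice
        int end = NIL;     // offset of the newest posting written for the term
        int frequency = 0; // number of postings appended
    }

    private final List<int[]> blocks = new ArrayList<>();
    private int used = 0; // ints handed out so far

    // docId must fit in 24 bits; the position occupies the low 8 bits.
    static int pack(int docId, int position) { return (docId << 8) | (position & 0xFF); }
    static int docId(int posting) { return posting >>> 8; }
    static int position(int posting) { return posting & 0xFF; }

    private int get(int ptr) { return blocks.get(ptr / BLOCK_SIZE)[ptr % BLOCK_SIZE]; }
    private void set(int ptr, int v) { blocks.get(ptr / BLOCK_SIZE)[ptr % BLOCK_SIZE] = v; }

    /** Allocates a slice and links it after prevSlice (or NIL for a term's first slice). */
    private int newSlice(int prevSlice) {
        if (used % BLOCK_SIZE == 0) blocks.add(new int[BLOCK_SIZE]); // grow without copying
        int slice = used;
        used += SLICE_SIZE;
        set(slice, prevSlice);                 // prepointer
        set(slice + SLICE_SIZE - 1, NIL);      // nextpointer
        if (prevSlice != NIL) set(prevSlice + SLICE_SIZE - 1, slice);
        return slice;
    }

    void append(TermPointers t, int docId, int position) {
        int ptr;
        if (t.start == NIL) {
            t.start = newSlice(NIL);
            ptr = t.start + 1;
        } else if ((t.end + 1) % SLICE_SIZE == SLICE_SIZE - 1) { // current slice is full
            ptr = newSlice(t.end - t.end % SLICE_SIZE) + 1;
        } else {
            ptr = t.end + 1;
        }
        set(ptr, pack(docId, position));
        t.end = ptr;
        t.frequency++;
    }

    /** Returns up to limit packed postings for the term, newest first. */
    List<Integer> newestFirst(TermPointers t, int limit) {
        List<Integer> out = new ArrayList<>();
        int ptr = t.end;
        while (ptr != NIL && out.size() < limit) {
            out.add(get(ptr));
            ptr--;
            if (ptr % SLICE_SIZE == 0) { // reached the prepointer cell of this slice
                int prev = get(ptr);
                ptr = (prev == NIL) ? NIL : prev + SLICE_SIZE - 2; // last data cell of older slice
            }
        }
        return out;
    }
}
```

In this sketch the prepointer is what makes the newest-to-oldest walk possible, while the nextpointer supports the forward walk from a term's start pointer mentioned on slide 32. Appending ten documents to one term and calling newestFirst(term, 3) yields the postings for the three most recent docIds, even when slices of other terms are interleaved in the pool.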

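The interruptible query on slide 26 can be illustrated with a small Java sketch. The slide names Netty's HashedWheelTimer for the timeout; to stay self-contained, this example swaps in a plain deadline check against System.nanoTime(), which is a different mechanism with the same visible effect: the scan runs from newest to oldest and returns whatever it has collected once n results are found or the time budget runs out. The class InterruptibleScanSketch, the Result holder, and the flat int[] of postings are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical sketch of a newest-first scan that gives up when its time budget is spent. */
final class InterruptibleScanSketch {
    static final class Result {
        final List<Integer> docIds = new ArrayList<>();
        boolean timedOut = false;
    }

    /**
     * docIds are stored in insertion order (oldest first), so the scan starts at the end.
     * Stops after n matches or once budgetNanos has elapsed, whichever comes first.
     */
    static Result topN(int[] docIds, int n, IntPredicate matches, long budgetNanos) {
        Result result = new Result();
        long deadline = System.nanoTime() + budgetNanos;
        for (int i = docIds.length - 1; i >= 0 && result.docIds.size() < n; i--) {
            if (System.nanoTime() - deadline >= 0) { // budget spent: return partial results
                result.timedOut = true;
                break;
            }
            if (matches.test(docIds[i])) {
                result.docIds.add(docIds[i]);
            }
        }
        return result;
    }
}
```

Because the newest entries are visited first, a query that times out still returns the most recent matches it has seen, which is the property slide 28 relies on. A production version would likely check the clock less often than once per posting, or have a timer set a flag as the slide suggests.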
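
Slides 29 to 32 rely on the Java Memory Model rule that a volatile write publishes every write made before it. The following Java sketch shows one common way to use that rule with a single writer and many readers; the slides do not spell out the exact mechanism, so the class PublishedLogSketch, the fixed-capacity array, and the use of a published counter (which plays a role similar to the version number on slide 35) are assumptions.

```java
/** Hypothetical sketch: one writer appends, readers see only fully published entries. */
final class PublishedLogSketch {
    private final int[] postings;          // fixed capacity, for brevity
    private volatile int published = 0;    // number of entries visible to readers

    PublishedLogSketch(int capacity) {
        postings = new int[capacity];
    }

    /** Must be called from a single writer thread. */
    void append(int posting) {
        int next = published;      // only the writer ever changes this field
        postings[next] = posting;  // plain write
        published = next + 1;      // volatile write: makes the plain write above visible
    }

    /** Safe from any thread. Returns up to n entries, newest first. */
    int[] newest(int n) {
        int limit = published;     // volatile read: entries below limit are visible
        int count = Math.min(n, limit);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = postings[limit - 1 - i];
        }
        return out;
    }
}
```

Neither side takes a lock or waits for the other: the writer performs one plain write and one volatile write per entry, and a reader never looks past the counter value it read at the start, so it cannot observe a half-written entry. A reader may still miss an entry appended a moment earlier, which matches the slide 30 remark that a write by thread a is not guaranteed to be immediately visible to thread b.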