Solr

Code & Beer
Topic: Apache Solr

Published in: Technology

  1. by NNNN (周世恩), Code & Coffee, 2013/11/1
  2. What is Solr?
  3. What is Lucene?
  4. • Full-featured text search • High performance • Index size: roughly 20-30% of the size of the indexed text • Small RAM requirements (~1 MB) • Powerful, accurate and efficient search algorithms • 100% Java (^^)
  5. Lucene (cont.) • Multiple Analyzers / Tokenizers • Searching by field • Merging results • Flexible faceting, highlighting, joins and result grouping • Typo-tolerant suggesters (you have to build them yourself, of course) • Customizable ranking models (VSM, BM25)
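To make the "ranking model" bullet above more concrete, here is a minimal sketch of BM25 scoring in Python. It follows the textbook Okapi BM25 formula and is not Lucene's own implementation; the function name `bm25_score`, the toy corpus, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against the query terms (textbook BM25)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency of the term
        if df == 0 or tf[term] == 0:
            continue
        # The "1 +" inside the log keeps the IDF non-negative.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: longer-than-average documents are penalized.
        norm = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score

# Hypothetical toy corpus, already tokenized.
corpus = [
    ["solr", "is", "a", "search", "platform"],
    ["lucene", "is", "a", "search", "library"],
    ["tomcat", "is", "a", "servlet", "container"],
]
scores = [bm25_score(["solr", "search"], doc, corpus) for doc in corpus]
```

In this toy corpus the first document matches both query terms, the second matches only "search", and the third matches neither, so the scores come out in that order.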
  6. Lucene (Query): http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/Query.html
  7. http://www.ibm.com/developerworks/cn/java/j-lo-lucene1/fig001.jpg
  8. Where is the index stored? • Memory • File system • HDFS • Set the FileSystem config to HDFS
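  9. #Note • Only indexed fields can be searched • A field can be stored without being indexed • Multiple types are supported (Long, Int, String, Text...) • The Tokenizer (Analyzer) has to be decided at indexing time • Are searching and indexing supported at the same time?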
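The difference between "indexed" and "stored" in the notes above can be sketched with a toy inverted index in Python. This is a simplified model for illustration only and not how Lucene is implemented; the class `ToyIndex`, its methods, and the sample document are all hypothetical.

```python
from collections import defaultdict

class ToyIndex:
    """Toy model of indexed vs. stored fields (not Lucene's real data structures)."""

    def __init__(self, indexed_fields, stored_fields):
        self.indexed_fields = set(indexed_fields)
        self.stored_fields = set(stored_fields)
        self.postings = defaultdict(set)  # (field, token) -> set of doc ids
        self.stored = {}                  # doc id -> stored field values

    def add(self, doc_id, doc):
        for field, value in doc.items():
            if field in self.indexed_fields:
                # The tokenizer is fixed at indexing time:
                # here it is simply lowercase + whitespace split.
                for token in str(value).lower().split():
                    self.postings[(field, token)].add(doc_id)
        # Only stored fields can be returned in search results.
        self.stored[doc_id] = {f: v for f, v in doc.items() if f in self.stored_fields}

    def search(self, field, term):
        ids = sorted(self.postings.get((field, term.lower()), set()))
        return [self.stored[i] for i in ids]

# "title" is indexed and stored; "url" is stored only.
index = ToyIndex(indexed_fields=["title"], stored_fields=["title", "url"])
index.add(1, {"title": "Apache Solr intro", "url": "http://example.org/solr"})

hits_title = index.search("title", "solr")                 # finds the document
hits_url = index.search("url", "http://example.org/solr")  # finds nothing
```

The `url` field still comes back with every hit on `title`, but it cannot be searched itself because no postings were built for it.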
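  10. #Note • Shake well before use • Be clear from the start about which fields you need • Reduce the chance of having to rebuild the index (with an RDB you only need to run one command)
  11. A Lucene index consists of many files; lose one of them and it's GG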
  12. What is Solr?
  13. An awesome, enterprise-grade, free
  14. Search Platform
  15. Everything Lucene can do is already there
  16. Solr adds even more... • A nice-looking Admin Interface! • REST-like API (easy to integrate with other apps) • Dynamic clustering • Database integration • Geospatial search (Google Maps?) • Adjustable cache sizes
  17. Remember the advantages of the cloud... • Highly reliable • Scalable • Fault tolerant • Distributed indexing • Replication • Load-balanced • Automated failover and recovery (?)
  18. Commonly used config files • schema.xml (defines each field) • solrconfig.xml (defines the URI of each handler) • jetty.xml (!) • solr.xml (defines the cores)
  19. Real-time indexing?
  20. Near real-time indexing • Documents are available for search almost immediately after being indexed... • It still only counts once there has been a commit (....) https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
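As a hedged illustration of the commit point above, the following Python sketch builds (but does not send) an update request that asks Solr to make the documents visible either through a soft commit or within a time limit. The `softCommit` and `commitWithin` request parameters are Solr's; the helper name `build_update_request`, the core name `collection1`, the host and port, and the sample document are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_update_request(base_url, docs, soft_commit=False, commit_within_ms=None):
    """Build a JSON update request; the caller decides when and whether to send it."""
    params = {}
    if soft_commit:
        params["softCommit"] = "true"   # make the documents searchable right away
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)  # commit within N milliseconds
    url = base_url + "/update"
    if params:
        url += "?" + urlencode(params)
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")

# Assumed local example core; adjust to your own setup.
req = build_update_request(
    "http://localhost:8983/solr/collection1",
    [{"id": "1", "title": "hello"}],
    commit_within_ms=1000,
)
```

Passing `req` to `urllib.request.urlopen` would actually send it. Without one of these parameters (or an explicit or automatic commit configured on the server), the new documents are indexed but not yet visible to searches.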
  21. Searching • Query “id:{id} AND name:{name} OR title:{text}” • Highlighting • Projection • Sorting (asc, desc) • Output formats: JSON, CSV, XML • Others: spellcheck, wildcard queries, +-*/
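The search features listed above map onto request parameters of the select handler: `q` for the query, `fl` for projection, `sort` for sorting, `hl` and `hl.fl` for highlighting, and `wt` for the output format. A minimal Python sketch that only assembles such a URL, assuming a local example core named `collection1` and hypothetical field names:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_select_url(base_url, query, fields=None, sort=None,
                     highlight_field=None, fmt="json"):
    """Assemble a Solr /select URL from common query parameters."""
    params = [("q", query), ("wt", fmt)]           # query and output format
    if fields:
        params.append(("fl", ",".join(fields)))    # projection: fields to return
    if sort:
        params.append(("sort", sort))              # e.g. "name asc"
    if highlight_field:
        params.append(("hl", "true"))              # turn highlighting on
        params.append(("hl.fl", highlight_field))  # field to highlight
    return base_url + "/select?" + urlencode(params)

# Field names and values below are made up for the example.
url = build_select_url(
    "http://localhost:8983/solr/collection1",
    "id:42 AND name:solr OR title:search",
    fields=["id", "title"],
    sort="name asc",
    highlight_field="title",
)
params = parse_qs(urlsplit(url).query)
```

Using `urlencode` rather than string concatenation takes care of escaping the spaces and colons in the query.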
  22. Sample Output
  23. Import Data From DB • Configure it in solrconfig.xml http://wiki.apache.org/solr/DataImportHandler
  24. The difference between Solr and an RDB • Solr is for indexed text or lots of unstructured docs. • Solr is optimized for searching, not for storage and retrieval of individual records. http://stackoverflow.com/questions/5814050/solr-or-database
  25. Distributed Search cluster • Run Solr on many machines and pick one of them to tie the others together • Does this need to be set in the config?
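On the open question above: one way to run a distributed query without touching the config is the `shards` request parameter, which lists the Solr instances that the receiving node should query and merge. The Python sketch below only builds such a URL; the helper name `build_distributed_query` and the host names and ports are assumptions, and whether this fits a given deployment (as opposed to a SolrCloud setup) is not something the slides settle.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_distributed_query(coordinator_url, shard_hosts, query):
    """Build a /select URL asking one node to fan the query out to all shards."""
    params = {
        "q": query,
        # Comma-separated "host:port/path" entries, written without "http://".
        "shards": ",".join(shard_hosts),
        "wt": "json",
    }
    return coordinator_url + "/select?" + urlencode(params)

# Hypothetical two-node setup on one machine.
url = build_distributed_query(
    "http://localhost:8983/solr",
    ["localhost:8983/solr", "localhost:7574/solr"],
    "title:solr",
)
shards = parse_qs(urlsplit(url).query)["shards"][0].split(",")
```

The node that receives the request acts as the coordinator: it sends the query to each listed shard and merges the results.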
  26. Distributed Solr Cluster & Load balancer http://wiki.apache.org/solr/SolrReplication
  27. http://wiki.apache.org/solr/SolrReplication
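  28. #Note • You can first build the index directory with a Java application that embeds Lucene indexing, then hand it to Solr • Before Solr performs an update, make sure no other application is writing to the index, otherwise GG • While indexing, whether with Solr or Lucene, do not delete the write lock carelessly
  29. Live Demo (the gods once said this is dangerous)
  30. Download the latest Solr release $: cd solr-4.4.0/example $: java -Xmx2048m -jar start.jar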
  31. The End
  32. Useful tools to share • Luke (for inspecting an index) • Apache Tika • Apache Hadoop • Apache Tomcat
  33. BBQ (Bonus) • Custom tokenizers • Document boosting • Field boosting • Field aliasing / renaming
