Solr China: August 4 Q&A Exchange, v2

Published in: Technology


  1. Solr Community of China © Copyright www.solr.cc 全国妖防组 QQ27779810 longkeyy@solr.cc Solr Usage Exchange and Q&A
  2. SOLR: Crawler, Index, Search, Integration
  3. Solr
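  4. Introduction to Apache Solr
     • Origin of the name: Search On Lucene Replication
     • Overview: Solr is a complete search service written in Java and built on Lucene. Lucene is a search component (library), whereas Solr is a complete application. Once Solr is deployed, you can send index data to it and run searches directly through its interfaces.
     • History:
       • 2004: CNET develops Solr to provide site search for CNET.
       • January 2006: Solr is donated to Apache and becomes an Apache incubator project.
       • One year later: Solr graduates from incubation, releases version 1.2, and becomes a subproject of Lucene.
       • June 2010: Solr 1.4.1 is released, a bugfix release of 1.4. Solr 1.4.1 uses Lucene 2.9.
       • After 1.4.x: Solr jumps directly to version 3.0 to keep its version numbers in sync with Lucene.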
  5. About SolrCloud
     SolrCloud is the name of a set of new distributed capabilities in Solr. Passing parameters to enable these capabilities lets you set up a highly available, fault-tolerant cluster of Solr servers. Use SolrCloud when you want high-scale, fault-tolerant, distributed indexing and search.
  6. Simple two-shard cluster
     Command:
       java -DzkRun -DnumShards=2 -Dbootstrap_confdir=solr/conf -jar start.jar
     solr.xml:
       <core name="core1" instanceDir="core1" collection="collection1" shard="shard1" dataDir="data"/>
       <core name="core2" instanceDir="core2" collection="collection1" shard="shard2" dataDir="data"/>
     This example creates a cluster consisting of two Solr servers representing two different shards of a collection.
  7. Simple two-shard cluster with shard replicas
     Commands:
       java -Djetty.port=8983 -DzkRun -DnumShards=2 -Dbootstrap_confdir=solr/conf -jar start.jar
       java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
     solr.xml:
       <core name="core1" instanceDir="core1" collection="collection1" shard="shard1" dataDir="data"/>
       <core name="core2" instanceDir="core2" collection="collection1" shard="shard2" dataDir="data"/>
     solr.xml:
       <core name="core3" instanceDir="core3" collection="collection1" shard="shard1" dataDir="data"/>
       <core name="core4" instanceDir="core4" collection="collection1" shard="shard2" dataDir="data"/>
     • This example builds on the previous one by creating another copy of shard1 and shard2.
     • Extra shard copies can be used for high availability and fault tolerance, or simply to increase the query capacity of the cluster.
  8. Solr schema
  9. Distributed and replicated Solr schema
     • Collection: A single search index.
     • Shard: A logical section of a single collection (also called a slice). People sometimes use "shard" in a physical sense, meaning a manifestation of a logical shard.
     • Replica: A physical manifestation of a logical shard, implemented as a single Lucene index on a SolrCore.
     • Leader: One replica of every shard is designated as the leader and coordinates indexing for that shard.
     • SolrCore: Encapsulates a single physical index. One or more SolrCores make up a logical shard (or slice), and shards make up a collection.
     • Node: A single instance of Solr. A single Solr instance can have multiple SolrCores that can be part of any number of collections.
     • Cluster: All of the nodes you are using to host SolrCores.
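The terms above can be pictured as a small data model. The following is only an illustrative sketch (Java 16 or later, because it uses records): the class and record names are hypothetical and not part of Solr's API, the core names and ports are taken from the two-shard example slides, and which replica acts as leader is an assumption, since in practice that is decided at runtime.

```java
import java.util.List;

// Illustrative data model only; these types are NOT part of Solr's API.
class ClusterModel {

    // A replica: one physical Lucene index hosted by a SolrCore on some node.
    record Replica(String coreName, String node, boolean leader) {}

    // A shard (slice): a logical section of a collection, backed by one or more replicas.
    record Shard(String name, List<Replica> replicas) {
        // Returns the replica designated as leader for this shard.
        Replica leader() {
            return replicas.stream()
                    .filter(Replica::leader)
                    .findFirst()
                    .orElseThrow(() -> new IllegalStateException("no leader for " + name));
        }
    }

    // A collection: a single logical search index made up of shards.
    record SolrCollection(String name, List<Shard> shards) {
        // Total number of SolrCores (replicas) across all shards.
        int coreCount() {
            return shards.stream().mapToInt(s -> s.replicas().size()).sum();
        }
    }

    // Two shards with two replicas each; the leader flags are assumed for illustration.
    static SolrCollection exampleCollection() {
        Shard shard1 = new Shard("shard1", List.of(
                new Replica("core1", "localhost:8983", true),
                new Replica("core3", "localhost:7574", false)));
        Shard shard2 = new Shard("shard2", List.of(
                new Replica("core2", "localhost:8983", true),
                new Replica("core4", "localhost:7574", false)));
        return new SolrCollection("collection1", List.of(shard1, shard2));
    }

    public static void main(String[] args) {
        SolrCollection c = exampleCollection();
        System.out.println(c.name() + " has " + c.coreCount() + " cores");
        for (Shard s : c.shards()) {
            System.out.println(s.name() + " leader: " + s.leader().coreName());
        }
    }
}
```

In this picture a node is simply the host and port string shared by several replicas, and the cluster is the set of all such nodes.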
  10. Overall picture of crawling and search
      [Diagram. Labels: Apache Solr; personalize; suggest; facet; clustering; Apache UIMA; Apache Tika; NLP; association index; collocation; metadata; machine learning; app server access log; Apache ManifoldCF. Sources: WWW, file server, RDBMS, CMS, SNS / blog, Active Directory.]
  11. Crawler
  12. Data acquisition with ManifoldCF
      [Diagram. Apache ManifoldCF feeds the Apache Solr index through Solr Cell.]
      • Repository connectors: Web, RSS, Windows shares, file system, JDBC (Oracle, SQL Server, PostgreSQL), CMIS, Alfresco, LiveLink (OpenText), Documentum (EMC), Meridio (Autonomy), SharePoint (Microsoft), FileNet (IBM)
      • Authority connector: Active Directory
      • Output connector: Apache Solr
      • Data stored in the index: metadata, content, access token
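  13. Establishing a connection
      1. Connect to ZooKeeper.
      2. ZooKeeper responds with the collection URL.
      3. Connect to the collection.
      Code:
        import org.apache.solr.client.solrj.impl.CloudSolrServer;

        // Address of the ZooKeeper instance that tracks the cluster state.
        String zkHost = "localhost:2181";
        String defaultCollection = "collection1";
        // The client looks up the collection's location through ZooKeeper.
        CloudSolrServer server = new CloudSolrServer(zkHost);
        server.setDefaultCollection(defaultCollection);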
  14. Index
  15. Toward an architecture where Solr coexists with the RDB
      • RDB-only architecture: the browser sends a request to the front AP server, which performs transaction processing against the RDBMS and returns the response (HTML).
      • Flexible architecture combining the RDB with a Solr index: the front AP server still uses the RDBMS for transaction processing, and additionally sends indexing requests and search requests to the Solr server (INDEX). Solr performs the full-text search and returns the search result list (XML); the front AP server returns the response (HTML) to the browser.
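The search request that the front AP server sends to Solr is an ordinary HTTP request. The sketch below only builds such a URL with the Java standard library; it assumes Solr's `/select` handler with the `q`, `rows`, and `wt` parameters, and the base URL and collection name are borrowed from the cluster example slides rather than from this slide. The class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Solr or SolrJ.
class SelectUrlBuilder {

    // Builds a URL for Solr's /select request handler.
    // baseUrl and collection are deployment-specific assumptions.
    static String selectUrl(String baseUrl, String collection, String query, int rows) {
        // The query text must be URL-encoded before it goes into the query string.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        // wt=xml asks for the XML result list mentioned on the slide.
        return baseUrl + "/" + collection + "/select?q=" + q + "&rows=" + rows + "&wt=xml";
    }

    public static void main(String[] args) {
        System.out.println(
                selectUrl("http://localhost:8983/solr", "collection1", "name:legend", 10));
    }
}
```

The front AP server would fetch this URL, parse the XML result list, and render it into the HTML response.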
  16. Adding and updating data
      1. Ask for the collection URL.
      2. Receive the collection URL.
      3. Connect to the collection using the URL and add or update data.
      4. Rebalance to the shard.
      Code:
        import org.apache.solr.common.SolrInputDocument;

        // "server" is the CloudSolrServer created when the connection was established.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "bookId");
        doc.addField("name", "The Legend of Po part ");
        server.add(doc);
        // Commit so the new document becomes visible to searches.
        server.commit();
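Step 4 can be pictured as choosing a shard from the document id. The sketch below is a deliberate simplification: it uses `String.hashCode()` modulo the shard count, whereas Solr's own routing differs in detail (it assigns hash ranges to shards). The class and method names are hypothetical.

```java
import java.util.List;

// Simplified illustration of hash-based routing; NOT Solr's actual router.
class ShardRouter {

    // Picks a shard for a document id by hashing the id.
    // floorMod keeps the index non-negative even when hashCode() is negative.
    static String shardFor(String docId, List<String> shards) {
        int index = Math.floorMod(docId.hashCode(), shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        List<String> shards = List.of("shard1", "shard2");
        for (String id : List.of("bookId", "book-2", "book-3")) {
            System.out.println(id + " -> " + shardFor(id, shards));
        }
    }
}
```

The property that matters is determinism: the same id always maps to the same shard, so an update to an existing document reaches the shard that already holds it.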
  17. Search
  18. Security-aware search with MCF (ManifoldCF)
      [Diagram. A search request carries the user identity (username@domain). The MCFSecurity SearchComponent plugin in Apache Solr obtains the user's access tokens from the Apache ManifoldCF authority connector, which is backed by Active Directory. The Solr index stores metadata, content, and access tokens for each document, and the search results are returned according to those tokens.]
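The matching rule behind such token-based filtering can be sketched in plain Java. This is an in-memory illustration under assumed allow/deny semantics, not the plugin's implementation: the real component works inside Solr's query pipeline, and the record, method names, and token values here are hypothetical. It uses records, so it needs Java 16 or later.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// In-memory illustration only; the real plugin filters inside Solr.
class TokenFilter {

    // An indexed document with the access tokens stored alongside its content.
    record Doc(String id, Set<String> allowTokens, Set<String> denyTokens) {}

    // Assumed rule: visible if the user holds at least one allow token and no deny token.
    static boolean visible(Doc doc, Set<String> userTokens) {
        boolean allowed = doc.allowTokens().stream().anyMatch(userTokens::contains);
        boolean denied = doc.denyTokens().stream().anyMatch(userTokens::contains);
        return allowed && !denied;
    }

    // Returns the ids of the hits the user may see, in their original order.
    static List<String> filter(List<Doc> hits, Set<String> userTokens) {
        return hits.stream()
                .filter(d -> visible(d, userTokens))
                .map(Doc::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> hits = List.of(
                new Doc("doc1", Set.of("group:sales"), Set.of()),
                new Doc("doc2", Set.of("group:hr"), Set.of()),
                new Doc("doc3", Set.of("group:sales"), Set.of("user:alice")));
        // Tokens an authority connector might return for the user (hypothetical values).
        Set<String> aliceTokens = Set.of("user:alice", "group:sales");
        System.out.println(filter(hits, aliceTokens)); // prints [doc1]
    }
}
```

Here doc2 is hidden because the user holds none of its allow tokens, and doc3 is hidden because a deny token matches even though an allow token also does.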
  19. [Slide without text]
  20. Integration
  21. Integration with other projects
      [Diagram. Solr at the center, surrounded by: Lucene, ZooKeeper, Nutch, Mahout, Hadoop, ManifoldCF, OpenNLP, Tika.]
  22. Features of Solr
      • Keyword search
      • Indexing
      • Highlighting
      • Custom rankings
      • Geographic search
      • Faceting
      • Grouping search
      • Similar-document search
      • Clustering
      • Morphological analysis
      • Handling of spelling (notation) variants
      • Synonym search
      • Suggestions
      • Language identification
      • Perhaps more...
  23. An ecosystem built by integrating three technologies: from the search engine alone to optimization of the entire platform
      [Diagram. Labels: search engine; web crawler and data connector; natural language processing, statistical processing, and machine learning; crawling; sign up; feedback of data analysis.]
      • The search engine efficiently provides information, including analysis data.
      • Analysis of search logs and similar data gathers the material for the next action plan, to further improve retrieval accuracy.
      • More accurate results are provided, which improves the search experience for the user.
  24. Thanks
      Website: www.solr.cc
      Contact: support@solr.cc
      QQ group: 187670960
      WeiXin: solrcn
