Solr introduction



A basic overview of Solr

Published in: Technology
  • Full-Text Search: In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases (such as titles, abstracts, selected sections, or bibliographical references). In a full-text search, the search engine examines all of the words in every stored document as it tries to match the search criteria (text specified by a user).
    Faceted Search: Faceted search is the dynamic clustering of items or search results into categories that let users drill into search results (or even skip searching entirely) by any value in any field. Each facet displayed also shows the number of hits within the search that match that category. Users can then “drill down” by applying specific constraints to the search results. Faceted search is also called faceted browsing, faceted navigation, guided navigation and sometimes parametric search. It provides an effective way to let users refine search results, continually drilling down until the desired items are found.
    Example of faceted search: On a page that sells computers, we normally have a panel for selecting the manufacturer (Sony, IBM, …) and then search for the appropriate product. Faceted search does the opposite: starting from a query, we show the matching manufacturers, and the user can then continue searching within the current results.
  • Solr introduction

    1. SOLR Introduction: Lucene / SOLR
    2. SOLR Introduction
        • Why do we need a Search Engine?
        • What is Lucene/SOLR?
        • Advantages of SOLR
        • SOLR Architecture
        • Query Syntax
        • Working with SOLR: Feed data, query data
        • SOLR installation
        • SOLR configuration
    3. Why do we need a Search Engine?
        • Google, Bing, Yahoo, …: cannot access our data.
        • Database: yes, that is the normal way, but the problem is response time.
        • So we need a search engine of our own: Lucene / SOLR
    4. What is Lucene/SOLR? Lucene
        • Apache Lucene is a free/open source information retrieval software library.
        • Lucene is just an indexing and search library.
        • Lucene is written in Java and has been ported to other languages: Delphi, Perl, C#, C++, Python, Ruby, and PHP.
    5. What is Lucene/SOLR? Solr
        • Solr is a search server written in Java and built on top of Lucene.
        • Solr is a web application (WAR) which can be deployed in any servlet container, e.g. Jetty, Tomcat.
        • Solr exposes its features as a REST-like HTTP service.
    6. SOLR Introduction: Advantages of SOLR
        • Open source/free
        • Administration Interface
        • Rich Document Parsing and Indexing (PDF, Word, HTML, etc.)
        • Full-Text Search
        • Faceted Search and Filtering
        • Multi Server support
        The comparison of Search Engines: overview/
    7. SOLR architecture
    8. SOLR Shard
    9. Query Syntax: Keyword matching
        title:foo - Search for the word "foo" in the title field.
        title:"foo bar" - Search for the phrase "foo bar" in the title field.
        -title:bar - Match every document whose title field does not contain "bar".
    10. Query Syntax: Wildcard matching
        title:foo* - Search for any word that starts with "foo" in the title field.
        title:foo*bar - Search for any word that starts with "foo" and ends with "bar" in the title field.
        *:* - Match all documents.
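The keyword and wildcard queries above are sent to Solr as the q parameter of an HTTP request, so characters such as the colon, quotes and spaces have to be URL-encoded. The following is a minimal sketch of that step using only the JDK. The base URL (host, port, the collection1 core and the /select handler) is an assumption based on the installation slides, and the class and method names are illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SolrQueryUrl {
    // Assumed endpoint: adjust host, port and core name to your own setup.
    static final String BASE = "http://localhost:8080/solr/collection1/select";

    // Builds a select URL whose q parameter carries the URL-encoded query.
    static String selectUrl(String query) {
        try {
            return BASE + "?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("title:foo"));
        System.out.println(selectUrl("title:\"foo bar\""));
        System.out.println(selectUrl("-title:bar"));
        System.out.println(selectUrl("title:foo*"));
        System.out.println(selectUrl("*:*"));
    }
}
```

SolrJ performs this encoding itself, so the sketch is only relevant when calling Solr over plain HTTP, for example from a browser or a script.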
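    11. Query Syntax: Proximity matching
        "foo bar"~number
        number = 0: exact phrase match.
        number = 1: the terms may be one position apart, e.g. "foo baz bar".
        number = 2: the terms may also appear in reverse order, e.g. "bar foo" (swapping two adjacent terms counts as two moves).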
    12. Query Syntax: Range searches
        field:[a TO z] - Match values in the range a to z (both ends inclusive).
        field:[* TO 100] - Match all values less than or equal to 100.
        field:[100 TO *] - Match all values greater than or equal to 100.
        field:[* TO *] - Match all documents that have a value in the field.
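The range rules above can be read as a simple predicate: square brackets include both bounds, * leaves a side open, and a document without a value in the field never matches. The sketch below models that behaviour for integer values in plain Java. It does not call Solr; the class name, the use of null for * and the null value for "field missing" are assumptions made for illustration.

```java
final class RangeSketch {
    // Renders the query string; a null bound is written as '*'.
    static String toQuery(String field, Integer lower, Integer upper) {
        String lo = (lower == null) ? "*" : String.valueOf(lower);
        String hi = (upper == null) ? "*" : String.valueOf(upper);
        return field + ":[" + lo + " TO " + hi + "]";
    }

    // Mirrors the inclusive range semantics; value == null means the
    // document has no value in the field, so it cannot match any range.
    static boolean inRange(Integer lower, Integer upper, Integer value) {
        if (value == null) {
            return false;
        }
        if (lower != null && value < lower) {
            return false;
        }
        if (upper != null && value > upper) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(toQuery("price", null, 100) + " matches 100: " + inRange(null, 100, 100));
        System.out.println(toQuery("price", 100, null) + " matches 99: " + inRange(100, null, 99));
        System.out.println(toQuery("price", null, null) + " matches missing value: " + inRange(null, null, null));
    }
}
```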
    13. Query Syntax: Nested query
        _query_:"field:*lap" OR _query_:"field:*tran"
        _query_:"{!dismax qf=somefield} cat dog"
    14. Query Syntax: Join
        {!join from=inner-id to=outer-id}zzz:vvv
        Equivalent SQL:
        SELECT xxx, yyy FROM collection1
        WHERE outer-id IN (SELECT inner-id FROM collection1 WHERE zzz = "vvv")
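The SQL reading of the join can be sketched over an in-memory list of documents: first collect the from-field values of every document matching the inner query, then return the documents whose to-field is in that set. This is only a model of the semantics in plain Java, not how Solr executes the join. The field names come from the slide, while the sample documents, the id field and the class name are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class JoinSketch {
    // Builds a document from alternating field names and values.
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put(keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    // Hypothetical sample data: documents 1 and 2 carry inner-id, 3 and 4 carry outer-id.
    static List<Map<String, String>> sampleDocs() {
        return Arrays.asList(
                doc("id", "1", "zzz", "vvv", "inner-id", "A"),
                doc("id", "2", "zzz", "other", "inner-id", "B"),
                doc("id", "3", "outer-id", "A"),
                doc("id", "4", "outer-id", "B"));
    }

    // Models {!join from=... to=...}field:value and returns the ids of the joined documents.
    static List<String> joinIds(List<Map<String, String>> docs,
                                String from, String to, String field, String value) {
        // Inner query: collect the "from" values of documents where field == value.
        Set<String> keys = new HashSet<>();
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field)) && d.containsKey(from)) {
                keys.add(d.get(from));
            }
        }
        // Outer step: keep documents whose "to" value is one of the collected keys.
        List<String> ids = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (keys.contains(d.get(to))) {
                ids.add(d.get("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(joinIds(sampleDocs(), "inner-id", "outer-id", "zzz", "vvv"));
    }
}
```

With the sample data, only document 3 is returned: document 1 matches zzz:vvv and contributes the key A, and document 3 is the only one whose outer-id is A.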
    15. Query Syntax: Faceted Search
        q=inStock:true&facet=true&facet.field=cat&facet.limit=5

        <response>
          <responseHeader><status>0</status><QTime>4</QTime></responseHeader>
          <result numFound="12" start="0"/>
          <lst name="facet_counts">
            <lst name="facet_queries"/>
            <lst name="facet_fields">
              <lst name="cat">
                <int name="electronics">10</int>
                <int name="memory">3</int>
                <int name="drive">2</int>
                <int name="hard">2</int>
                <int name="monitor">2</int>
              </lst>
            </lst>
          </lst>
        </response>
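The counts in the facet response above can be understood as "for each value of cat, how many matching documents carry it", limited to the top entries. The sketch below reproduces that idea over an in-memory list in plain Java; it is not Solr's implementation. The sample categories and the tie-breaking rule (alphabetical order for equal counts) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FacetSketch {
    // valuesPerDoc holds the (possibly multi-valued) facet field of each matching document.
    static Map<String, Integer> facetCounts(List<List<String>> valuesPerDoc, int limit) {
        // TreeMap keeps the values in alphabetical order, which becomes the tie-breaker.
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> values : valuesPerDoc) {
            // A document counts once per distinct value, even if the value is repeated.
            for (String value : new HashSet<>(values)) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        // Stable sort by descending count; equal counts stay in alphabetical order.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        // Keep only the first "limit" entries, in the spirit of facet.limit.
        Map<String, Integer> top = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (top.size() == limit) {
                break;
            }
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("electronics", "memory"),
                Arrays.asList("electronics", "drive"),
                Arrays.asList("electronics"),
                Arrays.asList("memory"));
        System.out.println(facetCounts(docs, 2));
    }
}
```

Because a document can carry several values in the facet field, the counts may add up to more than the number of matching documents, which is also why the response above lists 19 facet hits for numFound="12".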
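    16. SolrJ: Feed data
        import java.util.ArrayList;
        import java.util.Collection;
        import org.apache.solr.client.solrj.SolrServer;
        import org.apache.solr.client.solrj.impl.HttpSolrServer;
        import org.apache.solr.common.SolrInputDocument;

        // make a connection to the Solr server
        SolrServer server = new HttpSolrServer("http://localhost:8080/solr/");

        // prepare a doc
        final SolrInputDocument doc1 = new SolrInputDocument();
        doc1.addField("id", 1);
        doc1.addField("firstName", "First Name");
        doc1.addField("lastName", "Last Name");
        final Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        docs.add(doc1);

        // add the docs to Solr and make them searchable
        server.add(docs);
        server.commit();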
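    17. SolrJ: Query data
        import org.apache.solr.client.solrj.SolrQuery;
        import org.apache.solr.client.solrj.response.QueryResponse;
        import org.apache.solr.common.SolrDocument;
        import org.apache.solr.common.SolrDocumentList;

        // match all documents, sorted by firstName ascending
        final SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        query.addSortField("firstName", SolrQuery.ORDER.asc);

        // "server" is the SolrServer created on the previous slide
        final QueryResponse rsp = server.query(query);
        final SolrDocumentList solrDocumentList = rsp.getResults();
        for (final SolrDocument doc : solrDocumentList) {
            // the casts assume both fields are declared with a string type in schema.xml
            final String firstName = (String) doc.getFieldValue("firstName");
            final String id = (String) doc.getFieldValue("id");
        }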
    18. SOLR Introduction: SOLR installation
        Ref: 1/tutorial.html 2_
        Prerequisites: Tomcat 7, JDK 1.6, SOLR 4.2.1
    19. SOLR Introduction
        • Extract the SOLR archive to D:\Project\solr_web\solr-4.2.1
        • Copy resource\solr-4.2.1\example\solr to D:\Project\solr_web\solr (this directory is SOLR_HOME)
        • Copy resource\solr-4.2.1\dist\solr-4.2.1.war to SOLR_HOME and rename it to solr.war
        • Open SOLR_HOME\collection1\conf\solrconfig.xml and modify the <dataDir> element:
          <dataDir>${}</dataDir>
        • Create a Tomcat Context file (solr.xml) like this:
          <?xml version="1.0" encoding="utf-8"?>
          <Context docBase="D:/Project/solr_web/solr/solr.war" debug="0" crossContext="true">
            <Environment name="solr/home" type="java.lang.String" value="D:/Project/solr_web/solr" override="true"/>
          </Context>
        • Copy this file (solr.xml) to tomcat.7.0.35\conf\Catalina\localhost
        • Start Tomcat
        • Open the SOLR dashboard at http://localhost:8080/solr/#/
    20. SOLR Introduction: SOLR Configuration
        To configure a Solr server, we need at least two XML files: solrconfig.xml and schema.xml.
        solrconfig.xml: contains the common configuration of a core: memory size, data path, transactions, …
        schema.xml: contains the definition of the data: structure, data types, field names, …
    21. SOLR Introduction: SOLR Configuration, schema.xml
        field: a field that will be indexed by Solr
          <field name="firstName" type="string" indexed="true" stored="true"/>
        dynamicField: like a field, but the exact name is not specified in advance
          <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
          name="*_i" will match any field name ending in _i (like myid_i, z_i)
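The dynamicField rule above amounts to a small name-matching step: a pattern such as *_i applies to every field name that ends in _i. The sketch below shows that matching in plain Java, assuming the wildcard appears only at the start or the end of the pattern. It does not read schema.xml; the class name and the second pattern (attr_*) are illustrative.

```java
final class DynamicFieldSketch {
    // Returns true when fieldName is covered by a dynamicField name pattern.
    // Assumes the '*' wildcard sits only at the start or the end of the pattern.
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            // "*_i" matches any name ending in "_i".
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            // "attr_*" matches any name starting with "attr_".
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        // No wildcard: the name has to match exactly, like a regular field.
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        System.out.println(matches("*_i", "myid_i"));
        System.out.println(matches("*_i", "z_i"));
        System.out.println(matches("*_i", "firstName"));
        System.out.println(matches("attr_*", "attr_color"));
    }
}
```

With such a rule in place, a document can be fed with a field like myid_i without declaring that exact name in schema.xml; the field takes the type and the indexed/stored settings of the matching dynamicField.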