Codemotion 2013 - Build Your Own Search Engine with Apache Solr


  1. Build Your Own Search Engine with Apache Solr
     Alfonso Focareta, alfonso.focareta@pronetics.it (@afocareta), Pro-netics S.p.A.
     Angelo Quercioli, angelo.quercioli@pronetics.it, Pro-netics S.p.A.
  2. Solr & Lucene
  3. Lucene: features
     • High-performance, scalable full-text search library
     • 100% pure Java
     • Focus: indexing and searching documents (a "document" is just a list of name+value pairs)
     • No crawlers or document parsing
     • Flexible text analysis (tokenizers + token filters)
  4. Solr: features
     • A full-text search server based on Lucene
     • XML/HTTP and JSON interfaces
     • Faceted search (category counting)
     • Flexible data schema to define types and fields
     • Hit highlighting
     • Configurable advanced caching
     • Index replication
     • Extensible open architecture, plugins
     • Web administration interface
     • Written in Java 5, deployable as a WAR
  5. Solr: license
     Open source, released under the Apache License.
  6. Solr: Architecture
  7. Solr: Installing and Starting
     • JDK 5 or above installed
     • Open http://localhost:8983/solr/admin/ in your web browser to administer it
  8. Solr: Define a schema.xml
     The file schema.xml describes the structure of the indexed data.
     • Type definitions
     • Field definitions
     • CopyField section
     • Additional definitions
  9. Solr: Define a schema.xml (type definition)
     Type Definition
     List of types and components (simple and complex):
     • Primitive types
     • WhitespaceTokenizerFactory
     • StopFilterFactory
     • WordDelimiterFilterFactory
     • LowerCaseFilterFactory
     • SnowballPorterFilterFactory (stemming)
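To make the idea of a tokenizer followed by token filters concrete, here is a minimal Python sketch of such a chain. It only imitates the concept and is not Solr's implementation: the stop-word list is hypothetical, the filter order is simplified (lowercasing happens before stop-word removal), and stemming is left out.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def whitespace_tokenize(text):
    """Split on runs of whitespace, as a whitespace tokenizer would."""
    return text.split()


def word_delimiter(tokens):
    """Split each token on non-alphanumeric characters (rough stand-in for a word-delimiter filter)."""
    out = []
    for token in tokens:
        out.extend(part for part in re.split(r"[^0-9A-Za-z]+", token) if part)
    return out


def lowercase(tokens):
    """Lowercase every token."""
    return [token.lower() for token in tokens]


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text):
    """Run the tokenizer, then each filter in turn."""
    tokens = whitespace_tokenize(text)
    tokens = word_delimiter(tokens)
    tokens = lowercase(tokens)
    return remove_stop_words(tokens)


print(analyze("The Wi-Fi router of the Year"))  # ['wi', 'fi', 'router', 'year']
```

The same chain is applied to indexed text and, typically, to query text, which is why a query for "wi fi" can match a document containing "Wi-Fi".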
  10. Solr: Define a schema.xml (type definition example)
  11. Solr: Define a schema.xml (field definitions)
      Field Definitions
      • Field attributes: name, type, indexed, stored, multiValued, omitNorms, termVectors
        <field name="id" type="string" indexed="true" stored="true"/>
        <field name="sku" type="textTight" indexed="true" stored="true"/>
        <field name="name" type="text" indexed="true" stored="true"/>
        <field name="inStock" type="boolean" indexed="true" stored="false"/>
        <field name="price" type="sfloat" indexed="true" stored="false"/>
        <field name="category" type="text_ws" indexed="true" stored="true" multiValued="true"/>
      • Dynamic fields
        <dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
        <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
        <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
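As a rough illustration of how a field name could be resolved against explicit and dynamic field definitions, the Python sketch below checks explicit names first and then the wildcard patterns. The lookup tables mirror the slide; the function name `resolve_field_type` and the use of `fnmatchcase` are assumptions made for this example, not Solr's actual matching code.

```python
from fnmatch import fnmatchcase

# Explicit fields from the slide: name -> type.
EXPLICIT_FIELDS = {
    "id": "string",
    "sku": "textTight",
    "name": "text",
    "inStock": "boolean",
    "price": "sfloat",
    "category": "text_ws",
}

# Dynamic field patterns from the slide: (pattern, type).
DYNAMIC_FIELDS = [("*_i", "sint"), ("*_s", "string"), ("*_t", "text")]


def resolve_field_type(field_name):
    """Return the type for a field name, or None when nothing matches."""
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


for candidate in ("price", "views_i", "nickname_s", "unknown"):
    print(candidate, "->", resolve_field_type(candidate))
```

Dynamic fields let documents carry fields such as `views_i` without each one being declared in schema.xml in advance.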
  12. Solr: Define a schema.xml (copy field example)
      Copy Field
      Copies one field to another at index time.
      Case #1: Analyze the same field in different ways
        - copy into a field with a different analyzer
        - boost exact-case, exact-punctuation matches
        - language translations, thesaurus, soundex
        <field name="title" type="text"/>
        <field name="title_exact" type="text_exact" stored="false"/>
        <copyField source="title" dest="title_exact"/>
      Case #2: Index multiple fields into a single searchable field
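The two copy-field cases can be pictured with a small Python sketch that duplicates values into destination fields before a document is indexed. It is a conceptual model only: documents are plain dictionaries of value lists, and `apply_copy_fields` and the catch-all field name `text` are hypothetical names chosen for this example.

```python
def apply_copy_fields(doc, copy_fields):
    """Return a copy of doc with each source field's values appended to its destination field."""
    result = {name: list(values) for name, values in doc.items()}
    for source, dest in copy_fields:
        if source in doc:
            # Copy from the original document so copies never chain.
            result.setdefault(dest, []).extend(doc[source])
    return result


# Case #1: the same content under a second field (to be analyzed differently).
print(apply_copy_fields({"title": ["Apache Solr"]}, [("title", "title_exact")]))

# Case #2: several fields gathered into one searchable field.
person = {"name": ["Alfonso"], "surname": ["Focareta"]}
print(apply_copy_fields(person, [("name", "text"), ("surname", "text")]))
```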
  13. Solr: Indexing Method
      You put documents into Solr (called "indexing") via:
      • XML
      • JSON
      • CSV
      • Binary over HTTP (multipart request)
  14. Solr: Indexing (Java API)
      Indexing by Solrj
      Send an XML document like this:
      <add>
        <doc>
          <field name="id">043564</field>
          <field name="name">Alfonso</field>
          <field name="surname">Focareta</field>
          <field name="category">developer</field>
          <field name="language">Italian</field>
          <field name="language">English</field>
        </doc>
      </add>
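An add message of this shape can be generated rather than written by hand. The Python sketch below builds it with the standard library; the helper name `build_add_xml` is an assumption for this example, and actually sending the payload to a Solr update endpoint is deliberately left out.

```python
import xml.etree.ElementTree as ET

SAMPLE_DOC = {
    "id": "043564",
    "name": "Alfonso",
    "surname": "Focareta",
    "category": "developer",
    "language": ["Italian", "English"],  # multi-valued field
}


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message; list values become repeated <field> elements."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


print(build_add_xml(SAMPLE_DOC))
```

Using an XML library instead of string concatenation means characters such as `<` or `&` in field values are escaped automatically.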
  15. Solr: Indexing (Solrj)
      Solrj
      Solrj is a Java client for accessing Solr. It offers a Java interface to add, update, and query the Solr index. An example follows on the next slide.
  16. Solr: Indexing (Solrj) Example
  17. Solr: Delete Document
      Delete document(s)
      • Delete by id (most efficient)
        <delete>
          <id>05591</id>
          <id>32552</id>
        </delete>
      • Delete by query
        <delete>
          <query>language:english</query>
        </delete>
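Both delete messages can be produced programmatically. This Python sketch builds them with the standard library; the function names are hypothetical, and posting the result to Solr is not shown.

```python
import xml.etree.ElementTree as ET


def delete_by_ids(ids):
    """Build a <delete> message with one <id> element per document id."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message that removes every document matching the query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_ids(["05591", "32552"]))
print(delete_by_query("language:english"))
```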
  18. Solr: Commit and Optimize
      Commit: while you are indexing documents into Solr, none of your changes become visible to searches until you run the commit command.
      Optimize: merges the index segments into fewer, larger segments (which can speed up searches) and purges any deleted (replaced) documents.
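The visibility rule for commits can be modelled with a toy in-memory index. The Python sketch below is purely conceptual: `ToyIndex` is a hypothetical class for this example and says nothing about how Solr stores segments; it only shows that pending adds and deletes do not affect search results until `commit()` is called.

```python
class ToyIndex:
    """In-memory stand-in for an index with pending and committed state."""

    def __init__(self):
        self._pending = {}  # id -> document, or None for a pending delete
        self._visible = {}  # id -> document, as seen by searches

    def add(self, doc):
        """Queue a document; an existing document with the same id is replaced on commit."""
        self._pending[doc["id"]] = doc

    def delete(self, doc_id):
        """Queue a delete by id."""
        self._pending[doc_id] = None

    def commit(self):
        """Make all pending changes visible to searches."""
        for doc_id, doc in self._pending.items():
            if doc is None:
                self._visible.pop(doc_id, None)
            else:
                self._visible[doc_id] = doc
        self._pending.clear()

    def search_ids(self):
        """Return the ids of all visible documents."""
        return sorted(self._visible)


index = ToyIndex()
index.add({"id": "043564", "name": "Alfonso"})
print(index.search_ids())  # [] because nothing has been committed yet
index.commit()
print(index.search_ids())  # ['043564']
```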
  19. Solr: Searching
      You can search documents in Solr over HTTP or with the Solrj library.
      http://localhost:8983/solr/select?q=language:italian&start=0&rows=2&fl=name,surname
      <response>
        <result numFound="15" start="0">
          <doc>
            <str name="name">Angelo</str>
            <str name="surname">Quercioli</str>
          </doc>
          <doc>
            <str name="name">Alfonso</str>
            <str name="surname">Focareta</str>
          </doc>
        </result>
      </response>
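A select URL like the one on this slide can be assembled from its parts. The Python sketch below uses only the standard library; the helper name `build_select_url` is an assumption for this example, and the request itself is not sent. Note that `urlencode` percent-encodes characters such as `:` and `,`, which a server decodes back to the same query.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, start=0, rows=10, fields=None):
    """Build a /select URL with q, start, rows and an optional fl (field list)."""
    params = [("q", query), ("start", start), ("rows", rows)]
    if fields:
        params.append(("fl", ",".join(fields)))
    return base_url + "/select?" + urlencode(params)


print(build_select_url(
    "http://localhost:8983/solr",
    "language:italian",
    rows=2,
    fields=["name", "surname"],
))
```

Here `start` and `rows` page through the result set, and `fl` limits which stored fields come back for each document.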
  20. Solr: Searching (Response Format)
      Response Format
      You can add &wt=json for a JSON-formatted response:
      {"response": {"numFound": 15, "start": 0, "docs": [
        {"name": "Angelo", "surname": "Quercioli"},
        {"name": "Alfonso", "surname": "Focareta"}
      ]}}
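Reading such a JSON response on the client side takes only a few lines. The Python sketch below parses a hard-coded sample instead of calling a server; it assumes the result set is nested under the `response` key, as Solr's JSON output does, and the function name `summarize` is hypothetical.

```python
import json

# Hard-coded sample shaped like a Solr JSON response.
SAMPLE_RESPONSE = """
{"response": {"numFound": 15, "start": 0, "docs": [
  {"name": "Angelo", "surname": "Quercioli"},
  {"name": "Alfonso", "surname": "Focareta"}
]}}
"""


def summarize(response_text):
    """Return (total hit count, full names of the documents on this page)."""
    body = json.loads(response_text)["response"]
    names = [f'{doc["name"]} {doc["surname"]}' for doc in body["docs"]]
    return body["numFound"], names


total, names = summarize(SAMPLE_RESPONSE)
print(total, names)
```

`numFound` is the total number of matches, while `docs` holds only the page selected by `start` and `rows`.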
  21. Solr: Searching - Query Syntax
      Lucene Query Syntax
      • italian english
        Equivalent to: italian OR english (the QueryParser default operator is "OR", so each term is optional)
      • Wildcard searches: ang?o, alf*o, rom*
      • +italian +english -name:angelo
        Equivalent to: italian AND english NOT name:angelo
      • "justice league" -name:aquaman
      • releaseDate:[2012-01-01T00:00:00Z TO 2013-12-31T23:59:59Z]
      • description:"legge roma"~100
  22. Solr: Searching - Query Syntax 2
      Lucene Query Syntax 2
      • *:*
      • (angelo AND "pier francesco") OR (+federico +paolo)
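Query strings in the syntax shown on the two slides above are easy to get subtly wrong by hand, especially date ranges and quoted phrases. The Python sketch below builds a few clause types; all helper names are hypothetical, escaping is limited to backslashes and double quotes inside phrases, and naive datetimes are assumed to be UTC.

```python
from datetime import datetime

SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def phrase(field, text, slop=None):
    """Build field:"text", optionally with a proximity slop such as ~100."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    clause = f'{field}:"{escaped}"'
    return f"{clause}~{slop}" if slop is not None else clause


def date_range(field, start, end):
    """Build an inclusive range clause; datetimes are assumed to be UTC."""
    return f"{field}:[{start.strftime(SOLR_DATE_FORMAT)} TO {end.strftime(SOLR_DATE_FORMAT)}]"


def require_all(*clauses):
    """Prefix every clause with + so that all of them must match."""
    return " ".join(f"+{clause}" for clause in clauses)


print(phrase("description", "legge roma", slop=100))
print(date_range("releaseDate", datetime(2012, 1, 1), datetime(2013, 12, 31, 23, 59, 59)))
print(require_all("italian", "english") + " -name:angelo")
```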
  23. Solr: Function Query
      Function Query
      • Allows adding a function of a field value to the score
        - Boost recently added or popular documents
      • Current parser only supports function notation
      • Example: log(sum(popularity,1))
      • sum, min, max, log, sqrt, currency, ms … etc
      • scale(x, target_min, target_max)
        - calculates min & max of x across all docs
      • map(x, min, max, target)
        - useful for dealing with defaults
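The arithmetic behind these functions can be reproduced in plain Python to see what values they contribute. This sketch assumes that Solr's `log` is base 10 and that `map` replaces values inside the inclusive range [min, max] with the target; the constant result for an all-equal `scale` input is a choice made only for this example.

```python
import math


def log_sum(popularity):
    """Value of log(sum(popularity,1)), assuming a base-10 log."""
    return math.log10(popularity + 1)


def map_value(x, lo, hi, target):
    """Mimic map(x, min, max, target): values within [lo, hi] become target."""
    return target if lo <= x <= hi else x


def scale(values, target_min, target_max):
    """Mimic scale(x, target_min, target_max) using the min and max across all values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [float(target_min) for _ in values]  # example-only choice
    span = target_max - target_min
    return [target_min + (value - lo) * span / (hi - lo) for value in values]


print(log_sum(9))              # 1.0
print(map_value(0, 0, 0, 1))   # 1: a default of 0 is replaced by 1
print(scale([0, 5, 10], 1, 2)) # [1.0, 1.5, 2.0]
```

Adding 1 inside the log keeps the result defined, and equal to zero, for documents whose popularity is 0.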
  24. Solr: Boosted Query
      Boosted Query
      • Score is multiplied instead of added
        - New local params {!...} syntax added
        &q={!boost b=sqrt(popularity)}"super man"
      • Parameter dereferencing in local params
        &q={!boost b=$boost v=$userq}&boost=sqrt(popularity)&userq="super man"
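The dereferenced form is convenient when the user's query and the boost function come from different places in an application. The Python sketch below assembles those parameters and also shows the multiplicative effect on a score; the function names are hypothetical and the relevance score passed in is a made-up number, not something computed by Solr.

```python
import math
from urllib.parse import parse_qs, urlencode


def boosted_params(user_query, boost_function):
    """Build request parameters that use $boost and $userq dereferencing."""
    return {
        "q": "{!boost b=$boost v=$userq}",
        "boost": boost_function,
        "userq": user_query,
    }


def boosted_score(relevance_score, popularity):
    """Multiply (rather than add) the relevance score by sqrt(popularity)."""
    return relevance_score * math.sqrt(popularity)


query_string = urlencode(boosted_params('"super man"', "sqrt(popularity)"))
print(query_string)
print(parse_qs(query_string))  # decoded again, as a server would see it
print(boosted_score(2.0, 9))   # 6.0
```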
  25. Solr: Facet Query
      Facet Query
      Faceted search breaks up search results into multiple categories.
      http://solr/select?q=foo&wt=json&indent=on&facet=true&facet.field=cat&facet.query=price:[0 TO 100]&facet.query=manu:IBM
      {"response": {"numFound": 26, "start": 0, "docs": […]},
       "facet_counts": {
         "facet_queries": {
           "price:[0 TO 100]": 6,
           "manu:IBM": 2},
         "facet_fields": {
           "cat": ["electronics", 14, "memory", 3, "card", 2, "connector", 2]}}}
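In the response above, the counts for a facet field arrive as one flat list that alternates value and count. A short Python sketch for turning that list into pairs follows; the sample data is copied from the slide and the helper name `facet_pairs` is an assumption for this example.

```python
# facet_counts section copied from the slide's sample response.
FACET_COUNTS = {
    "facet_queries": {"price:[0 TO 100]": 6, "manu:IBM": 2},
    "facet_fields": {
        "cat": ["electronics", 14, "memory", 3, "card", 2, "connector", 2],
    },
}


def facet_pairs(flat_list):
    """Turn [value, count, value, count, ...] into [(value, count), ...]."""
    return list(zip(flat_list[0::2], flat_list[1::2]))


for value, count in facet_pairs(FACET_COUNTS["facet_fields"]["cat"]):
    print(f"{value}: {count}")
```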
  26. Solr: Filter Query
      Filter Query
      • Filters are restrictions applied in addition to the query
      • Use them with faceting to narrow the results
      • Filters are cached separately for speed
      1. User queries for memory; the query sent to Solr is
         &q=memory&fq=inStock:true&facet=true&…
      2. User selects 1GB memory size
         &q=memory&fq=inStock:true&fq=size:1GB&…
      3. User selects DDR2 memory type
         &q=memory&fq=inStock:true&fq=size:1GB&fq=type:DDR2&…
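The drill-down sequence above amounts to keeping the main query fixed and appending one `fq` parameter per selection. A minimal Python sketch of that pattern, with a hypothetical helper name and without sending any request:

```python
from urllib.parse import urlencode


def drill_down_url(base_url, query, filters):
    """Build a /select URL with one fq parameter per active filter and faceting enabled."""
    params = [("q", query)]
    params += [("fq", filter_query) for filter_query in filters]
    params.append(("facet", "true"))
    return base_url + "/select?" + urlencode(params)


filters = ["inStock:true"]
print(drill_down_url("http://localhost:8983/solr", "memory", filters))

filters.append("size:1GB")   # user selects 1GB memory size
filters.append("type:DDR2")  # user selects DDR2 memory type
print(drill_down_url("http://localhost:8983/solr", "memory", filters))
```

Keeping each restriction in its own `fq` lets each one be cached and reused independently of the others.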
  27. Demo!
  28. Demo! Questions?
