Search Evolution - Von Lucene zu Solr und ElasticSearch

2,108 views
1,937 views

Published on

Bedcon 2013 Vortrag über Grundlagen einer Suche, Lucene, Solr und ElasticSearch

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,108
On SlideShare
0
From Embeds
0
Number of Embeds
25
Actions
Shares
0
Downloads
25
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Search Evolution - Von Lucene zu Solr und ElasticSearch

  1. 1. Search Evolution – von Lucene zu Solr und ElasticSearchFlorian Hopf@fhopfhttp://www.florian-hopf.de
  2. 2. Index Indizieren Index Suchen
  3. 3. IndexTerm Document Id
  4. 4. Analyzinghttp://www.flickr.com/photos/quinnanya/5196951914/
  5. 5. Analyzing Such Evolution -Von Lucene zu Solr undElasticSearch Verteiltes Suchen mitElasticsearch
  6. 6. Analyzing Such Term Document Id Evolution - Such 1Von Lucene 1. Tokenization zu Solr und Evolution 1ElasticSearch Von 1 Lucene 1 zu 1 Solr 1 und 1 ElasticSearch 1 Verteiltes Verteiltes 2Suchen mit Suchen 2Elasticsearch mit 2 Elasticsearch 2
  7. 7. Analyzing Such Evolution - Term Document IdVon Lucene 1. Tokenization such 1 zu Solr und evolution 1ElasticSearch von 1 2. Lowercasing lucene 1 zu 1 solr 1 und 1 elasticsearch 1,2 VerteiltesSuchen mit verteiltes 2Elasticsearch suchen 2 mit 2
  8. 8. Analyzing Such Evolution - Term Document IdVon Lucene 1. Tokenization zu Solr und such 1,2ElasticSearch evolution 1 von 1 2. Lowercasing luc 1 zu 1 solr 1 3. Stemming und 1 Verteiltes elasticsearch 1,2Suchen mit verteilt 2Elasticsearch mit 2
  9. 9. Inverted Index
  10. 10. Analyzer
  11. 11. Query Syntaxdatenbank OR DBtitle:elasticsearch"apache lucene"speaker:hopp~elastic* AND date:[20130101 TO 20130501]
  12. 12. Relevance
  13. 13. http://www.ibm.com/developerworks/java/library/os-apache-lucenesearch/
  14. 14. DocumentsDocument Field title title Integration1 Suchen mitmit Apache Camel Verteiltes Name ganz einfach Elasticsearch Value Value 1 Field date title Name 1 20130404 Value Value 1 Field title speaker title Integration1 Halil-Cem mit Apache Camel Name Dr. einfach Gürsoy 1 ganz Value Value
  15. 15. Attributes NOT_ANALYZEDANALYZED NO Index ...
  16. 16. AttributesYES NO Store
  17. 17. IndexingDocument es = new Document();es.add(new Field("title", "Verteiltes Suchen mit Elasticsearch", Field.Store.YES, Field.Index.ANALYZED));es.add(new Field("date", "20130404", Field.Store.NO, Field.Index.ANALYZED));es.add(new Field("speaker", "Dr. Halil-Cem Gürsoy", Field.Store.YES, Field.Index.ANALYZED));
  18. 18. IndexingDirectory dir = FSDirectory.open( new File("/tmp/testindex"));IndexWriterConfig config = new IndexWriterConfig( Version.LUCENE_36, new GermanAnalyzer(Version.LUCENE_36));IndexWriter writer = new IndexWriter(dir, config);writer.addDocument(es);writer.commit();
  19. 19. SearchingIndexReader reader = IndexReader.open(dir);IndexSearcher searcher = new IndexSearcher(reader);QueryParser parser = new QueryParser( Version.LUCENE_36, "title", new GermanAnalyzer(Version.LUCENE_36));Query query = parser.parse("suche");TopDocs result = searcher.search(query, 10);assertEquals(1, result.totalHits);int id = result.scoreDocs[0].doc;Document doc = searcher.doc(id);String title = doc.get("title");assertEquals( "Verteiltes Suchen mit Elasticsearch", title);
  20. 20. Webapp Webapp XML, JSON, JavaBin, Ruby, ...Client http Solr Lucene
  21. 21. Config Solr Home conf data solr-schema.xml Lucene config.xml
  22. 22. Schemaschema.xml Field Types Fields
  23. 23. Schema<fieldType name="text_de" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.GermanLightStemFilterFactory"/> </analyzer></fieldType>
  24. 24. Schema<fields> <field name="title" type="text_de" indexed="true" stored="true"/> <field name="speaker" type="string" indexed="true" stored="true" multiValued="true"/> <field name="speaker_search" type="text_ws" indexed="true" stored="false" multiValued="true"/> [...]</fields><copyField source="speaker" dest="speaker_search"/>
  25. 25. IndexingSolrInputDocument document = new SolrInputDocument();document.addField("path", "/tmp/foo");document.addField("title", "Verteiltes Suchen mit Elasticsearch");document.addField("speaker", "Dr. Halil-Cem Gürsoy");SolrServer server = new HttpSolrServer("http://localhost:8080");server.add(document);server.commit();
  26. 26. Solrconfigsolrconfig.xml Lucene Config Caches Request Handler Search Components
  27. 27. Solrconfig<requestHandler name="/bedcon" class="solr.SearchHandler"> <lst name="defaults"> <int name="rows">10</int> <str name="q.op">AND</str> <str name="q.alt">*:*</str> <str name="defType">edismax</str> <str name="qf"> content title^1.5 speaker_search </str> </lst></requestHandler>
  28. 28. SearchingSolrQuery solrQuery = new SolrQuery("suche");solrQuery.setQueryType("/bedcon");QueryResponse response = server.query(solrQuery);assertEquals(1, response.getResults().size());SolrDocument result = response.getResults().get(0);assertEquals("Verteiltes Suchen mit Elasticsearch", result.get("title"));assertEquals("Dr. Halil-Cem Gürsoy", result.getFirstValue("speaker"));
  29. 29. Faceting...solrQuery.setFacet(true);solrQuery.addFacetField("speaker");QueryResponse response = server.query(solrQuery);List<FacetField.Count> speakerFacet = response.getFacetField("speaker").getValues();assertEquals(1, speakerFacet.get(0).getCount());assertEquals("Dr. Halil-Cem Gürsoy", speakerFacet.get(0).getName());
  30. 30. Indexingcurl -XPOST http://localhost:9200/bedcon/talk/ -d { "speaker" : "Dr. Halil-Cem Gürsoy", "date" : "2013-04-04T16:00:00", "title" : "Verteiltes Suchen mit Elasticsearch"}{"ok":true,"_index":"bedcon","_type":"talk","_id":"CeltdivQRGSvLY_dBZv1jw","_version":1}
  31. 31. Mappingcurl -XPUT http://host/bedcon/talk/_mapping -d { "talk" : { "properties" : { "title" : { "type" : "string", "analyzer" : "german" } } }}
  32. 32. Searchingcurl -XGET http://host/bedcon/talk/_search?q=elasticsearch{...},"hits":{"total":1,"max_score":0.054244425, "hits":[{ ..., "_score":0.054244425, "_source" : { "speaker" : "Dr. Halil-Cem Gürsoy", "date" : "2013-04-04T16:00:00", "title": "Verteiltes Suchen mit Elasticsearch" } }}
  33. 33. Searchingcurl -XGEThttp://localhost:9200/bedcon/talk/_search -d { "query" : { "query_string" : {"query" : "elasticsearch"} }, "facets" : { "tags" : { "terms" : {"field" : "speaker"} } }}
  34. 34. SearchingSearchResponse response = esClient.prepareSearch("bedcon") .addFacet( FacetBuilders.termsFacet("speaker") .field("speaker")) .setQuery( QueryBuilders.queryString("elasticsearch")) .execute().actionGet();assertEquals(1, response.getHits().getTotalHits());
  35. 35. Verteilung
  36. 36. Verteilung
  37. 37. http://lucene.apache.orghttp://lucene.apache.org/solr/http://elasticsearch.orghttps://github.com/fhopf/lucene-solr-talk@fhopfmail@florian-hopf.dehttp://blog.florian-hopf.de
  38. 38. Images● http://www.morguefile.com/archive/display/3470● http://www.flickr.com/photos/quinnanya/5196951914/ Quinn Dombrowski● http://www.morguefile.com/archive/display/695239● http://www.morguefile.com/archive/display/93433● http://www.morguefile.com/archive/display/811746● http://www.morguefile.com/archive/display/12965● http://www.morguefile.com/archive/display/181488

×