
Search Evolution - From Lucene to Solr and ElasticSearch

Bedcon 2013 talk on the fundamentals of search, Lucene, Solr and ElasticSearch

Published in: Technology

  1. Search Evolution - From Lucene to Solr and ElasticSearch
      Florian Hopf
      @fhopf
      http://www.florian-hopf.de
  2. Index: Indexing, Searching
  3. Index: Term, Document Id
  4. Analyzing
      Image: http://www.flickr.com/photos/quinnanya/5196951914/
  5. Analyzing
      Document 1: Such Evolution - Von Lucene zu Solr und ElasticSearch
      Document 2: Verteiltes Suchen mit Elasticsearch
  6. Analyzing
      Step 1: Tokenization
      Term            Document Id
      Such            1
      Evolution       1
      Von             1
      Lucene          1
      zu              1
      Solr            1
      und             1
      ElasticSearch   1
      Verteiltes      2
      Suchen          2
      mit             2
      Elasticsearch   2
  7. Analyzing
      Step 1: Tokenization
      Step 2: Lowercasing
      Term            Document Id
      such            1
      evolution       1
      von             1
      lucene          1
      zu              1
      solr            1
      und             1
      elasticsearch   1,2
      verteiltes      2
      suchen          2
      mit             2
  8. Analyzing
      Step 1: Tokenization
      Step 2: Lowercasing
      Step 3: Stemming
      Term            Document Id
      such            1,2
      evolution       1
      von             1
      luc             1
      zu              1
      solr            1
      und             1
      elasticsearch   1,2
      verteilt        2
      mit             2
  9. Inverted Index
  10. Analyzer
  11. Query Syntax
      datenbank OR DB
      title:elasticsearch
      "apache lucene"
      speaker:hopp~
      elastic* AND date:[20130101 TO 20130501]
  12. Relevance
  13. http://www.ibm.com/developerworks/java/library/os-apache-lucenesearch/
  14. Documents
      A Document consists of Fields; each Field has a Name and a Value.
      Field: Name = title, Value = Verteiltes Suchen mit Elasticsearch
      Field: Name = date, Value = 20130404
      Field: Name = speaker, Value = Dr. Halil-Cem Gürsoy
      The slide also shows fragments of a further title: Integration, ganz einfach, mit Apache Camel.
  15. Attributes: Index (ANALYZED, NOT_ANALYZED, NO, ...)
  16. Attributes: Store (YES, NO)
  17. Indexing
      Document es = new Document();
      es.add(new Field("title", "Verteiltes Suchen mit Elasticsearch",
          Field.Store.YES, Field.Index.ANALYZED));
      es.add(new Field("date", "20130404",
          Field.Store.NO, Field.Index.ANALYZED));
      es.add(new Field("speaker", "Dr. Halil-Cem Gürsoy",
          Field.Store.YES, Field.Index.ANALYZED));
  18. Indexing
      Directory dir = FSDirectory.open(new File("/tmp/testindex"));
      IndexWriterConfig config = new IndexWriterConfig(
          Version.LUCENE_36, new GermanAnalyzer(Version.LUCENE_36));
      IndexWriter writer = new IndexWriter(dir, config);
      writer.addDocument(es);
      writer.commit();
  19. Searching
      IndexReader reader = IndexReader.open(dir);
      IndexSearcher searcher = new IndexSearcher(reader);
      QueryParser parser = new QueryParser(
          Version.LUCENE_36, "title", new GermanAnalyzer(Version.LUCENE_36));
      Query query = parser.parse("suche");
      TopDocs result = searcher.search(query, 10);
      assertEquals(1, result.totalHits);
      int id = result.scoreDocs[0].doc;
      Document doc = searcher.doc(id);
      String title = doc.get("title");
      assertEquals("Verteiltes Suchen mit Elasticsearch", title);
  20. Webapp
      Client -> http (XML, JSON, JavaBin, Ruby, ...) -> Webapp (Solr, Lucene)
  21. Config
      Solr Home
        conf: schema.xml, solrconfig.xml
        data: Lucene
  22. Schema
      schema.xml: Field Types, Fields
  23. Schema
      <fieldType name="text_de" class="solr.TextField">
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.GermanLightStemFilterFactory"/>
        </analyzer>
      </fieldType>
  24. Schema
      <fields>
        <field name="title" type="text_de"
               indexed="true" stored="true"/>
        <field name="speaker" type="string"
               indexed="true" stored="true" multiValued="true"/>
        <field name="speaker_search" type="text_ws"
               indexed="true" stored="false" multiValued="true"/>
        [...]
      </fields>
      <copyField source="speaker" dest="speaker_search"/>
  25. Indexing
      SolrInputDocument document = new SolrInputDocument();
      document.addField("path", "/tmp/foo");
      document.addField("title", "Verteiltes Suchen mit Elasticsearch");
      document.addField("speaker", "Dr. Halil-Cem Gürsoy");
      SolrServer server = new HttpSolrServer("http://localhost:8080");
      server.add(document);
      server.commit();
  26. Solrconfig
      solrconfig.xml: Lucene Config, Caches, Request Handler, Search Components
  27. Solrconfig
      <requestHandler name="/bedcon" class="solr.SearchHandler">
        <lst name="defaults">
          <int name="rows">10</int>
          <str name="q.op">AND</str>
          <str name="q.alt">*:*</str>
          <str name="defType">edismax</str>
          <str name="qf">
            content title^1.5 speaker_search
          </str>
        </lst>
      </requestHandler>
  28. Searching
      SolrQuery solrQuery = new SolrQuery("suche");
      solrQuery.setQueryType("/bedcon");
      QueryResponse response = server.query(solrQuery);
      assertEquals(1, response.getResults().size());
      SolrDocument result = response.getResults().get(0);
      assertEquals("Verteiltes Suchen mit Elasticsearch",
          result.get("title"));
      assertEquals("Dr. Halil-Cem Gürsoy",
          result.getFirstValue("speaker"));
  29. Faceting
      ...
      solrQuery.setFacet(true);
      solrQuery.addFacetField("speaker");
      QueryResponse response = server.query(solrQuery);
      List<FacetField.Count> speakerFacet =
          response.getFacetField("speaker").getValues();
      assertEquals(1, speakerFacet.get(0).getCount());
      assertEquals("Dr. Halil-Cem Gürsoy", speakerFacet.get(0).getName());
  30. Indexing
      curl -XPOST http://localhost:9200/bedcon/talk/ -d '{
        "speaker" : "Dr. Halil-Cem Gürsoy",
        "date" : "2013-04-04T16:00:00",
        "title" : "Verteiltes Suchen mit Elasticsearch"
      }'
      Response:
      {"ok":true,"_index":"bedcon","_type":"talk","_id":"CeltdivQRGSvLY_dBZv1jw","_version":1}
  31. Mapping
      curl -XPUT http://host/bedcon/talk/_mapping -d '{
        "talk" : {
          "properties" : {
            "title" : {
              "type" : "string",
              "analyzer" : "german"
            }
          }
        }
      }'
  32. Searching
      curl -XGET 'http://host/bedcon/talk/_search?q=elasticsearch'
      Response:
      {...},"hits":{"total":1,"max_score":0.054244425,
        "hits":[{ ...,
          "_score":0.054244425,
          "_source" : {
            "speaker" : "Dr. Halil-Cem Gürsoy",
            "date" : "2013-04-04T16:00:00",
            "title": "Verteiltes Suchen mit Elasticsearch"
          }
        }]
      }}
  33. Searching
      curl -XGET http://localhost:9200/bedcon/talk/_search -d '{
        "query" : {
          "query_string" : {"query" : "elasticsearch"}
        },
        "facets" : {
          "tags" : {
            "terms" : {"field" : "speaker"}
          }
        }
      }'
  34. Searching
      SearchResponse response = esClient.prepareSearch("bedcon")
          .addFacet(
              FacetBuilders.termsFacet("speaker")
                  .field("speaker"))
          .setQuery(
              QueryBuilders.queryString("elasticsearch"))
          .execute().actionGet();
      assertEquals(1, response.getHits().getTotalHits());
  35. Distribution
  36. Distribution
  37. Links
      http://lucene.apache.org
      http://lucene.apache.org/solr/
      http://elasticsearch.org
      https://github.com/fhopf/lucene-solr-talk
      @fhopf
      mail@florian-hopf.de
      http://blog.florian-hopf.de
  38. Images
      ● http://www.morguefile.com/archive/display/3470
      ● http://www.flickr.com/photos/quinnanya/5196951914/ Quinn Dombrowski
      ● http://www.morguefile.com/archive/display/695239
      ● http://www.morguefile.com/archive/display/93433
      ● http://www.morguefile.com/archive/display/811746
      ● http://www.morguefile.com/archive/display/12965
      ● http://www.morguefile.com/archive/display/181488
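
The Analyzing slides walk through tokenization, lowercasing and stemming, and then through the inverted index that maps each term to document ids. The steps can be sketched in plain Java without Lucene, roughly as follows. The class name InvertedIndexSketch, its methods and the suffix-stripping rule in stem are assumptions made for illustration only; they are not part of the talk or of the Lucene API, and the toy stemmer yields different stems than a real German stemmer (for example "lucen" where the slide shows "luc").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // Term -> sorted ids of the documents that contain the term.
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Hypothetical toy stemmer: strips one of a few common German suffixes.
    // Real analyzers such as Lucene's GermanAnalyzer use far more careful rules.
    public static String stem(String term) {
        for (String suffix : new String[] {"en", "es", "e"}) {
            if (term.length() > suffix.length() + 3 && term.endsWith(suffix)) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    // Analysis chain: 1. tokenization, 2. lowercasing, 3. stemming.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            terms.add(stem(token.toLowerCase(Locale.ROOT)));
        }
        return terms;
    }

    // Indexing: record the document id under every analyzed term.
    public void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Searching: analyze the query the same way, then union the postings (OR semantics).
    public SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = new TreeSet<>();
        for (String term : analyze(query)) {
            result.addAll(postings.getOrDefault(term, new TreeSet<>()));
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Such Evolution - Von Lucene zu Solr und ElasticSearch");
        index.add(2, "Verteiltes Suchen mit Elasticsearch");
        System.out.println(index.search("suche"));      // prints [1, 2]
        System.out.println(index.search("verteiltes")); // prints [2]
    }
}
```

Running the query through the same analysis chain as the documents is the point of the sketch: "suche" only matches "Such" and "Suchen" because all three end up as the term "such".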
