Use Cases for Elasticsearch

Slides (originally in German) on different use cases for Elasticsearch: document store, full-text search, flexible query cache, geospatial search, log file analysis, and analytics.

  1. Use Cases for Elasticsearch (title slide): Florian Hopf, @fhopf, http://www.florian-hopf.de, 15.07.2014
  2. Agenda
  3. Preparation
  4. Installation
     # download archive
     wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.zip
     # zip is for windows and linux
     unzip elasticsearch-1.2.1.zip
     # on windows: elasticsearch.bat
     elasticsearch-1.2.1/bin/elasticsearch
  5. Access
     curl -XGET http://localhost:9200
     {
       "status" : 200,
       "name" : "Hawkeye",
       "version" : {
         "number" : "1.2.1",
         "build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
         "build_timestamp" : "2014-06-03T15:02:52Z",
         "build_snapshot" : false,
         "lucene_version" : "4.8"
       },
       "tagline" : "You Know, for Search"
     }
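
The root-endpoint check with curl can also be scripted. Below is a minimal Python sketch that uses only the standard library. `parse_node_info` and `fetch_node_info` are hypothetical helper names. The sample body is an abridged copy of the response shown above. Fetching assumes a node listening on `localhost:9200`, as in the slides.

```python
import json
import urllib.request

# Abridged copy of the root-endpoint response shown on the slide (Elasticsearch 1.2.1).
SAMPLE_RESPONSE = """
{ "status" : 200, "name" : "Hawkeye",
  "version" : { "number" : "1.2.1", "lucene_version" : "4.8" },
  "tagline" : "You Know, for Search" }
"""

def parse_node_info(body: str) -> dict:
    """Hypothetical helper: pull node name and versions out of the root response."""
    info = json.loads(body)
    return {
        "name": info["name"],
        "es_version": info["version"]["number"],
        "lucene_version": info["version"]["lucene_version"],
    }

def fetch_node_info(base_url: str = "http://localhost:9200") -> dict:
    """Same request as `curl -XGET http://localhost:9200`; needs a running node."""
    with urllib.request.urlopen(base_url) as response:
        return parse_node_info(response.read().decode("utf-8"))

if __name__ == "__main__":
    print(parse_node_info(SAMPLE_RESPONSE))
```
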
  6. Document Store
  7. Document
     {
       "title" : "Anwendungsfälle für Elasticsearch",
       "speaker" : "Florian Hopf",
       "date" : "2014-07-15T16:30:00.000Z",
       "tags" : ["Java", "Lucene"],
       "conference" : {
         "name" : "Developer Week",
         "city" : "Nürnberg"
       }
     }
  8. Store
     curl -XPOST http://localhost:9200/conferences/talk/ --data-binary @talk-example.json
     {
       "_index":"conferences",
       "_type":"talk",
       "_id":"GqjY7l8sTxa3jLaFx67_aw",
       "_version":1,
       "created":true
     }
  9. Store (same request, annotated: "conferences" in the URL is the index)
  10. Store (same request, annotated: "conferences" is the index, "talk" is the type)
  11. Read
      curl -XGET "http://localhost:9200/conferences/talk/GqjY7l8sTxa3jLaFx67_aw?pretty=true"
      {
        "_index" : "conferences",
        [...]
        "_source":{
          "title" : "Anwendungsfälle für Elasticsearch",
          "speaker" : "Florian Hopf",
          "date" : "2014-07-15T16:30:00.000Z",
          "tags" : ["Java", "Lucene"],
          "conference" : {
            "name" : "Developer Week",
            "city" : "Nürnberg"
          }
        }
      }
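
The store and read requests above follow the URL pattern `/{index}/{type}/` (POST, the id is generated) and `/{index}/{type}/{id}` (GET). Below is a minimal Python sketch that only builds those requests with `urllib.request`, so it runs without a cluster. `index_request`, `get_request` and `BASE_URL` are illustrative names. The sketch follows the 1.x URL layout of the slides; later Elasticsearch versions deprecated and then removed mapping types such as `talk`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node as in the slides

def index_request(index: str, doc_type: str, doc: dict) -> urllib.request.Request:
    """POST /{index}/{type}/ : Elasticsearch generates the document id."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def get_request(index: str, doc_type: str, doc_id: str) -> urllib.request.Request:
    """GET /{index}/{type}/{id} : the stored JSON comes back under _source."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}?pretty=true", method="GET"
    )

talk = {
    "title": "Anwendungsfälle für Elasticsearch",
    "speaker": "Florian Hopf",
    "date": "2014-07-15T16:30:00.000Z",
    "tags": ["Java", "Lucene"],
    "conference": {"name": "Developer Week", "city": "Nürnberg"},
}

store = index_request("conferences", "talk", talk)
read = get_request("conferences", "talk", "GqjY7l8sTxa3jLaFx67_aw")
```

Against a running node, passing either request to `urllib.request.urlopen` should return a JSON body like the ones on the slides.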
  12. Sharding
      ● Splitting an index into multiple parts
        – Default: 5 shards per Elasticsearch index
      ● Multiple Elasticsearch instances can form a cluster
        – Shards are distributed automatically across the nodes in the cluster
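
Routing decides which shard a document lands on. By default the document id is hashed and taken modulo the number of primary shards: `shard = hash(_routing) % number_of_primary_shards`. Below is a Python sketch of that idea. `zlib.crc32` is only a stand-in, because Elasticsearch uses its own hash function. `shard_for` and `distribute` are hypothetical helper names.

```python
import zlib
from collections import Counter

NUMBER_OF_SHARDS = 5  # the 1.x default mentioned on the slide

def shard_for(doc_id: str, number_of_shards: int = NUMBER_OF_SHARDS) -> int:
    """hash(routing) % number_of_primary_shards, with zlib.crc32 as a stand-in hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

def distribute(doc_ids, number_of_shards: int = NUMBER_OF_SHARDS) -> Counter:
    """Count how many documents end up on each shard."""
    return Counter(shard_for(doc_id, number_of_shards) for doc_id in doc_ids)

ids = [f"talk-{i}" for i in range(1000)]
print(sorted(distribute(ids).items()))
```

The shard is derived from a modulo, so in the version shown here the number of primary shards is fixed when the index is created. Changing it would send existing ids to different shards.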
  13.–15. Sharding
  16. Recap
      ● Simple storage of JSON documents
      ● Index and type
      ● Sharding for large data volumes
      ● Distribution is a first-class citizen
  17. Users
      ● HipChat
        – http://highscalability.com/blog/2014/1/6/how-hipchat-stores-and-indexes-billions-of-messages-using-el.html
      ● Engagor
        – http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
        – http://www.elasticsearch.org/case-study/engagor/
  18. Full-Text Search
  19. Search via URL parameter
      curl -XGET "http://localhost:9200/conferences/talk/_search?q=elasticsearch&pretty=true"
      {
        "took" : 73,
        […]
        "hits" : {
          […]
          "hits" : [ {
            […]
            "_score" : 0.076713204,
            "_source":{
              "title" : "Anwendungsfälle für Elasticsearch",
              "tags" : ["Java", "Lucene"],
              […]
            }
          } ]
        }
      }
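
With URI search the whole query lives in the `q` parameter. Anything beyond plain words, such as a field prefix like `title:`, spaces, or umlauts, should therefore be URL-encoded. Below is a small Python sketch using `urllib.parse.urlencode`. `query_string_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def query_string_search_url(index: str, doc_type: str, query: str,
                            base_url: str = "http://localhost:9200") -> str:
    """Build a URI-search URL; urlencode percent-encodes ':' and non-ASCII
    characters and turns spaces into '+'."""
    params = urlencode({"q": query, "pretty": "true"})
    return f"{base_url}/{index}/{doc_type}/_search?{params}"

print(query_string_search_url("conferences", "talk", "elasticsearch"))
print(query_string_search_url("conferences", "talk", "title:anwendungsfall"))
# the second URL ends in ?q=title%3Aanwendungsfall&pretty=true
```
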
  20. Query DSL
      curl -XPOST "http://localhost:9200/conferences/_search" -d'
      {
        "query": {
          "match": {
            "title" : {
              "query": "elasticsaerch",
              "fuzziness": 2
            }
          }
        },
        "filter": {
          "term": {
            "conference.city": "nürnberg"
          }
        }
      }'
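
The setting `"fuzziness": 2` is why the deliberately misspelled `elasticsaerch` still finds the talk: indexed terms within an edit distance of 2 match. A plain Levenshtein implementation in Python shows that the swapped `ae` costs two substitutions. Depending on version and settings, Lucene's fuzzy matching can also count a transposition of adjacent characters as a single edit, which would make this a distance of 1. Either way it is within the limit.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions each cost 1."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("elasticsaerch", "elasticsearch"))  # 2
```
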
  21. Language
      curl -XGET "http://localhost:9200/conferences/talk/_search?q=title:anwendungsfall&pretty=true"
      {
        "took" : 2,
        "timed_out" : false,
        "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
        "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] }
      }
  22. Analyzing
      Documents: "Anwendungsfälle für Elasticsearch" (1), "Verteiltes Suchen mit Elasticsearch" (2)
      Steps: 1. Tokenization, 2. Lowercasing, 3. Stemming
      Term            Document IDs
      anwendungsfall  1
      elasticsearch   1, 2
      fur             1
      mit             2
      such            2
      verteilt        2
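
The analyzing steps can be imitated with a toy analyzer that builds the same inverted index. This Python sketch is not the `german` analyzer. Its suffix list is hypothetical and deliberately tiny, and umlaut folding is bundled into the lowercasing step. The real analyzer differs; for example, it also removes stopwords such as `für` and `mit`.

```python
import re
from collections import defaultdict

UMLAUTS = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
SUFFIXES = ("es", "en", "e")  # hypothetical toy stemmer, not a real German stemmer

def analyze(text: str) -> list[str]:
    terms = []
    for token in re.findall(r"\w+", text):      # 1. tokenization
        term = token.lower().translate(UMLAUTS)  # 2. lowercasing (+ umlaut folding)
        for suffix in SUFFIXES:                  # 3. naive suffix stripping
            if term.endswith(suffix) and len(term) - len(suffix) >= 3:
                term = term[: -len(suffix)]
                break
        terms.append(term)
    return terms

def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Inverted index: term -> sorted list of document ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

docs = {1: "Anwendungsfälle für Elasticsearch", 2: "Verteiltes Suchen mit Elasticsearch"}
index = build_index(docs)
for term, ids in index.items():
    print(f"{term:<16}{ids}")
```

Running the query term through the same chain yields `anwendungsfall`, which is in the index. An analyzer that only lowercases keeps `anwendungsfälle` instead, which is why the earlier `title:anwendungsfall` search found nothing.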
  23. Mapping
      curl -XDELETE "http://localhost:9200/conferences/"
      curl -XPUT "http://localhost:9200/conferences/"
      curl -XPUT "http://localhost:9200/conferences/talk/_mapping" -d'
      {
        "properties": {
          "tags": {
            "type": "string",
            "index": "not_analyzed"
          },
          "title": {
            "type": "string",
            "analyzer": "german"
          }
        }
      }'
  24. Language
      curl -XGET "http://localhost:9200/conferences/talk/_search?q=title:anwendungsfall&pretty=true"
      {
        "took" : 2,
        "timed_out" : false,
        "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
        "hits" : { "total" : 1, […] }
      }
  25. What else?
      ● Faceting/aggregations
      ● Suggestions
      ● Highlighting
      ● Sorting
      ● Pagination
      ● ...
  26. Recap
      ● Expressive searches via the Query DSL
      ● Analyzing as a core feature
      ● All Lucene goodies available
  27. Users
      ● GitHub
        – http://exploringelasticsearch.com/github_interview.html
        – http://www.elasticsearch.org/case-study/github/
      ● StackOverflow
        – http://meta.stackexchange.com/questions/160100/a-new-search-engine-for-stack-exchange
        – http://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-stack-overflow/
      ● SoundCloud
        – http://developers.soundcloud.com/blog/architecture-behind-our-new-search-and-explore-experience
        – http://www.elasticsearch.org/case-study/soundcloud/
      ● XING
        – http://www.elasticsearch.org/case-study/xing/
  28. Flexible Cache
  29. [Diagram labels: Application, DB, Setup, Search]
  30. Only search?
  31. [Diagram labels: Application, DB, Queries]
  32. Listing
      curl -XPOST "http://localhost:9200/conferences/_search" -d'
      {
        "filter": {
          "term": {
            "conference.city": "nürnberg"
          }
        }
      }'
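
In the flexible-cache setup the database stays the source of truth. Elasticsearch holds a denormalized copy that answers listing queries such as the term filter on `conference.city`. One way to (re)fill that copy is the `_bulk` API. Its body is newline-delimited JSON: an action line followed by the document, with a trailing newline. Below is a Python sketch that builds such a body from rows already loaded from the database. The rows and `bulk_body` are hypothetical, and the `_type` in the action line follows the 1.x layout.

```python
import json

def bulk_body(index: str, doc_type: str, rows: list[dict], id_field: str = "id") -> str:
    """Build a newline-delimited _bulk body: one action line plus one source line per row."""
    lines = []
    for row in rows:
        doc = {k: v for k, v in row.items() if k != id_field}  # the id goes into the action line
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline

# Hypothetical rows as they might come from the application's database.
rows = [
    {"id": 1, "title": "Anwendungsfälle für Elasticsearch", "conference": {"city": "Nürnberg"}},
    {"id": 2, "title": "Verteiltes Suchen mit Elasticsearch"},
]
body = bulk_body("conferences", "talk", rows)
print(body)
```

The resulting string would be POSTed to `http://localhost:9200/_bulk`. Rebuilding the whole copy this way keeps the synchronization logic in one place, at the cost of reindexing unchanged rows.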
  33. Geo Search
  34. Structured Search
      ● Not just full text
        – Structured data: geo and numeric data, dates
      ● Geo point as a data type
      ● Sorting
      ● Filtering
  35. Applications
      ● Show the nearest branch
      ● Branch finder
      ● Sorting classified ads
      ● Sorting locations
      ● Filtering by proximity
      ● Social media analysis
  36. Document
      {
        "title" : "Anwendungsfälle für Elasticsearch",
        "speaker" : "Florian Hopf",
        "date" : "2014-07-15T16:30:00.000Z",
        "tags" : ["Java", "Lucene"],
        "conference" : {
          "name" : "Developer Week",
          "city" : "Nürnberg",
          "coordinates": {
            "lon": "11.115358",
            "lat": "49.417175"
          }
        }
      }
  37. Mapping
      curl -XPUT "http://localhost:9200/conferences/talk/_mapping" -d'
      {
        "properties": {
          […],
          "conference": {
            "type": "object",
            "properties": {
              "coordinates": {
                "type": "geo_point"
              }
            }
          }
        }
      }'
  38. Sorting
      curl -XPOST "http://localhost:9200/conferences/_search" -d'
      {
        "sort" : [
          {
            "_geo_distance" : {
              "conference.coordinates" : {
                "lon": 8.403697,
                "lat": 49.006616
              },
              "order" : "asc",
              "unit" : "km"
            }
          }
        ]
      }'
  39. Filtering
      curl -XPOST "http://localhost:9200/conferences/_search" -d'
      {
        "filter": {
          "geo_distance": {
            "conference.coordinates": {
              "lon": 8.403697,
              "lat": 49.006616
            },
            "distance": "200km",
            "distance_type": "arc"
          }
        }
      }'
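
With `"distance_type": "arc"` the distance is measured along a great circle. A haversine sketch in Python gives a feel for the numbers here. It uses the document's coordinates and the query point from the sort and filter requests; note that the JSON objects list `lon` first, while the function takes latitude first. The mean Earth radius of 6371 km is an assumption, and Elasticsearch's exact constant and formula may differ slightly. By this estimate the two points are roughly 202 km apart, so with `"distance": "200km"` this particular document would fall just outside the filter.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle ("arc") distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

origin = {"lat": 49.006616, "lon": 8.403697}  # query point from the sort/filter requests
venue = {"lat": 49.417175, "lon": 11.115358}  # conference.coordinates of the document

def distance_from_origin(point: dict) -> float:
    return haversine_km(origin["lat"], origin["lon"], point["lat"], point["lon"])

def within_radius_sorted(points: list[dict], radius_km: float) -> list[tuple[float, dict]]:
    """Mimic a geo_distance filter followed by an ascending _geo_distance sort."""
    with_distance = [(distance_from_origin(p), p) for p in points]
    inside = [item for item in with_distance if item[0] <= radius_km]
    return sorted(inside, key=lambda item: item[0])

d = distance_from_origin(venue)
print(f"{d:.1f} km")
print(within_radius_sorted([venue], 200))
print(within_radius_sorted([venue, origin], 250))
```
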
  40. Recap
      ● Elasticsearch can do more than full text
      ● Sophisticated geo algorithms
      ● Sorting by distance
      ● Filtering by distance or area
      ● Distance calculation
  41. Users
      ● FourSquare
        – http://engineering.foursquare.com/2012/08/09/foursquare-now-uses-elastic-search-and-on-a-related-note-slashem-also-works-with-elastic-search/
      ● Gild
        – http://www.elasticsearch.org/case-study/gild/
  42. Log File Analysis
  43. Log File Analysis
      ● Centralizing logs from applications
      ● Centralizing logs across machines
        – Even without access to the machines
      ● Easy searchability
      ● Real-time analysis / visualization
      ● Data for everyone!
  44. Log File Analysis
      ● Ingestion – Logstash
      ● Storage – Elasticsearch
      ● Analysis – Kibana
  45. Log File Analysis
  46. Logstash Config
      input {
        file {
          path => "/var/log/apache2/access.log"
        }
      }
      filter {
        grok {
          match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
      }
      output {
        elasticsearch_http {
          host => "localhost"
        }
      }
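
The grok filter turns each raw access-log line into named fields before the event is sent to Elasticsearch. Below is a Python sketch of that step. A simplified regular expression stands in for `%{COMBINEDAPACHELOG}`. The field names mirror those the grok pattern produces (`clientip`, `response`, `agent`, ...), but the regex is an approximation, not the pattern's actual definition. The sample line is in the style of the Apache documentation's combined-log example.

```python
import re
from typing import Optional

# Simplified stand-in for grok's %{COMBINEDAPACHELOG}; not the real pattern definition.
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str) -> Optional[dict]:
    """Turn one access-log line into a dict of fields, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    return match.groupdict() if match else None

SAMPLE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
          '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
print(parse_line(SAMPLE))
```

The resulting dict is roughly the shape of the JSON document Logstash would index. That structure is what makes fields like `response` available for filtering and aggregations later.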
  47. Kibana
  48. Recap
      ● Ingesting, enriching and storing log events
      ● Numerous inputs in Logstash
      ● Consolidation
      ● Centralization
      ● Analysis
  49. Users
      ● Mailgun
        – http://www.elasticsearch.org/blog/using-elasticsearch-and-logstash-to-serve-billions-of-searchable-events-for-customers/
      ● CERN
        – https://medium.com/@ghoranyi/needle-in-a-haystack-873c97a99983
      ● Bloomberg
        – http://www.elasticsearch.org/videos/using-elasticsearch-logstash-kibana-techologies-centralized-viewing-logs-bloomberg/
  50. Analytics
  51. Analytics
      ● Aggregations on fields
      ● Analysis of large data volumes, too
        – Social media
        – Data warehouse
      ● Consolidating data from different sources
      ● Visualization
  52. Aggregations
      curl -XGET "http://localhost:9200/devoxx/tweet/_search" -d'
      {
        "aggs" : {
          "hashtags" : {
            "terms" : { "field" : "hashtag.text" }
          }
        }
      }'
  53. Aggregations
      "aggregations": {
        "hashtags": {
          "buckets": [
            { "key": "dartlang", "doc_count": 229 },
            { "key": "java", "doc_count": 216 },
            [...]
  54. Aggregations
      curl -XGET "http://localhost:9200/devoxx/tweet/_search" -d'
      {
        "aggs" : {
          "hashtags" : {
            "terms" : { "field" : "hashtag.text" },
            "aggs" : {
              "hashtagusers" : {
                "terms" : { "field" : "user.screen_name" }
              }
            }
          }
        }
      }'
  55. Aggregations
      "key": "scala",
      "doc_count": 130,
      "hashtagusers": {
        "buckets": [
          { "key": "jaceklaskowski", "doc_count": 74 },
          { "key": "ManningBooks", "doc_count": 3 },
          [...]
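
The two terms aggregations above can be imitated in memory to show what they compute, though not how Elasticsearch computes them. Elasticsearch works per shard and merges the results, which can make counts approximate. In this Python sketch, `doc_count` counts documents, so a hashtag repeated within one tweet counts once. The tweets and the flat field names `hashtags` and `user` are hypothetical stand-ins for `hashtag.text` and `user.screen_name`.

```python
from collections import Counter

def field_values(doc: dict, field: str) -> list:
    """Return a field as a list, so single values and arrays are handled alike."""
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def terms_agg(docs: list[dict], field: str, size: int = 10) -> list[dict]:
    """One bucket per distinct value; doc_count = number of documents containing it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(field_values(doc, field)))  # set: count each value once per doc
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]

def terms_with_sub_terms(docs, outer, inner, sub_name, size=10):
    """Nested terms aggregation: run terms_agg on the documents of each outer bucket."""
    result = []
    for bucket in terms_agg(docs, outer, size):
        members = [d for d in docs if bucket["key"] in field_values(d, outer)]
        result.append({**bucket, sub_name: {"buckets": terms_agg(members, inner, size)}})
    return result

tweets = [  # hypothetical sample data
    {"user": "alice", "hashtags": ["java", "scala"]},
    {"user": "bob", "hashtags": ["scala"]},
    {"user": "alice", "hashtags": ["scala", "scala"]},
    {"user": "carol", "hashtags": ["dartlang"]},
]
print(terms_agg(tweets, "hashtags"))
print(terms_with_sub_terms(tweets, "hashtags", "user", "hashtagusers")[0])
```
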
  56. Aggregations
      ● Bucket aggregations
        – terms
        – (date_)histogram
        – range
        – significant_terms
        – ...
      ● Metrics aggregations
        – min, max, sum, avg
        – stats
        – percentiles
        – value_count
        – ...
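
Bucket and metrics aggregations differ in output: a bucket aggregation groups documents, while a metrics aggregation computes numbers over them. Below is a Python sketch of one of each on a hypothetical list of retweet counts. `stats_agg` reports count, min, max, avg and sum, like the `stats` aggregation. `histogram_agg` uses the bucket key `floor(value / interval) * interval`, and empty buckets are simply omitted here. Both function names are hypothetical.

```python
import math
from collections import defaultdict

def stats_agg(values: list[float]) -> dict:
    """Count, min, max, avg and sum over a non-empty list of numbers."""
    total = math.fsum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}

def histogram_agg(values: list[float], interval: float) -> list[dict]:
    """Bucket numbers by floor(value / interval) * interval; only non-empty buckets."""
    buckets = defaultdict(int)
    for v in values:
        buckets[math.floor(v / interval) * interval] += 1
    return [{"key": k, "doc_count": c} for k, c in sorted(buckets.items())]

retweets = [0, 3, 7, 12, 15, 42]  # hypothetical retweet counts
print(stats_agg(retweets))
print(histogram_agg(retweets, 10))
```
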
  57. Tweets
  58. Recap
      ● Analysis of large data volumes
      ● Visualization
      ● Numerous aggregations
        – Calculations: max, min, mean
        – Terms, SignificantTerms
  59. Users
      ● Engagor
      ● The Guardian
        – http://www.elasticsearch.org/blog/using-elasticsearch-and-logstash-to-serve-billions-of-searchable-events-for-customers/
        – http://www.infoq.com/presentations/elasticsearch-guardian
      ● Cogenta
        – http://www.elasticsearch.org/case-study/cogenta/
  60. Agenda
  61. Thank you!
      @fhopf
      mail@florian-hopf.de
      http://blog.florian-hopf.de
  62. Images
      ● http://www.morguefile.com/archive/display/685952
      ● http://www.morguefile.com/archive/display/2359
      ● http://www.morguefile.com/archive/display/615356
      ● http://www.morguefile.com/archive/display/914733
      ● http://www.morguefile.com/archive/display/826258
      ● http://www.morguefile.com/archive/display/170605
      ● http://www.morguefile.com/archive/display/181488