Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Einführung in Elasticsearch

578 views

Published on

Vortrag bei der JUG Saar vom 27.10.2016
http://www.meetup.com/de-DE/Java-User-Group-Saarland-jugsaar/events/229836175/

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Einführung in Elasticsearch

  1. 1. elasticsearch. Florian Hopf www.florian-hopf.de @fhopf
  2. 2. Agenda ● Suche ● Verteilung ● Elasticsearch und Java ● Aggregationen ● Zentralisiertes Logging
  3. 3. Suche
  4. 4. Suche
  5. 5. Installation # download archive wget https://artifacts.elastic.co/downloads/ elasticsearch/elasticsearch-5.0.0.zip unzip elasticsearch-5.0.0.zip # on windows: elasticsearch.bat elasticsearch-5.0.0/bin/elasticsearch
  6. 6. Zugriff per HTTP curl -XGET "http://localhost:9200" { "name" : "LI8ZN-t", "cluster_name" : "elasticsearch", "cluster_uuid" : "UvbMAoJ8TieUqugCGw7Xrw", "version" : { "number" : "5.0.0", "build_hash" : "253032b", "build_date" : "2016-10-26T04:37:51.531Z", "build_snapshot" : false, "lucene_version" : "6.2.0" }, "tagline" : "You Know, for Search" }
  7. 7. Indizierung POST /library/book { "title": "Elasticsearch in Action", "author": [ "Radu Gheorghe", "Matthew Lee Hinman", "Roy Russo" ], "published": "2015-06-30T00:00:00.000Z", "publisher": { "name": "Manning", "country": "USA" } }
  8. 8. Suche GET /library/book/_search?q=elasticsearch { [...] "hits": { "hits": [ { "_index": "library", "_type": "book", "_source": { "title": "Elasticsearch in Action", [...]
  9. 9. Suche per Query DSL POST /library/book/_search { "query": { "bool": { "must": { "match": { "title": "elasticsearch" } }, "filter": { "term": { "publisher.name": "manning" } } } } }
  10. 10. Sprachspezifische Inhalte POST /library/book/ { "title": "Elasticsearch - Ein praktischer Einstieg", "author": "Florian Hopf", "published": "2015-10-26T00:00:00.000Z", "publisher": { "name": "dpunkt.verlag", "country": "DE" }, "tags": ["Lucene", "Elasticsearch"] }
  11. 11. Sprachspezifische Inhalte POST /library/book/_search { "query": { "match": { "title": "praktisch" } } }
  12. 12. Analyzing Term Document Id Action 1 ein 2 Einstieg 2 Elasticsearch 1,2 in 1 praktischer 2 1. Tokenization Elasticsearch in Action Elasticsearch: Ein praktischer Einstieg
  13. 13. Analyzing Term Document Id action 1 ein 2 einstieg 2 elasticsearch 1,2 in 1 praktischer 2 1. Tokenization Elasticsearch in Action Elasticsearch: Ein praktischer Einstieg 2. Lowercasing
  14. 14. Suche Term Document Id action 1 ein 2 einstieg 2 elasticsearch 1,2 in 1 praktischer 2 1. Tokenization 2. LowercasingElasticsearch elasticsearch
  15. 15. Suche Term Document Id action 1 ein 2 einstieg 2 elasticsearch 1,2 in 1 praktischer 2 1. Tokenization 2. Lowercasingpraktisch praktisch
  16. 16. Analyzing Term Document Id action 1 ein 2 einstieg 2 elasticsearch 1,2 in 1 praktisch 2 1. Tokenization Elasticsearch in Action Elasticsearch: Ein praktischer Einstieg 2. Lowercasing 3. Stemming
  17. 17. Suche Term Document Id action 1 ein 2 einstieg 2 elasticsearch 1,2 in 1 praktisch 2 1. Tokenization 2. Lowercasingpraktisch praktisch 3. Stemming
  18. 18. Mapping PUT /library/book/_mapping { "book": { "properties": { "title": { "type": "text", "analyzer": "german" } } } }
  19. 19. Suchfeatures ● Fertige Analyzer, Konfiguration von eigenen ● Relevanzberechnung ● Paginierung, Sortierung ● Highlighting, Autovervollständigung, ... ● Facettierung über Aggregationen
  20. 20. Recap ● Java-basierter Suchserver ● Kommunikation über HTTP und JSON ● Dokumentenbasierte Speicherung ● Unterstützung unterschiedlicher Datentypen ● Suche über Query DSL auf invertiertem Index
  21. 21. Verteilung
  22. 22. Verteilung
  23. 23. Verteilung
  24. 24. Verteilung
  25. 25. Suche im Cluster
  26. 26. Suche im Cluster
  27. 27. Recap ● Knoten können Cluster bilden ● Datenverteilung durch Sharding ● Replicas zur Lastverteilung und Ausfallsicherheit ● Verteilte Suche ● Cluster-Zustand auf jedem Knoten
  28. 28. Java
  29. 29. Java! dependencies { compile group: 'org.elasticsearch', name: 'elasticsearch', version: '2.4.0' }
  30. 30. Java! TransportAddress address = new InetSocketTransportAddress("localhost", 9300); Client client = new TransportClient(). addTransportAddress(address);
  31. 31. Suche per Query DSL POST /library/book/_search { "query": { "bool": { "must": { "match": { "title": "elasticsearch" } }, "filter": { "term": { "publisher.name": "manning" } } } } }
  32. 32. Suche über Java-Client SearchResponse searchResponse = client .prepareSearch("library") .setQuery( boolQuery(). must(matchQuery("title", "elasticsearch")). filter(termQuery("publisher.name", "manning"))) .execute().actionGet(); assertEquals(1, searchResponse.getHits().getTotalHits()); SearchHit hit = searchResponse.getHits().getAt(0); assertEquals("Elasticsearch in Action", hit.getSource().get("title"));
  33. 33. Indizierung über Java-Client XContentBuilder jsonBuilder = jsonBuilder() .startObject() .field("title", "Elasticsearch") .field("author", "Florian Hopf") .startObject("publisher") .field("name", "dpunkt") .endObject() .endObject(); client.prepareIndex("library", "book") .setSource(jsonBuilder) .execute().actionGet();
  34. 34. Indizierung über Java-Client
  35. 35. Transport-Client ● Verbindet sich mit bestehendem Cluster TransportAddress address = new InetSocketTransportAddress("localhost", 9300); Client client = new TransportClient(). addTransportAddress(address);
  36. 36. Transport-Client
  37. 37. Node-Client Node node = nodeBuilder().client(true).node(); Client client = node.client();
  38. 38. Node-Client ● Startet eigenen Knoten ● Anwendung wird Teil des Clusters Node node = nodeBuilder().client(true).node(); Client client = node.client();
  39. 39. Node-Client
  40. 40. Standard-Client ● Volle API-Unterstützung ● Effiziente Kommunikation ● Node-Client ● Cluster-State spart unnötige Hops
  41. 41. Standard-Client Gotchas ● Elasticsearch-/JDK-Version Client == Server ● Node-Client ● Bei Start und Stop wird Cluster-State übertragen ● Speicherverbrauch ● Elasticsearch-Dependency
  42. 42. Elasticsearch 5 dependencies { compile group: 'org.elasticsearch.client', name: 'transport', version: '5.0.0' }
  43. 43. Java! TransportAddress address = new InetSocketTransportAddress( InetAddress.getByName("localhost"), 9300); Client client = new PreBuiltTransportClient(Settings.EMPTY) addTransportAddress(address);
  44. 44. Elasticsearch 5 ● TransportClient hat eigenes Artefakt ● NodeClient nicht mehr empfohlen
  45. 45. Elasticsearch 5 REST-Client dependencies { compile group: 'org.elasticsearch.client', name: 'rest', version: '5.0.0' }
  46. 46. Elasticsearch 5 REST-Client +--- org.apache.httpcomponents:httpclient:4.5.2 +--- org.apache.httpcomponents:httpcore:4.4.5 +--- org.apache.httpcomponents:httpasyncclient:4.1.2 +--- org.apache.httpcomponents:httpcore-nio:4.4.5 +--- commons-codec:commons-codec:1.10 --- commons-logging:commons-logging:1.1.3
  47. 47. Elasticsearch 5 REST-Client RestClient restClient = RestClient.builder( new HttpHost("localhost", 9200, "http"), new HttpHost("localhost", 9201, "http")).build();
  48. 48. Elasticsearch 5 REST-Client try (RestClient restClient = createRestClient()) { HttpEntity entity = new NStringEntity( "{ "query": { "match_all": {}}}", ContentType.APPLICATION_JSON); // alternative: performRequestAsync Response response = restClient.performRequest("POST", "/_search", emptyMap(), entity); String json = toString(response.getEntity()); // ... }
  49. 49. Elasticsearch 5 REST-Client ● Anwendung unabhängiger von Elasticsearch-Version ● Saubere Trennung Elasticsearch-Cluster/Anwendung ● Minimale Dependencies ● Unterstützung für Node-Sniffing, Fehlerhandling, Timeout-Config, Basic-Auth, Header, ...
  50. 50. Alternativen ● Jest: Http-Client ● Spring Data Elasticsearch ● Große Auswahl für JVM-Sprachen
  51. 51. Aggregationen
  52. 52. Aggregations ● Informationen über die Daten ● Ersetzen und ermöglichen Facetten ● Anwendungsfälle für Suchanwendungen und Analytics
  53. 53. Aggregations
  54. 54. Invertierter Index Term Document Id Elasticsearch 1,2,3 Java 3 Lucene 2,3
  55. 55. Terms-Aggregation POST /library/book/_search { "size": 0, "aggs": { "common-tags": { "terms": { "field": "tags.keyword" } } } }
  56. 56. Terms-Aggregation "aggregations": { "common-tags": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Elasticsearch", "doc_count": 2 }, { "key": "Lucene", "doc_count": 2 }, { "key": "Java", "doc_count": 1 }] [...]
  57. 57. Terms-Aggregation ● Bucket mit Wert und Anzahl ● Facettierung ● Informationen aus Daten
  58. 58. Aggregationen kombinieren POST /devoxx/tweet/_search { "aggs" : { "hashtags" : { "terms" : { "field" : "hashtag.text" } } } }
  59. 59. Aggregationen kombinieren "aggregations": { "hashtags": { "buckets": [ { "key": "dartlang", "doc_count": 229 }, { "key": "java", "doc_count": 216 }, [...]
  60. 60. Aggregationen kombinieren POST /devoxx/tweet/_search { "aggs" : { "hashtags" : { "terms" : { "field" : "hashtag.text" }, "aggs" : { "hashtagusers" : { "terms" : { "field" : "user.screen_name" } } } } } }
  61. 61. Aggregationen kombinieren "key": "scala", "doc_count": 130, "hashtagusers": { "buckets": [ { "key": "jaceklaskowski", "doc_count": 74 }, { "key": "ManningBooks", "doc_count": 3 }, [...]
  62. 62. Bucket-Aggregationen ● Range-Aggregationen ● Histogramme ● Filter ● Geo-Aggregationen ● …
  63. 63. Metric-Aggregationen ● Berechnen einen oder mehrere Werte ● Meist auf numerischen Feldern ● Stats, Percentiles, Min, Max, Sum, Avg, ...
  64. 64. Stats-Aggregationen GET /library/book/_search { "aggs": { "published_stats": { "stats": { "field": "published" } } } }
  65. 65. Stats-Aggregationen "aggregations": { "published_stats": { "count": 5, "min": 1419292800000, "max": 1445990400000, "avg": 1440547200000, "sum": 7202736000000, "min_as_string": "2014-12-23T00:00:00.000Z", "max_as_string": "2015-10-28T00:00:00.000Z", "avg_as_string": "2015-08-26T00:00:00.000Z", "sum_as_string": "2198-03-31T00:00:00.000Z" } }
  66. 66. Significant Terms Aggregation ● Findet Besonderheiten in Daten ● Bewertet Datensatz gegen Hintergrundmenge ● Beispiel: Häufige Hashtag-Kombinationen
  67. 67. Significant Terms Aggregation POST /conf/_search { "aggs": { "hashtag": { "terms": { "field": "hashtag.text" }, "aggs": { "associated_hashtags": { "significant_terms": { "field": "hashtag.text" } } } } } }
  68. 68. Significant Terms Aggregation ● bbuzz (Berlin Buzzwords) ● elasticsearch, devops, bigdata, cassandra ● javaland ● java8, javafx, javaee7, nighthacking ● dwx14 (Developer Week) ● nueww, nodejs, intelandroid, web, microsoft
  69. 69. Significant Terms Aggregation ● Empfehlungen ● Fraud-Detection ● Geographische Häufungen
  70. 70. Recap ● Aggregationen bieten unterschiedliche Einblicke in die Daten ● Facettierung ● Kombination mehrerer Aggregationen ● Grundlage für Visualisierungen
  71. 71. Zentralisiertes Logging
  72. 72. Zentralisiertes Logging ● Zentralisierung Logs aus Anwendungen ● Zentralisierung Logs über Maschinen ● Auch ohne Zugriff ● Leichte Durchsuchbarkeit ● Real-Time-Analysis / Visualisierung ● Zugang zu Daten
  73. 73. Zentralisiertes Logging ● Einlesen ● Beats/Logstash/Ingest Node ● Speicherung ● Elasticsearch ● Auswertung ● Kibana
  74. 74. Logfile-Analyse
  75. 75. Beats ● Filebeat ● Metricbeat ● Packetbeat ● ...
  76. 76. Logstash-Config input { file { path => "/var/log/apache2/access.log" } } filter { grok { match => { message => "%{COMBINEDAPACHELOG}" } } } output { elasticsearch_http { host => "localhost" } }
  77. 77. Kibana
  78. 78. Kibana
  79. 79. Recap ● Einlesen, Anreichern, Speichern von Logevents ● Zahlreiche Inputs in Logstash ● Konsolidierung ● Zentralisierung ● Auswertung
  80. 80. Noch viel mehr! ● Unterschiedliche Suchfeatures ● Viele Aggregationen ● Geodaten ● Percolator
  81. 81. Elasticsearch 5.0 ● Text- und Keyword-Felder ● Painless Scripting Language ● Shrink-API ● Verbesserugen Performance, Betrieb, … ● http://elasticsearch-buch.de/2016/10/27/neues- in-elasticsearch-5.html
  82. 82. Weitere Infos
  83. 83. Weitere Infos ● http://elastic.co ● Elasticsearch – The definitive Guide ● https://www.elastic.co/guide/en/elasticsearch/guide/master /index.html ● Elasticsearch in Action ● https://www.manning.com/books/elasticsearch-in-action ● http://blog.florian-hopf.de

×