Use Cases for Elasticsearch (Anwendungsfälle für Elasticsearch)

Florian Hopf
@fhopf
http://www.florian-hopf.de 15.07.2014

curl -XGET http://localhost:9200
{
  "status" : 200,
  "name" : "Hawkeye",
  "version" : {
    "number" : "1.2.1",
    "build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
    "build_timestamp" : "2014-06-03T15:02:52Z",
    "build_snapshot" : false,
    "lucene_version" : "4.8"
  },
  "tagline" : "You Know, for Search"
}
Installation
# download the archive
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.zip
# the zip archive works on both Windows and Linux
unzip elasticsearch-1.2.1.zip
# on Windows: elasticsearch.bat
elasticsearch-1.2.1/bin/elasticsearch

Access

Document
{
  "title" : "Anwendungsfälle für Elasticsearch",
  "speaker" : "Florian Hopf",
  "date" : "2014-07-15T16:30:00.000Z",
  "tags" : ["Java", "Lucene"],
  "conference" : {
    "name" : "Developer Week",
    "city" : "Nürnberg"
  }
}

Storing
curl -XPOST http://localhost:9200/conferences/talk/ \
  --data-binary @talk-example.json
{
  "_index":"conferences",
  "_type":"talk",
  "_id":"GqjY7l8sTxa3jLaFx67_aw",
  "_version":1,
  "created":true
}

● conferences is the index
● talk is the type

Reading
curl -XGET "http://localhost:9200/conferences/talk/GqjY7l8sTxa3jLaFx67_aw?pretty=true"
{
  "_index" : "conferences",
  [...]
  "_source":{
    "date" : "2014-07-15T16:30:00.000Z",
    "conference" : {
      "city" : "Nürnberg"
    }
  }
}
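
The store and read requests can also be built from a program. The following is a minimal sketch using only the Python standard library, assuming a node on localhost:9200 as in the curl calls. The helper names index_request and get_request are made up for this illustration; the requests are only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # node address used in the curl examples


def index_request(index, doc_type, document):
    """Build a POST that stores a document; Elasticsearch assigns the id."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_request(index, doc_type, doc_id):
    """Build a GET that reads one document back by its id."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}?pretty=true", method="GET"
    )


talk = {"title": "Anwendungsfälle für Elasticsearch", "speaker": "Florian Hopf"}
store = index_request("conferences", "talk", talk)
read = get_request("conferences", "talk", "GqjY7l8sTxa3jLaFx67_aw")
print(store.get_method(), store.full_url)
print(read.get_method(), read.full_url)

# To actually send a request (requires a running node):
# with urllib.request.urlopen(store) as response:
#     print(json.load(response))
```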

Sharding
● Splitting an index into several parts
– Default: 5 shards per Elasticsearch index
● Multiple Elasticsearch instances can form a cluster
– Automatic distribution of shards across the nodes in the cluster
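
How a document ends up on one of the shards can be sketched as follows: a routing value (by default the document id) is hashed and taken modulo the number of primary shards. This is only an illustration; CRC32 is a stand-in for the hash function Elasticsearch actually uses, and the document ids are hypothetical.

```python
import zlib

NUMBER_OF_SHARDS = 5  # the default mentioned above


def shard_for(doc_id, number_of_shards=NUMBER_OF_SHARDS):
    """Pick a shard from a hash of the id (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


ids = [f"talk-{n}" for n in range(1000)]  # hypothetical document ids
per_shard = [0] * NUMBER_OF_SHARDS
for doc_id in ids:
    per_shard[shard_for(doc_id)] += 1

# Every document lands on exactly one shard; the same id always maps
# to the same shard, so a read by id knows where to look.
print(per_shard)
```

Because the shard depends on the shard count, this scheme also suggests why the number of primary shards is fixed when an index is created.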

Recap
● Simple storage of JSON documents
● Index and type
● Sharding for large data volumes
● Distribution is a first-class citizen

Users
● HipChat
– http://highscalability.com/blog/2014/1/6/how-hipchat-stores-and-indexes-billions-of-messages-using-el.html
● Engagor
– http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
– http://www.elasticsearch.org/case-study/engagor/

Search via URL parameter
curl -XGET "http://localhost:9200/conferences/talk/_search?q=elasticsearch&pretty=true"
{
  "took" : 73,
  […]
  "hits" : {
    […]
    "hits" : [ {
      […]
      "_score" : 0.076713204,
      "_source":{
        […]
      }
    } ]
  }
}

Query DSL
curl -XPOST "http://localhost:9200/conferences/_search" -d'
{
  "query": {
    "match": {
      "title" : {
        "query": "elasticsaerch",
        "fuzziness": 2
      }
    }
  },
  "filter": {
    "term": {
      "conference.city": "nürnberg"
    }
  }
}'
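
The misspelled query term "elasticsaerch" still matches because of the fuzziness setting, which allows a limited number of edits between the query term and an indexed term. The sketch below uses the classic Levenshtein distance to illustrate the idea. It is not Lucene's implementation, which may additionally count a swap of two adjacent characters as a single edit; the function names are made up for this example.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query_term, indexed_term, fuzziness=2):
    """True if the terms are at most `fuzziness` edits apart."""
    return edit_distance(query_term, indexed_term) <= fuzziness


print(edit_distance("elasticsaerch", "elasticsearch"))  # 2
print(fuzzy_match("elasticsaerch", "elasticsearch"))    # True
```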

Language
?q=title:anwendungsfall&pretty=true"
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Analyzing
Documents:
1: Anwendungsfälle für Elasticsearch
2: Verteiltes Suchen mit Elasticsearch
Steps:
1. Tokenization
2. Lowercasing
3. Stemming
Resulting inverted index:
Term            Document Id
anwendungsfall  1
elasticsearch   1,2
fur             1
mit             2
such            2
verteilt        2
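
The three analysis steps can be sketched in a few lines. This is a toy illustration, not the analyzer Elasticsearch ships: the umlaut folding and the suffix rules are hand-picked assumptions that happen to work for the two example titles, whereas the real German analyzer uses a proper stemmer. The result is an inverted index of the kind shown above.

```python
import re
from collections import defaultdict

FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
SUFFIXES = ("es", "en", "e")  # toy rules, NOT the real German stemmer


def analyze(text):
    """Turn a text into index terms."""
    tokens = re.findall(r"\w+", text)        # 1. tokenization
    tokens = [t.lower() for t in tokens]     # 2. lowercasing
    terms = []
    for token in tokens:                     # 3. (toy) stemming
        token = token.translate(FOLD)
        for suffix in SUFFIXES:
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        terms.append(token)
    return terms


def build_index(documents):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


documents = {
    1: "Anwendungsfälle für Elasticsearch",
    2: "Verteiltes Suchen mit Elasticsearch",
}
index = build_index(documents)
for term in sorted(index):
    print(term, sorted(index[term]))
```

A query term only matches if it goes through the same analysis as the indexed text, which is why searching for "anwendungsfall" depends on how the title field was analyzed.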

Mapping
curl -XDELETE "http://localhost:9200/conferences/"
curl -XPUT "http://localhost:9200/conferences/"
curl -XPUT "http://localhost:9200/conferences/talk/_mapping" -d'
{
  "properties": {
    "tags": {
      "type": "string",
      "index": "not_analyzed"
    },
    "title": {
      "type": "string",
      "analyzer": "german"
    }
  }
}'
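
The difference between an analyzed and a not_analyzed field matters for exact term lookups. The sketch below is a simplified model, assuming an analyzer that only splits into words and lowercases; the function names are hypothetical. It illustrates why a term filter on an analyzed field needs the lowercased "nürnberg", while a not_analyzed field such as tags only matches the exact original value.

```python
import re


def indexed_terms(value, analyzed):
    """Return the terms stored in the index for one field value."""
    if analyzed:
        # simplified stand-in for an analyzer: split into words, lowercase
        return [token.lower() for token in re.findall(r"\w+", value)]
    return [value]  # not_analyzed: the whole value is kept as one term


def term_matches(term, value, analyzed):
    """A term lookup compares the given term to the indexed terms verbatim."""
    return term in indexed_terms(value, analyzed)


print(term_matches("nürnberg", "Nürnberg", analyzed=True))   # True
print(term_matches("Nürnberg", "Nürnberg", analyzed=True))   # False
print(term_matches("Java", "Java", analyzed=False))          # True
print(term_matches("java", "Java", analyzed=False))          # False
```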

Language
?q=title:anwendungsfall&pretty=true"
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    […]
  }
}

What else?
● Faceting/Aggregations
● Suggestions
● Highlighting
● Sorting
● Pagination
● ...

Recap
● Expressive searches via the Query DSL
● Analyzing as core functionality
● All Lucene goodies available

Users
● GitHub
– http://exploringelasticsearch.com/github_interview.html
– http://www.elasticsearch.org/case-study/github/
● StackOverflow
– http://meta.stackexchange.com/questions/160100/a-new-search-engine-for-stack-exchange
– http://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-stack-overflow/
● SoundCloud
– http://developers.soundcloud.com/blog/architecture-behind-our-new-search-and-explore-experience
– http://www.elasticsearch.org/case-study/soundcloud/
● XING
– http://www.elasticsearch.org/case-study/xing/

Listing
{
  "filter": {
    "term": {
      "conference.city": "nürnberg"
    }
  }
}'

Structured Search
● Not just full text
– Structured data: geo and numeric data, date values
● Geopoint as a data type
● Sorting
● Filtering

Applications
● Show the nearest branch
● Branch search
● Sorting classified ads
● Sorting locations
● Filtering by proximity
● Social media analyses

Document
{
  "date" : "2014-07-15T16:30:00.000Z",
  "conference" : {
    "city" : "Nürnberg",
    "coordinates": {
      "lon": "11.115358",
      "lat": "49.417175"
    }
  }
}

Mapping
curl -XPUT "http://localhost:9200/conferences/talk/_mapping" -d'
{
  "properties": {
    […],
    "conference": {
      "type": "object",
      "properties": {
        "coordinates": {
          "type": "geo_point"
        }
      }
    }
  }
}'

Sorting
{
  "sort" : [
    {
      "_geo_distance" : {
        "conference.coordinates" : {
          "lon": 8.403697,
          "lat": 49.006616
        },
        "order" : "asc",
        "unit" : "km"
      }
    }
  ]
}'

Filtering
curl -XPOST "http://localhost:9200/conferences/_search" -d'
{
  "filter": {
    "geo_distance": {
      "conference.coordinates": {
        "lon": 8.403697,
        "lat": 49.006616
      },
      "distance": "200km",
      "distance_type": "arc"
    }
  }
}'
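
Sorting and filtering by distance both rest on computing the distance between two coordinates. The following sketch uses the haversine formula on a spherical earth, which is an assumption made for illustration; the computation inside Elasticsearch may differ in its details. The two points are the query point and the coordinates of the example document; the function names are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius, an approximation


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(distance_limit_km, origin, point):
    """Distance filter: keep points no further away than the limit."""
    return distance_km(*origin, *point) <= distance_limit_km


origin = (49.006616, 8.403697)        # query point (lat, lon)
conference = (49.417175, 11.115358)   # coordinates from the document

print(round(distance_km(*origin, *conference), 1))
print(within(250, origin, conference))
```

Sorting by distance then amounts to ordering the hits by this value in ascending order.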

Recap
● Elasticsearch can do more than full text
● Sophisticated geo algorithms
● Sorting by distance
● Filtering by distance or area
● Distance calculation

Users
● FourSquare
– http://engineering.foursquare.com/2012/08/09/foursquare-now-uses-elastic-search-and-on-a-related-note-slashem-also-works-with-elastic-search/
● Gild
– http://www.elasticsearch.org/case-study/gild/

Log File Analysis
● Centralizing logs from applications
● Centralizing logs across machines
– Even without access to the machines
● Easy searchability
● Real-time analysis / visualization
● Data for everyone!

Log File Analysis
● Ingestion
– Logstash
● Storage
– Elasticsearch
● Analysis
– Kibana

Logstash Config
input {
  file {
    path => "/var/log/apache2/access.log"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch_http {
    host => "localhost"
  }
}
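
What the grok filter does with each log line can be approximated with a regular expression with named groups: an unstructured line becomes a structured event that can be stored as a JSON document. The pattern below is a simplified stand-in written for this example, not the actual COMBINEDAPACHELOG definition, and the sample log line is invented.

```python
import re

# Simplified pattern for the Apache combined log format (an approximation)
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse(line):
    """Return the fields of a log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Invented sample line
line = ('127.0.0.1 - - [15/Jul/2014:16:30:00 +0200] '
        '"GET /index.html HTTP/1.1" 200 2326 '
        '"http://example.com/" "Mozilla/5.0"')
event = parse(line)
print(event["verb"], event["request"], event["response"])
```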

Recap
● Reading, enriching, and storing log events
● Numerous inputs in Logstash
● Consolidation
● Centralization
● Analysis

Users
● Mailgun
– http://www.elasticsearch.org/blog/using-elasticsearch-and-logstash-to-serve-billions-of-searchable-events-for-customers/
● CERN
– https://medium.com/@ghoranyi/needle-in-a-haystack-873c97a99983
● Bloomberg
– http://www.elasticsearch.org/videos/using-elasticsearch-logstash-kibana-techologies-centralized-viewing-logs-bloomberg/

Analytics
● Aggregations on fields
● Analysis of large data volumes as well
– Social media
– Data warehouse
● Consolidating data from different sources
● Visualization

Aggregations
curl -XGET "http://localhost:9200/devoxx/tweet/_search" -d'
{
  "aggs" : {
    "hashtags" : {
      "terms" : {
        "field" : "hashtag.text"
      }
    }
  }
}'

Aggregations
"aggregations": {
  "hashtags": {
    "buckets": [
      {
        "key": "dartlang",
        "doc_count": 229
      },
      {
        "key": "java",
        "doc_count": 216
      },
      [...]

Aggregations
curl -XGET "http://localhost:9200/devoxx/tweet/_search" -d'
{
  "aggs" : {
    "hashtags" : {
      "terms" : {
        "field" : "hashtag.text"
      },
      "aggs" : {
        "hashtagusers" : {
          "terms" : {
            "field" : "user.screen_name"
          }
        }
      }
    }
  }
}'

Aggregations
"key": "scala",
"doc_count": 130,
"hashtagusers": {
  "buckets": [
    {
      "key": "jaceklaskowski",
      "doc_count": 74
    },
    {
      "key": "ManningBooks",
      "doc_count": 3
    },
    [...]
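
The nested terms aggregation can be pictured as counting in two levels: first documents per hashtag, then documents per user within each hashtag bucket. The sketch below mimics that on a handful of in-memory tweets. The sample data is made up and the function is an illustration of the concept, not of how Elasticsearch computes aggregations internally.

```python
from collections import Counter, defaultdict

tweets = [  # made-up sample data
    {"user": {"screen_name": "alice"},
     "hashtag": [{"text": "scala"}, {"text": "java"}]},
    {"user": {"screen_name": "alice"}, "hashtag": [{"text": "scala"}]},
    {"user": {"screen_name": "bob"}, "hashtag": [{"text": "scala"}]},
    {"user": {"screen_name": "bob"}, "hashtag": [{"text": "java"}]},
]


def terms_with_sub_terms(docs, size=10):
    """Bucket docs by hashtag.text, then by user.screen_name per bucket."""
    doc_counts = Counter()
    users_per_tag = defaultdict(Counter)
    for doc in docs:
        # a document counts once per distinct hashtag it contains
        for tag in {h["text"] for h in doc["hashtag"]}:
            doc_counts[tag] += 1
            users_per_tag[tag][doc["user"]["screen_name"]] += 1
    return [
        {
            "key": tag,
            "doc_count": count,
            "hashtagusers": {"buckets": [
                {"key": user, "doc_count": n}
                for user, n in users_per_tag[tag].most_common(size)
            ]},
        }
        for tag, count in doc_counts.most_common(size)
    ]


buckets = terms_with_sub_terms(tweets)
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"], bucket["hashtagusers"]["buckets"])
```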

Aggregations
● Bucket Aggregations
– terms
– (date_)histogram
– range
– significant_terms
– ...
● Metrics Aggregations
– min, max, sum, avg
– stats
– percentiles
– value_count
– ...
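
The two families differ in what they return: a bucket aggregation groups documents, a metrics aggregation computes numbers over a field. As a rough sketch of one of each, the code below implements a histogram (bucket) and stats (metric) over a list of numeric values. The values and the field they stand for are hypothetical.

```python
import math
from collections import Counter


def stats(values):
    """Metrics aggregation: summary numbers over all values."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def histogram(values, interval):
    """Bucket aggregation: group values into fixed-width intervals."""
    buckets = Counter(math.floor(v / interval) * interval for v in values)
    return [{"key": key, "doc_count": buckets[key]} for key in sorted(buckets)]


retweets = [0, 3, 4, 12, 17, 25]  # hypothetical numeric field values

print(stats(retweets))
print(histogram(retweets, interval=10))
```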

Recap
● Analysis of large data volumes
● Visualization
● Numerous aggregations
– Calculations: max, min, mean
– Terms, SignificantTerms

Users
● Engagor
● The Guardian
– http://www.elasticsearch.org/blog/using-elasticsearch-and-logstash-to-serve-billions-of-searchable-events-for-customers/
– http://www.infoq.com/presentations/elasticsearch-guardian
● Cogenta
– http://www.elasticsearch.org/case-study/cogenta/

Thank you!
@fhopf
mail@florian-hopf.de
http://blog.florian-hopf.de

Images
● http://www.morguefile.com/archive/display/685952
