Elasticsearch: Much More Than Search! [JavaZone 2013]

Search engines can solve far more challenges than a search box suggests. Perhaps you have a search problem without being aware of it?

Elasticsearch, an open source search engine built on Lucene, is getting more and more attention, not only because it is excellent at solving typical search problems, but also because it can be used for analytics and "big data" challenges.

The talk gives an overview of what search engines are good at, related problems you will run into, how Elasticsearch can help, and how it fits into your technology stack.

It is not a tutorial, but with a relatively high pace and examples of realistic complexity it gives an overview of what is possible.

We round off with how Elasticsearch can be classified among the multitude of "NoSQL" databases.

Transcript of "Elasticsearch: Much More Than Search! [JavaZone 2013]"

  1. Elasticsearch: Much more than search! Alex Brasetvik, alex@found.no, @alexbrasetvik. Wednesday, September 11, 2013
  2. Who? Co-founder of Found AS. 7+ years of search, 2+ years of Elasticsearch. Manages hundreds of Elasticsearch clusters.
  3. Agenda: 0. Elasticsearch; 1. Use cases; 2. Lingo; 3. Data structures; 4. Text processing; 5. Elasticsearch; 6. NoSQL?
  4. Elasticsearch: open source; real-time search and analytics; schema-free; based on Lucene.
  6. Indexing a sample document:

         $ curl localhost:9200/sample_index/sample_message -XPOST -d '
         {
           "user": {
             "name": "DEVOPS_BORAT"
           },
           "followers": 42000,
           "location": {
             "lat": 56.78,
             "lon": 12.34
           },
           "tags": [
             "questionable",
             "funny"
           ],
           "message": "1+1=2 only in legacy system. In modern distributed database with eventual consistent is 1+1=1.",
           "retweets": 123
         }
         '
         {"ok":true,"_index":"sample_index","_type":"sample_message","_id":"rjs9KSmPRnqhvs7QjgxJJw","_version":1}

  7. Searching for the document:

         $ curl localhost:9200/sample_index/sample_message/_search -XPOST -d '
         {
           "query": {
             "match": {
               "message": "consistent"
             }
           }
         }
         '

  8. The search response:

         {
           "took" : 1,
           "timed_out" : false,
           "_shards" : {
             "total" : 5,
             "successful" : 5,
             "failed" : 0
           },
           "hits" : {
             "total" : 1,
             "max_score" : 0.076713204,
             "hits" : [ {
               "_index" : "sample_index",
               "_type" : "sample_message",
               "_id" : "rjs9KSmPRnqhvs7QjgxJJw",
               "_score" : 0.076713204,
               "_source" : {
                 "user": { "name": "DEVOPS_BORAT" },
                 "message": "1+1=2 only in legacy system. In modern distributed database with eventual consistent is 1+1=1.",
                 "retweets": 123,
                 ...
               }
             } ]
           }
         }

  9. The resulting mapping:

         {
           "sample_index" : {
             "sample_message" : {
               "properties" : {
                 "followers" : { "type" : "long" },
                 "location" : {
                   "properties" : {
                     "lat" : { "type" : "double" },
                     "lon" : { "type" : "double" }
                   }
                 },
                 "message" : { "type" : "string" },
                 "retweets" : { "type" : "long" },
                 "tags" : { "type" : "string" },
                 "user" : {
                   "properties" : {
                     "name" : { "type" : "string" }
                   }
                 }
               }
             }
           }
         }

  14. [Figure: "Map of a Twitter Status Object", Raffi Krikorian <raffi@twitter.com>, 18 April 2010. An annotated dump of a single tweet (id 12296272736, from the twitterapi account) describing its fields: id, text, created_at, the in_reply_to_* references, favorited, truncated, the embedded user object, contributors, geo, coordinates, place (with bounding_box), and source.]
  21. The sample document:

         user:
           name: DEVOPS_BORAT
         message: "1+1=2 only in legacy system. In modern distributed database with eventual consistent is 1+1=1."
         location:
           lon: 12.34
           lat: 56.78
         followers: 42000
         retweets: 123
         tags: [questionable, funny]

  22. "The quick brown fox had a day off" (whitespace tokenizer)
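To make the whitespace tokenizer on this slide concrete, here is a minimal Python sketch. It is an illustration of the idea only, not Lucene's or Elasticsearch's implementation, and the function name `whitespace_tokenize` is an assumption made for this example.

```python
def whitespace_tokenize(text):
    """Split text into tokens on runs of whitespace.

    Case and punctuation are left untouched, which is what
    distinguishes this from more aggressive tokenizers.
    """
    return text.split()


tokens = whitespace_tokenize("The quick brown fox had a day off")
print(tokens)
```

Note that "The" keeps its capital letter here, so a search for "the" would only match if a lowercase step were added to the processing chain.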
  23. Filter / Query. Filter: boolean match. Query: match with a score. Can be composed of other queries.
  24. "Search": the entire information need. Query, filters, facets, pagination, ...
  25. Inverted index. "If you don't find it in the index, look very carefully through the entire catalog." (Sears, Roebuck, and Co., Consumers' Guide, 1897)
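As a rough illustration of the inverted index named on this slide, the sketch below maps each term to the documents that contain it. It assumes whitespace tokenization plus lowercasing; the second document and the function name `build_inverted_index` are made up for the example, and a real Lucene index stores far more (positions, frequencies, compressed postings).

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "The quick brown fox had a day off",
    2: "The lazy dog had a quick nap",  # hypothetical second document
}
index = build_inverted_index(docs)
print(index["quick"])
print(index["fox"])
```

Looking up a term is then a dictionary access instead of a scan over every document, which is the point of the catalog quote on the slide.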
  27. AbstractEnterpriseSingletonProxyFactoryBean
  28. xkcd.com/292
  29. AbstractSingletonProxyFactoryBean: camelCase tokenizer, then lowercase
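The camelCase-then-lowercase chain on this slide could look roughly like the following Python sketch. The regular expression and the name `camel_case_terms` are assumptions for illustration; this is not how any particular Elasticsearch analyzer is implemented.

```python
import re

# Matches, in order of preference: an acronym run (e.g. "HTTP" in "HTTPServer"),
# a capitalized or lowercase word, or a run of digits.
_CAMEL_PARTS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def camel_case_terms(identifier):
    """Split a camelCase identifier into parts and lowercase each part."""
    return [part.lower() for part in _CAMEL_PARTS.findall(identifier)]


terms = camel_case_terms("AbstractSingletonProxyFactoryBean")
print(terms)
```

With terms like these in the index, a search for "factory" or "proxy" can match the class name without resorting to wildcard queries.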
  30. Prefix problems!
  31. Prefix problems: *suffix → xiffus*; (60.6384, 6.5017) → u4u8gyykk; 123 → {1-hundreds, 12-tens, 123} (simplified).
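Two of the tricks on this slide can be sketched in Python: answering a suffix search with a prefix match on reversed terms, and expanding a number into coarser prefix terms. All function names are hypothetical, the linear scan stands in for a seek into a sorted term dictionary, and the `-tens` / `-hundreds` labels simply mirror the slide's simplified notation rather than Lucene's actual numeric encoding.

```python
def reversed_term(term):
    """'suffix' -> 'xiffus', so that a suffix query becomes a prefix query."""
    return term[::-1]


def suffix_match(terms, suffix):
    """Find terms ending with `suffix` via a prefix match on the reversed terms."""
    reversed_index = {reversed_term(t): t for t in terms}
    needle = reversed_term(suffix)
    return sorted(orig for rev, orig in reversed_index.items()
                  if rev.startswith(needle))


# Labels for how many digits were dropped (assumed naming, after the slide).
MAGNITUDES = {1: "tens", 2: "hundreds", 3: "thousands"}


def numeric_prefix_terms(n):
    """Expand a non-negative integer into prefix terms of decreasing coarseness."""
    digits = str(n)
    terms = []
    for i in range(1, len(digits) + 1):
        dropped = len(digits) - i
        if dropped == 0:
            terms.append(digits)
        else:
            label = MAGNITUDES.get(dropped, "1e" + str(dropped))
            terms.append(digits[:i] + "-" + label)
    return terms


print(suffix_match(["suffix", "prefix", "index"], "fix"))
print(numeric_prefix_terms(123))
```

A range query can then match a handful of coarse terms instead of enumerating every individual number in the range.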
  36. Elasticsearch: distributed; a cluster of nodes; self-coordinating.
  45. Mapping
  49. !
  56. So far: inverted indexes; text processing; index terms; mappings; index templates.
  58. xkcd.com/208
  61. Solr query:

         ?q={!boost b=div(popularity,price) v=$qq}
         &qq={!dismax qf=desc^2,review}cheap
         &bq={!lucene df=keywords}lucene solr java
         &fq={!geofilt sfield=location pt=10.312,-20.556 d=3.5}
         &fq={!term f=$ff v=$vv}&ff=keywords&vv=solr
         &sort=query(keywords:lame) asc, score desc

  73. Filters: cached as bitmaps; compact; very fast.
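Why bitmap-cached filters are compact and fast can be illustrated with a small Python sketch that uses an integer as a bitset, one bit per document. The helper names and the second and third documents are assumptions for the example; Lucene's real bitset classes are not modelled here.

```python
def filter_bitmap(docs, predicate):
    """Return an int whose bit i is set when docs[i] satisfies the predicate."""
    bits = 0
    for i, doc in enumerate(docs):
        if predicate(doc):
            bits |= 1 << i
    return bits


def matching_ids(bits):
    """List the positions of the set bits, lowest first."""
    ids = []
    position = 0
    while bits:
        if bits & 1:
            ids.append(position)
        bits >>= 1
        position += 1
    return ids


docs = [
    {"tags": ["questionable", "funny"], "followers": 42000},
    {"tags": ["funny"], "followers": 10},       # hypothetical document
    {"tags": ["serious"], "followers": 50000},  # hypothetical document
]

# Each filter is evaluated once and kept as a bitmap.
funny = filter_bitmap(docs, lambda d: "funny" in d["tags"])
popular = filter_bitmap(docs, lambda d: d["followers"] >= 1000)

# Combining cached filters is a single bitwise operation.
print(matching_ids(funny & popular))
print(matching_ids(funny | popular))
```

The cost of combining filters no longer depends on how expensive each predicate was to evaluate, only on the number of documents.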
  74. term: className: "InternalFrameInternalFrameTitlePaneInternalFrameTitlePaneMaximizeButtonWindowNotFocusedState"
  77. Filters: use filters when you can … and queries when you need ranking.
  78. Facets: summarize the entire result set. Filters + facets are the basis for analytics use.
  83. Faceting options: terms; histograms; date histograms; geo distance; statistical distribution; filters/queries.
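Two of the facet types listed on this slide, terms and histograms, amount to counting over the whole result set. The Python sketch below shows that idea only; the function names and the sample hits (apart from the first, which reuses values from the talk's sample document) are assumptions, and this is not the Elasticsearch facet API.

```python
from collections import Counter


def terms_facet(hits, field):
    """Count how often each value of `field` occurs across all hits."""
    counts = Counter()
    for hit in hits:
        value = hit.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


def histogram_facet(hits, field, interval):
    """Count hits per bucket, where a bucket is the value rounded down to `interval`."""
    return Counter((hit[field] // interval) * interval
                   for hit in hits if field in hit)


hits = [
    {"tags": ["questionable", "funny"], "retweets": 123},
    {"tags": ["funny"], "retweets": 7},      # hypothetical hit
    {"tags": ["serious"], "retweets": 150},  # hypothetical hit
]

tag_counts = terms_facet(hits, "tags")
retweet_buckets = histogram_facet(hits, "retweets", 100)
print(tag_counts.most_common())
print(sorted(retweet_buckets.items()))
```

Because every hit is visited, not just the top page of results, facets need the field values of the entire result set to be quickly accessible.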
  84. Facets: resource-intensive (CPU + memory). Important to have enough memory.
  85. Caches: filter caches; field caches (facets, etc.); page cache. "There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors."
  86. Caches: "Now you are thinking with..." Per segment. New segments do not invalidate old ones. Important for (near) real time.
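The per-segment caching idea on this slide can be sketched as a cache keyed by segment and filter name: because a written segment never changes, its cached entry stays valid when new segments arrive. The classes below are a hypothetical Python model for illustration and do not reflect Elasticsearch's or Lucene's actual cache classes.

```python
class Segment:
    """An immutable batch of documents, identified by a segment ID."""

    def __init__(self, segment_id, docs):
        self.segment_id = segment_id
        self.docs = tuple(docs)  # never modified after creation


class PerSegmentFilterCache:
    """Caches filter results per (segment, filter name) pair."""

    def __init__(self):
        self._cache = {}
        self.computations = 0  # counts cache misses, for demonstration

    def matches(self, segment, name, predicate):
        key = (segment.segment_id, name)
        if key not in self._cache:
            self.computations += 1
            self._cache[key] = [i for i, doc in enumerate(segment.docs)
                                if predicate(doc)]
        return self._cache[key]


def search(segments, cache, name, predicate):
    """Evaluate a filter across all segments, reusing cached results."""
    return {s.segment_id: cache.matches(s, name, predicate) for s in segments}


def is_funny(doc):
    return "funny" in doc["tags"]


cache = PerSegmentFilterCache()
segments = [Segment(0, [{"tags": ["funny"]}, {"tags": ["serious"]}])]
first = search(segments, cache, "funny", is_funny)

# New documents arrive as a new segment; segment 0's cache entry is untouched.
segments.append(Segment(1, [{"tags": ["funny"]}]))
second = search(segments, cache, "funny", is_funny)
print(second, cache.computations)
```

Only the new segment triggers a computation on the second search, which is what keeps caches useful while documents keep arriving in near real time.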
  88. PostgreSQL: verifies resource usage. Safe >> fast. Uses disk if it has to.
  89. Elasticsearch trusts you. Built for speed. What could possibly go wrong?
  90. OutOfMemoryError: Woah there. I ate all the memories. Your cluster may or may not work any more.
  91. NoSQL? Fast, not robust. Document database. Schema-flexible. No transactions. Easy to scale/distribute. Naïve leader election. No auth/authz.
  92. ? Slides and relevant links at found.no/jz13 (try hosted Elasticsearch free for 6 months). Solr meetup in the community room tomorrow!
  93. Image credits: Nails, Adam Rosenberg; Map of Westeros; Elephant, Roy Costello; Wingsuit, Richard Schneider.