Elasticsearch – much more than search! [JavaZone 2013]

Search engines can solve far more problems than a search box suggests. Perhaps you have a search problem without realizing it?

Elasticsearch, an open source search engine built on Lucene, is getting more and more attention. This is not only because it is excellent at solving typical search problems, but also because it can be used for analytics and "big data" challenges.

The talk gives an overview of what search engines are good at, related problems you will run into, how Elasticsearch can help, and how it fits into your technology stack.

It is not a tutorial. Instead, it moves at a fairly high pace and uses examples of realistic complexity to give an overview of what is possible.

We round off with how Elasticsearch can be classified among the multitude of "NoSQL" databases.

Transcript

  • 1. Elasticsearch: Much more than search! Alex Brasetvik, alex@found.no, @alexbrasetvik Wednesday, September 11, 13
  • 2. Who? Co-founder of Found AS. 7+ years working with search, 2+ with Elasticsearch. Manages hundreds of Elasticsearch clusters.
  • 3. Agenda: 0. Elasticsearch 1. Use cases 2. Lingo 3. Data structures 4. Text processing 5. Elasticsearch 6. NOSQL?
  • 4. Elasticsearch: Open source. Real-time search and analytics. Schema-free. Based on Lucene.
  • 6. Indexing a document:
    $ curl localhost:9200/sample_index/sample_message -XPOST -d '
    {
        "user": { "name": "DEVOPS_BORAT" },
        "followers": 42000,
        "location": { "lat": 56.78, "lon": 12.34 },
        "tags": [ "questionable", "funny" ],
        "message": "1+1=2 only in legacy system. In modern distributed database with eventual consistent is 1+1=1.",
        "retweets": 123
    }'
    {"ok":true,"_index":"sample_index","_type":"sample_message","_id":"rjs9KSmPRnqhvs7QjgxJJw","_version":1}
  • 7. Searching:
    $ curl localhost:9200/sample_index/sample_message/_search -XPOST -d '
    {
        "query": {
            "match": { "message": "consistent" }
        }
    }'
  • 8. Response:
    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
      "hits" : {
        "total" : 1,
        "max_score" : 0.076713204,
        "hits" : [ {
          "_index" : "sample_index",
          "_type" : "sample_message",
          "_id" : "rjs9KSmPRnqhvs7QjgxJJw",
          "_score" : 0.076713204,
          "_source" : {
            "user": { "name": "DEVOPS_BORAT" },
            "message": "1+1=2 only in legacy system. In modern distributed database with eventual consistent is 1+1=1.",
            "retweets": 123,
            ...
          }
        } ]
      }
    }
  • 9. The resulting mapping:
    {
      "sample_index" : {
        "sample_message" : {
          "properties" : {
            "followers" : { "type" : "long" },
            "location" : {
              "properties" : {
                "lat" : { "type" : "double" },
                "lon" : { "type" : "double" }
              }
            },
            "message" : { "type" : "string" },
            "retweets" : { "type" : "long" },
            "tags" : { "type" : "string" },
            "user" : {
              "properties" : {
                "name" : { "type" : "string" }
              }
            }
          }
        }
      }
    }
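The indexing request never declared a schema. The mapping on slide 9 was inferred from the JSON values, a process called dynamic mapping. The Python sketch below imitates that inference for the sample document and assumes the 0.90-era type names seen in the output (`long`, `double`, `string`). `infer_mapping` and `infer_type` are hypothetical helpers. They are far simpler than Elasticsearch's real detection, which also recognizes date strings and can be configured.

```python
import json

def infer_type(value):
    """Guess a field mapping from a JSON value (simplified dynamic mapping)."""
    if value is None:
        return None                      # null adds no mapping
    if isinstance(value, bool):          # check bool before int: True is an int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        # Arrays have no type of their own: the first non-null element decides.
        non_null = [v for v in value if v is not None]
        return infer_type(non_null[0]) if non_null else None
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}
    raise TypeError(f"unsupported JSON value: {value!r}")

def infer_mapping(doc):
    """Map every field that has a value; skip nulls and empty arrays."""
    mapping = {}
    for field, value in sorted(doc.items()):
        field_mapping = infer_type(value)
        if field_mapping is not None:
            mapping[field] = field_mapping
    return mapping

doc = {
    "user": {"name": "DEVOPS_BORAT"},
    "followers": 42000,
    "location": {"lat": 56.78, "lon": 12.34},
    "tags": ["questionable", "funny"],
    "message": "1+1=2 only in legacy system. In modern distributed database "
               "with eventual consistent is 1+1=1.",
    "retweets": 123,
}
print(json.dumps({"sample_message": {"properties": infer_mapping(doc)}}, indent=2))
```

Arrays take the type of their elements. That is why `tags` appears as a plain `string` field in the mapping.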
  • 14. "Map of a Twitter Status Object" (Raffi Krikorian <raffi@twitter.com>, 18 April 2010): an annotated tweet as a Ruby hash.
    {"id"=>12296272736, "text"=> "An early look at Annotations: http://groups.google.com/group/twitter-api-announce/browse_thread/thread/fa5da2608865453", "created_at"=>"Fri Apr 16 17:55:46 +0000 2010", "in_reply_to_user_id"=>nil, "in_reply_to_screen_name"=>nil, "in_reply_to_status_id"=>nil, "favorited"=>false, "truncated"=>false, "user"=> {"id"=>6253282, "screen_name"=>"twitterapi", "name"=>"Twitter API", "description"=> "The Real Twitter API. I tweet about API changes, service issues and happily answer questions about Twitter and our API. Don't get an answer? It's on my website.", "url"=>"http://apiwiki.twitter.com", "location"=>"San Francisco, CA", "profile_background_color"=>"c1dfee", "profile_background_image_url"=> "http://a3.twimg.com/profile_background_images/59931895/twitterapi-background-new.png", "profile_background_tile"=>false, "profile_image_url"=>"http://a3.twimg.com/profile_images/689684365/api_normal.png", "profile_link_color"=>"0000ff", "profile_sidebar_border_color"=>"87bc44", "profile_sidebar_fill_color"=>"e0ff92", "profile_text_color"=>"000000", "created_at"=>"Wed May 23 06:01:13 +0000 2007", "contributors_enabled"=>true, "favourites_count"=>1, "statuses_count"=>1628, "friends_count"=>13, "time_zone"=>"Pacific Time (US & Canada)", "utc_offset"=>-28800, "lang"=>"en", "protected"=>false, "followers_count"=>100581, "geo_enabled"=>true, "notifications"=>false, "following"=>true, "verified"=>true}, "contributors"=>[3191321], "geo"=>nil, "coordinates"=>nil, "place"=> {"id"=>"2b6ff8c22edd9576", "url"=>"http://api.twitter.com/1/geo/id/2b6ff8c22edd9576.json", "name"=>"SoMa", "full_name"=>"SoMa, San Francisco", "place_type"=>"neighborhood", "country_code"=>"US", "country"=>"The United States of America", "bounding_box"=> {"coordinates"=> [[[-122.42284884, 37.76893497], [-122.3964, 37.76893497], [-122.3964, 37.78752897], [-122.42284884, 37.78752897]]], "type"=>"Polygon"}}, "source"=>"web"}
    Annotations: The tweet's unique ID. These IDs are roughly sorted & developers should treat them as opaque (http://bit.ly/dCkppc). Text of the tweet. Consecutive duplicate tweets are rejected. 140 character max (http://bit.ly/4ud3he). Tweet's creation date. DEPRECATED. The ID of an existing tweet that this tweet is in reply to. Won't be set unless the author of the referenced tweet is mentioned. The screen name & user ID of replied to tweet author. Truncated to 140 characters. Only possible from SMS. The author of the tweet. This embedded object can get out of sync. The author's user ID. The author's user name. The author's screen name. The author's biography. The author's URL. The author's "location". This is a free-form text field, and there are no guarantees on whether it can be geocoded. Rendering information for the author. Colors are encoded in hex values (RGB). The creation date for this account. Whether this account has contributors enabled (http://bit.ly/50npuu). Number of favorites this user has. Number of tweets this user has. Number of users this user is following. The timezone and offset (in seconds) for this user. The user's selected language. Whether this user is protected or not. If the user is protected, then this tweet is not visible except to "friends". Number of followers for this user. Whether this user has geo enabled (http://bit.ly/4pFY77). DEPRECATED in this context. Whether this user has a verified badge. The geotag on this tweet in GeoJSON (http://bit.ly/b8L1Cp). The contributors' (if any) user IDs (http://bit.ly/50npuu). DEPRECATED. The place associated with this Tweet (http://bit.ly/b8L1Cp). The place ID. The URL to fetch a detailed polygon for this place. The printable names of this place. The type of this place, can be a "neighborhood" or "city". The country this place is in. The bounding box for this place. The application that sent this tweet.
  • 21.
    user:
      name: DEVOPS_BORAT
    message: "1+1=2 only in legacy system. In modern distributed database with eventual consistent is 1+1=1."
    location:
      lon: 12.34
      lat: 56.78
    followers: 42000
    retweets: 123
    tags: [questionable, funny]
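The tree above has to be stored in Lucene, whose documents are flat lists of fields. Nested objects therefore become dotted field names (`user.name`, `location.lat`), and an array becomes several values of one field. Below is a minimal Python sketch of that flattening. `flatten` is a hypothetical helper, not Elasticsearch code.

```python
def flatten(doc):
    """Flatten nested JSON into {"dotted.path": [values]}, Lucene-style."""
    fields = {}

    def add(path, value):
        if isinstance(value, dict):
            for key, sub_value in value.items():
                add(f"{path}.{key}" if path else key, sub_value)
        elif isinstance(value, list):
            for item in value:          # arrays become multiple values
                add(path, item)
        else:
            fields.setdefault(path, []).append(value)

    add("", doc)
    return fields

doc = {
    "user": {"name": "DEVOPS_BORAT"},
    "location": {"lon": 12.34, "lat": 56.78},
    "followers": 42000,
    "retweets": 123,
    "tags": ["questionable", "funny"],
}
for path, values in flatten(doc).items():
    print(path, values)
```

Arrays of objects are flattened as well. `{"a": [{"x": 1, "y": 2}, {"x": 3, "y": 4}]}` becomes `a.x: [1, 3]` and `a.y: [2, 4]`, and the pairing of each x with its y is lost. Elasticsearch's `nested` type exists to address exactly that.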
  • 22. Whitespace tokenizer: "The quick brown fox had a day off"
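A whitespace tokenizer splits on whitespace and does nothing else: no lowercasing and no stripping of punctuation. The Python approximation below also records each token's position and character offsets, as Lucene tokenizers do. `whitespace_tokenize` is a hypothetical name.

```python
import re

def whitespace_tokenize(text):
    """Split on runs of whitespace, keeping position and character offsets."""
    return [
        {"token": m.group(), "position": pos, "start": m.start(), "end": m.end()}
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]

for token in whitespace_tokenize("The quick brown fox had a day off"):
    print(token)
```

`The` keeps its capital letter. A search for `the` will therefore not match it unless a lowercase filter is added to the analysis chain.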
  • 23. Filter / Query. Filter: boolean match (yes or no). Query: match with a relevance score. Can be composed of other queries.
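The difference can be shown on three hypothetical documents. The filter returns a plain set of matching IDs, while the query returns scored, ranked hits. For brevity the scoring is raw term frequency, which is not Lucene's actual relevance formula.

```python
docs = {
    1: {"tags": ["funny"], "message": "eventual consistent is eventual"},
    2: {"tags": ["questionable"], "message": "consistent hashing"},
    3: {"tags": ["funny"], "message": "legacy system"},
}

def term_filter(field, value):
    """Filter: a yes/no predicate per document, no score."""
    return {doc_id for doc_id, doc in docs.items() if value in doc[field]}

def match_query(field, text):
    """Query: every matching document gets a score (here: term frequency)."""
    terms = text.lower().split()
    scores = {}
    for doc_id, doc in docs.items():
        tokens = doc[field].lower().split()
        score = sum(tokens.count(term) for term in terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties keep document order (sort is stable).
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(term_filter("tags", "funny"))       # {1, 3}
print(match_query("message", "eventual")) # [(1, 2)]

# Typical combination: the filter narrows, the query ranks what is left.
funny_ids = term_filter("tags", "funny")
ranked = [hit for hit in match_query("message", "consistent") if hit[0] in funny_ids]
print(ranked)  # [(1, 1)]
```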
  • 24. "Search": the entire information need. Query, filters, facets, pagination, ...
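A single search request usually carries the whole information need at once. Below is a sketch of such a request body built in Python, using the Elasticsearch 0.90-era syntax this talk targets: a `filtered` query, `facets`, and `from`/`size` for pagination. Later versions replaced facets with aggregations, and the `filtered` query with `bool` plus `filter`. The helper `build_search` and its parameters are hypothetical.

```python
import json

def build_search(text, tag, page, per_page=10):
    """One request body: full-text query + filter + facet + pagination."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"message": text}},  # scored
                "filter": {"term": {"tags": tag}},       # yes/no, cacheable
            }
        },
        # Facets summarize all matching documents, not just this page.
        "facets": {"popular_tags": {"terms": {"field": "tags", "size": 5}}},
        "from": (page - 1) * per_page,  # pages are 1-based in this helper
        "size": per_page,
    }

print(json.dumps(build_search("consistent", "funny", page=3), indent=2))
```

A body like this would be sent to the same `_search` endpoint as the curl example on slide 7.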
  • 25. Inverted index. "If you don't find it in the index, look very carefully through the entire catalog." –Sears, Roebuck, and Co., Consumers' Guide 1897
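An inverted index maps each term to the documents that contain it. A lookup therefore goes straight from a term to document IDs instead of scanning every document, much like the catalogue index in the quote. Below is a minimal Python sketch in which lowercasing and whitespace splitting stand in for a real analyzer. Lucene also stores term frequencies, positions and compressed postings.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index, *terms):
    """AND query: intersect the posting lists of all terms."""
    result = None
    for term in terms:
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])

docs = {
    1: "The quick brown fox",
    2: "The lazy dog",
    3: "The quick dog had a day off",
}
index = build_inverted_index(docs)
print(index["quick"])                     # [1, 3]
print(search_all(index, "quick", "dog"))  # [3]
```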
  • 27. AbstractEnterpriseSingletonProxyFactoryBean
  • 28. xkcd.com/292
  • 29. AbstractSingletonProxyFactoryBean: camelCase tokenizer, then lowercase
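If `AbstractSingletonProxyFactoryBean` is indexed as one opaque term, only its full name will find it. Splitting it on camelCase boundaries and lowercasing the parts lets a search for `proxy` or `factory bean` match. In Elasticsearch such a chain can be assembled from a `pattern` tokenizer with a camelCase-splitting regex plus the `lowercase` token filter. The Python below approximates the same pipeline. Its regex is an assumption of this sketch, written to keep acronyms like `HTTP` together.

```python
import re

# Uppercase runs not followed by lowercase (acronyms), capitalized words, digits.
CAMEL_PARTS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

def camel_case_tokenize(identifier):
    """Split an identifier on camelCase boundaries."""
    return CAMEL_PARTS.findall(identifier)

def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]

def analyze(identifier):
    return lowercase_filter(camel_case_tokenize(identifier))

print(analyze("AbstractSingletonProxyFactoryBean"))
# ['abstract', 'singleton', 'proxy', 'factory', 'bean']
```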
  • 30. Prefix problems!
  • 31. Prefix problems: *suffix → xiffus*; (60.6384, 6.5017) → u4u8gyykk; 123 → {1-hundreds, 12-tens, 123} (simplified)
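The idea is that an inverted index handles exact terms and prefixes well, so other problems are rewritten at index time into prefix problems. Suffix search works by indexing reversed terms. Geo lookups work by indexing geohashes, since nearby points tend to share a prefix. Numeric ranges work by indexing each number at several precisions. Below is a compact Python sketch of all three. Its decimal precisions mirror the simplified example above, whereas Lucene's numeric trie uses binary precision steps. All function names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def reversed_term(term):
    """'*suffix' becomes the prefix query 'xiffus*' on reversed terms."""
    return term[::-1]

def geohash(lat, lon, precision=9):
    """Standard geohash: interleave longitude/latitude bisection bits."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits form one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

def trie_terms(number, digits=3):
    """Index 123 as ('1', 2), ('12', 1), ('123', 0): prefix + dropped digits."""
    s = str(number).zfill(digits)
    return [(s[:keep], digits - keep) for keep in range(1, digits + 1)]

print(reversed_term("suffix"))   # xiffus
print(geohash(60.6384, 6.5017))  # starts with 'u4u8', as on the slide
print(trie_terms(123))           # [('1', 2), ('12', 1), ('123', 0)]

# A range like 120..129 becomes one term lookup instead of ten.
index = {}
for n in (123, 127, 131, 95):
    for term in trie_terms(n):
        index.setdefault(term, []).append(n)
print(index[("12", 1)])          # [123, 127]
```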
  • 36. Elasticsearch: Distributed. Cluster of nodes. Self-coordinating.
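Part of what lets the nodes coordinate without a central lookup table is that document placement is deterministic. A document is routed to a primary shard by hashing its routing value (the `_id` by default) modulo the number of primary shards, which is why the primary shard count is fixed when an index is created. The Python sketch below illustrates this. `zlib.crc32` is a stand-in chosen for illustration and is not the hash function Elasticsearch uses. The five shards match the `_shards.total` of 5 in the search response on slide 8.

```python
import zlib
from collections import Counter

NUM_PRIMARY_SHARDS = 5

def shard_for(routing, num_shards=NUM_PRIMARY_SHARDS):
    """shard = hash(routing) % number_of_primary_shards (stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards

doc_id = "rjs9KSmPRnqhvs7QjgxJJw"
print(shard_for(doc_id))  # same id -> same shard, computed on any node

# Many ids spread over all the shards.
spread = Counter(shard_for(f"doc-{i}") for i in range(10_000))
print(sorted(spread.items()))
```

Changing `NUM_PRIMARY_SHARDS` would send most existing IDs to different shards, which illustrates why the count cannot simply be edited later.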
  • 45. Mapping
  • 56. So far: Inverted indexes. Text processing. Index terms. Mappings. Index templates.
  • 58. xkcd.com/208
  • 61.
    ?q={!boost b=div(popularity,price) v=$qq}
    &qq={!dismax qf=desc^2,review}cheap
    &bq={!lucene df=keywords}lucene solr java
    &fq={!geofilt sfield=location pt=10.312,-20.556 d=3.5}
    &fq={!term f=$ff v=$vv}&ff=keywords&vv=solr
    &sort=query(keywords:lame) asc, score desc
  • 73. Filters: Cached as bitmaps. Compact. Very fast.
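Filters cache well because their result is one bit per document: set if the document matches. Combining cached filters then comes down to bitwise arithmetic. The Python sketch below uses arbitrary-precision integers as bitsets, whereas Lucene keeps dedicated bitset classes per segment. The `FilterCache` class and the documents are hypothetical.

```python
class FilterCache:
    """Cache filter results as bitsets: bit i is set if doc i matches."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}

    def bits(self, field, value):
        key = (field, value)
        if key not in self.cache:  # compute once, reuse afterwards
            bitset = 0
            for doc_id, doc in enumerate(self.docs):
                if value in doc.get(field, ()):
                    bitset |= 1 << doc_id
            self.cache[key] = bitset
        return self.cache[key]

def matching_ids(bitset):
    """Decode a bitset back into document IDs."""
    return [i for i in range(bitset.bit_length()) if bitset >> i & 1]

docs = [
    {"tags": ["funny"], "lang": ["en"]},
    {"tags": ["questionable", "funny"], "lang": ["no"]},
    {"tags": ["funny"], "lang": ["no"]},
    {"tags": ["questionable"], "lang": ["en"]},
]
cache = FilterCache(docs)
funny = cache.bits("tags", "funny")      # 0b0111
norwegian = cache.bits("lang", "no")     # 0b0110
print(matching_ids(funny & norwegian))   # AND: [1, 2]
print(matching_ids(funny | cache.bits("lang", "en")))  # OR: [0, 1, 2, 3]
```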
  • 74. term: className: "InternalFrameInternalFrameTitlePaneInternalFrameTitlePaneMaximizeButtonWindowNotFocusedState"
  • 77. Filters: use filters when you can… …and queries when you need ranking.
  • 78. Facets: Summarize the entire result set. Filters + facets form the basis for analytics use cases.
  • 83. Faceting options: Terms. Histograms. Date histograms. Geo distance. Statistical distribution. Filters/queries.
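Facets are aggregates computed over every matching document, not only over the page of hits that is returned. Below is a Python sketch of three of the facet types listed (terms, histogram and statistical) over a hypothetical result set. The function names are illustrative and are not Elasticsearch's API. The real statistical facet also reports values such as variance and standard deviation.

```python
from collections import Counter
import statistics

hits = [
    {"tags": ["funny"], "retweets": 123},
    {"tags": ["questionable", "funny"], "retweets": 7},
    {"tags": ["funny"], "retweets": 48},
    {"tags": ["questionable"], "retweets": 260},
]

def terms_facet(docs, field, size=10):
    """Most frequent terms in a field across the whole result set."""
    counts = Counter(term for doc in docs for term in doc[field])
    return counts.most_common(size)

def histogram_facet(docs, field, interval):
    """Count per bucket; bucket key = value rounded down to the interval."""
    buckets = Counter(doc[field] // interval * interval for doc in docs)
    return sorted(buckets.items())

def statistical_facet(docs, field):
    """A few summary statistics for a numeric field."""
    values = [doc[field] for doc in docs]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }

print(terms_facet(hits, "tags"))               # [('funny', 3), ('questionable', 2)]
print(histogram_facet(hits, "retweets", 100))  # [(0, 2), (100, 1), (200, 1)]
print(statistical_facet(hits, "retweets"))
```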
  • 84. Facets: Resource-intensive. CPU + memory. Important to have enough memory.
  • 85. Caches: Filter caches. Field caches: facets, etc. Page cache. "There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors."
  • 86. Caches: Now you are thinking with... Per segment. New segments do not invalidate old ones. Important for (near) real time.
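Lucene segments are immutable: once written, a segment's documents do not change, so anything cached for that segment stays valid for as long as the segment exists. A refresh that adds a new segment then only costs cache work for that new segment, which keeps caching compatible with near real-time search. Below is a Python sketch of a per-segment filter cache. `Segment`, `cached_filter` and `search` are hypothetical names.

```python
import itertools
from dataclasses import dataclass, field

_segment_ids = itertools.count()

@dataclass(frozen=True)
class Segment:
    """An immutable batch of documents, as produced by a refresh."""
    docs: tuple
    seg_id: int = field(default_factory=lambda: next(_segment_ids))

cache = {}
computations = []  # records every cache miss, i.e. real work done

def cached_filter(segment, tag):
    key = (segment.seg_id, tag)
    if key not in cache:
        computations.append(key)
        cache[key] = frozenset(
            i for i, doc in enumerate(segment.docs) if tag in doc["tags"]
        )
    return cache[key]

def search(segments, tag):
    # Per-segment results are combined at query time.
    return sum(len(cached_filter(seg, tag)) for seg in segments)

segments = [Segment(docs=({"tags": ["funny"]}, {"tags": ["questionable"]}))]
print(search(segments, "funny"))  # 1
segments.append(Segment(docs=({"tags": ["funny"]},)))  # near real-time refresh
print(search(segments, "funny"))  # 2
print(computations)  # one entry per segment: the old segment was not recomputed
```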
  • 88. PostgreSQL: Verifies resource usage. Safe >> fast. Uses disk if it has to.
  • 89. Elasticsearch trusts you. Built for speed. What could possibly go wrong?
  • 90. OutOfMemoryError. Woah there, I ate all the memories. Your cluster may or may not work any more.
  • 91. NOSQL? Fast, not robust. Document database. Schema-flexible. No transactions. Easy to scale/distribute. Naïve leader election. No auth/authz.
  • 92. ? Slides and relevant links at found.no/jz13 (Try hosted Elasticsearch free for 6 months.) Solr meetup in the community room tomorrow!
  • 93. Image credits: Nails – Adam Rosenberg. Map of Westeros. Elephant, Roy Costello. Wingsuit, Richard Schneider.