
Discovering ElasticSearch


My introduction to ElasticSearch at Laracon EU 2014, where I explain the ins and outs of ElasticSearch.

The talk is centred around a single example objective that is modelled both in ElasticSearch and in plain SQL. I discuss the advantages of both, as well as integration with Laravel 4.

Published in: Technology

Discovering ElasticSearch

  1. 1. Discovering ElasticSearch
  2. 2. A little about me
  3. 3. 23 years old
  4. 4. Jess' boyfriend
  5. 5. Zoe's dad
  6. 6. Mountain biker
  7. 7. Scuba diver
  8. 8. Scuba biker?
  9. 9. Martial Arts Instructor
  10. 10. I'm also a developer. I run a development agency in Australia, Webcomm.
  11. 11. Finally, I'm a Laracon addict who has travelled 132,000 km, a third of the distance to the moon.
  12. 12. Flashcard • @ben_corlett • http://github.com/bencorlett • http://webcomm.com.au
  13. 13. Search is filtering information and determining relevance
  14. 14. Picture this scenario
  15. 15. "I want to find hotels called Renaissance for under €150, within 500m of Bimhuis so I am close to Laracon EU 2014. The hotel needs to have disability access and ideally provide Wifi and have rooms above ground level."
  16. 16. Requirements
      1. Name of hotel called Renaissance.
      2. Under €150. [1]
      3. Within 500m of Bimhuis. [1]
      4. Disability access.
      [1] Perceived requirements may be flexible.
  17. 17. Wants
      1. Wifi. [1]
      2. Rooms above ground level.
      [1] Let's be honest, this should be a requirement ;)
  18. 18. If search is filtering information and determining relevance, and humans think with expression and emotion, why do your apps operate like robots? How can we tailor our apps to think like our users?
  19. 19. Use the right toolset; and know your data.
  20. 20. Introducing ElasticSearch
  21. 21. ElasticSearch
      1. Powerful search and analytics engine. [1]
      2. Object-based document store where every field is indexed and searchable.
      3. Distributed; ready to scale. [2]
      [1] ElasticSearch may be used for much more than just a search engine.
      [2] Webscalez FTW (trollolol).
  22. 22. SQL vs. ElasticSearch
  23. 23. SQL and ElasticSearch; a comparison
      • SQL is a relational database, ElasticSearch is a search engine.
      • Where SQL is great at filtering on a binary level, ElasticSearch thrives on both binary data and full text relevance.
      • SQL indexes are always up to date with your primary store, ElasticSearch needs to be synced.
      • ElasticSearch is very easy to horizontally scale for performance and redundancy.
  24. 24. SQL and ElasticSearch; a comparison
      SQL uses the following structure for its data store:
          database > table > row
      ElasticSearch has a different, yet comparable structure:
          index > type > document
  25. 25. ElasticSearch 101
  26. 26. It's all about documents
      1. ElasticSearch is document-oriented.
      2. Documents are represented using JSON.
      3. Data can be in nested JSON objects and arrays, and is all searchable.
  27. 27. It's all about documents
      {
        "name": "Renaissance Hotel Amsterdam",
        "company": "Marriott",
        "location": [52.3712561, 4.9005577],
        "floor_levels": 6,
        "features": ["disability_access", "wifi", "smoking_allowed", "pool"]
      }
  28. 28. ElasticSearch requires Java. You have 3 seconds to sulk and complain, then shut up.
  29. 29. Installation is easy
      1. Install Java.
      2. Download and run ElasticSearch in 3 bash commands. [1]
      3. Debian or RPM packages available.
      4. Puppet & Chef scripts available.
      [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_installation.html
  30. 30. Scaling isn't scary
      1. Near-zero configuration to build a cluster of ElasticSearch instances.
      2. Easy to scale horizontally.
      3. Each ElasticSearch instance is referred to as a node.
      4. Any node is capable of handling any request and delegating load.
  31. 31. In very basic terms, horizontal scaling is adding more servers to build a cluster, or cloud...
  32. 32. ...where vertical scaling is throwing more resources at an individual server.
  33. 33. ElasticSearch exposes a RESTful API
  34. 34. Win.
  35. 35. Communicating with ElasticSearch
      1. HTTP verbs: GET, POST, PUT, DELETE, etc...
      2. Send & receive JSON payloads.
      :VERB /:index/:type/:document
      {
        "key": "value",
        "complex": ["foo", "bar"]
      }
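The `:VERB /:index/:type/:document` pattern can be sketched in Python using only the standard library. This is a minimal illustration, not part of the talk: the `build_request` helper is hypothetical, and `http://localhost:9200` assumes a local node on Elasticsearch's default HTTP port. The request is built but deliberately not sent, so the sketch runs without a server.

```python
import json
from urllib.request import Request

# Assumption: a local node listening on Elasticsearch's default HTTP port.
BASE_URL = "http://localhost:9200"


def build_request(verb, index, doc_type, doc_id=None, payload=None):
    """Build (but do not send) a request for :VERB /:index/:type/:document."""
    path = "/".join(part for part in (index, doc_type, doc_id) if part)
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(f"{BASE_URL}/{path}", data=body, headers=headers, method=verb)


request = build_request(
    "PUT", "myapp", "hotel", "1", {"key": "value", "complex": ["foo", "bar"]}
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it to a running node; the response body is JSON as well.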
  36. 36. Creating a document
      POST /myapp/hotel
      {
        "name": "Renaissance Hotel Amsterdam",
        "company": "Marriott",
        "location": [52.3712561, 4.9005577],
        "floor_levels": 6,
        "features": ["disability_access", "wifi", "smoking_allowed", "pool"]
      }
  37. 37. Updating a document
      PUT /myapp/hotel/1 [1]
      {
        "name": "Renaissance Hotel Amsterdam",
        "company": "Marriott",
        "location": [52.3712561, 4.9005577],
        "floor_levels": 10,
        "features": ["disability_access", "wifi", "pool", "restaurant"]
      }
      [1] Actually an upsert; create or update depending on existence.
  38. 38. Deleting a document
      DELETE /myapp/hotel/1
  39. 39. There's lots more to documents
      1. Partial document updating.
      2. Document versioning.
      3. Conflict resolution for distributed documents.
      4. Bulk CRUD methods to avoid HTTP bottleneck.
      See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs.html
  40. 40. ElasticSearch makes searching fun
  41. 41. Searching in ElasticSearch
      1. Every single field can be searchable.
      2. Perform structured queries or filters against fields. [1]
      3. Perform full text queries to find documents.
      4. Queries and filters represented using JSON.
      5. Organise results by relevance.
      [1] SQL-like approach.
  42. 42. Index
      1. Index (noun): refers to the equivalent of a database in an SQL system.
      2. Index (verb): refers to the process of storing a document in an index.
      3. Inverted index: list of all terms inside ElasticSearch and the documents in which they appear.
  43. 43. Analysis
      • Character filters simplify data, such as changing:
        • "&" to "and".
        • "é" to "e".
      • Data is split into terms through a process called tokenisation.
  44. 44. Analysis
      • Token filters tweak and normalise terms, such as:
        • Cast to lowercase.
        • Remove stop-words like "a" or "the".
        • Add synonyms.
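The analysis chain from the two slides above (character filters, then tokenisation, then token filters) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how ElasticSearch implements its analysers: the stop-word set uses only the two examples from the slide, and the `SYNONYMS` mapping is hypothetical.

```python
import re
import unicodedata

STOP_WORDS = {"a", "the"}      # the two examples given on the slide
SYNONYMS = {"leap": "jump"}    # hypothetical synonym mapping


def char_filter(text):
    """Character filters: "&" becomes "and", accents are stripped ("é" -> "e")."""
    text = text.replace("&", " and ")
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenise(text):
    """Tokenisation: split into terms on anything that is not a letter or digit."""
    return re.findall(r"[^\W_]+", text)


def token_filters(tokens):
    """Token filters: lowercase, drop stop-words, then map synonyms."""
    lowered = (token.lower() for token in tokens)
    kept = (token for token in lowered if token not in STOP_WORDS)
    return [SYNONYMS.get(token, token) for token in kept]


def analyse(text):
    return token_filters(tokenise(char_filter(text)))


print(analyse("The Café & Bar: a quick leap"))
# ['cafe', 'and', 'bar', 'quick', 'jump']
```

Running the same `analyse` function over the query text keeps index-time and search-time terms comparable.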
  45. 45. Inverted index
      1. Analysis process extremely configurable. [1]
      2. Multilingual support (33 languages in total), interchangeable per index.
      3. Any fields not indexed are not searchable.
      4. The same analysis process occurs at search time.
      [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html
  46. 46. Example inverted index
      Consider the two following sentences:
      "The quick brown fox jumped over the lazy dog"
      "Quick brown foxes leap over lazy dogs in summer"
  47. 47. Example inverted index
      Term      Doc_1   Doc_2
      -------------------------
      brown   |   X   |   X
      dog     |   X   |   X
      fox     |   X   |   X
      in      |       |   X
      jump    |   X   |   X
      lazy    |   X   |   X
      over    |   X   |   X
      quick   |   X   |   X
      summer  |       |   X
      the     |   X   |
      -------------------------
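To make the inverted index concrete, the sketch below builds one from the two example sentences. It is an illustration only: the `NORMALISE` table is a hypothetical stand-in for the stemming and synonym token filters that reduce "foxes" to "fox" and treat "leap" as "jump".

```python
from collections import defaultdict

# Hypothetical normalisation table standing in for stemming and synonym filters.
NORMALISE = {"foxes": "fox", "dogs": "dog", "jumped": "jump", "leap": "jump"}

DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}


def analyse(text):
    """Lowercase, split on whitespace, then normalise each term."""
    return [NORMALISE.get(term, term) for term in text.lower().split()]


def build_inverted_index(docs):
    """Map every analysed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyse(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every analysed query term."""
    postings = [index.get(term, set()) for term in analyse(query)]
    return set.intersection(*postings) if postings else set()


index = build_inverted_index(DOCS)
for term in sorted(index):
    print(f"{term:<8}{sorted(index[term])}")
print(search(index, "Quick foxes"))
```

Because the query goes through the same `analyse` step as the documents, a search for "Quick foxes" finds both sentences even though neither contains that exact text.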
  48. 48. Scoring
      1. Term frequency: the more often a term appears in a field, the more relevant.
      2. Inverse document frequency: the more often a term appears in the inverted index, the less relevant.
      3. Field length norm: the longer the field, the less relevant each term in it is.
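The three factors can be written down as small functions. The sketch below uses the classic Lucene TF/IDF formulas as documented for ElasticSearch at the time; it assumes that similarity is the one in use, and the final multiplication in `term_weight` is a simplification of the full scoring function, which also applies boosts and query normalisation.

```python
import math


def term_frequency(freq):
    """More occurrences of the term in the field -> higher weight."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    """More documents containing the term -> lower weight."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """Longer field -> each term in it counts for less."""
    return 1 / math.sqrt(num_terms)


def term_weight(freq, num_docs, doc_freq, num_terms):
    """Simplified combination of the three factors for one term in one field."""
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_length_norm(num_terms)
    )


# A rare term in a short field outweighs a common term in a long field.
print(term_weight(freq=1, num_docs=100, doc_freq=1, num_terms=4))
print(term_weight(freq=1, num_docs=100, doc_freq=50, num_terms=16))
```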
  49. 49. Scoring
      1. Fields are "boostable" to increase relevance.
      2. Functions (inbuilt and scripted) can be used to increase/decrease relevance.
      3. Altering analysis to fine-tune scoring.
      4. Very important to know your data.
  50. 50. Queries and filters
      1. Both are modular; think of building blocks.
      2. Both can be nested inside one another.
      3. Syntax does not change, regardless of position or nesting.
      4. Entire JSON object is the ElasticSearch Query DSL.
  51. 51. Querying in ElasticSearch
      1. There are 37 queries (as of August 2014). [1]
      2. Queries are intelligent; they score all results according to a relevance algorithm.
      3. Any nesting passes relevance back to parents.
      [1] http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/relevance-intro.html
  52. 52. Filters
      1. You will find 27 filters (as of August 2014). [1]
      2. Filters are binary; either a field matches or it doesn't.
      3. Filters don't affect relevance scoring.
      [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html
  53. 53. Querying in ElasticSearch
      GET /myapp/hotel/_search
      {
        "query": {
          "match": {"name": "Renaissance"}
        }
      }
      This is a match query. It is the go-to full text query.
  54. 54. Querying in ElasticSearch
      GET /myapp/hotel/_search
      {
        "query": {
          "filtered": {
            "filter": {
              "term": {"features": "disability_access"}
            }
          }
        }
      }
      This is a filtered query, containing a term filter.
  55. 55. Back to our scenario
  56. 56. "I want to find hotels called Renaissance for under €150, within 500m of Bimhuis so I am close to Laracon EU 2014. The hotel needs to have disability access and ideally provide Wifi and have rooms above ground level."
  57. 57. Two approaches
      1. Possible with both SQL and ElasticSearch.
      2. Much easier with ElasticSearch.
      3. ElasticSearch understands the concept of relevance.
      4. ElasticSearch can severely outperform SQL.
  58. 58. SQL approach
  59. 59. First, we'll build the obvious...
      select * from `hotels`
      where `name` like "%Renaissance%"
        and `price` <= 150
        and `disability_access` = 1
  60. 60. Performance & relevance
      Consider the following:
      select * from `hotels` where `name` like "%Renaissance%"
      1. This query will be slow.
      2. This query only matches names which contain the correctly spelt "Renaissance".
  61. 61. Full text search
      1. Add a full text index. [1]
      2. Alter the query, and search across both name and company.
      select * from `hotels`
      where match (`name`, `company`) against ("Renaissance")
        and `price` <= 150
        and `disability_access` = 1
      [1] http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
  62. 62. Checklist
      1. [x] Name of hotel called Renaissance.
      2. [x] Under €150.
      3. [ ] Within 500m of Bimhuis.
      4. [x] Disability access.
      5. [ ] Wifi.
      6. [ ] Above ground level.
  63. 63. Adding "wants" in
      select *, if (`floor_levels` > 1, 1, 0) as `has_multiple_floor_levels`
      from `hotels`
      where match (`name`, `company`) against ("Renaissance")
        and `price` <= 150
        and `disability_access` = 1
      order by `wifi` desc, `has_multiple_floor_levels` desc -- [1]
      [1] We're prioritising Wifi over multiple floor levels...
  64. 64. Checklist
      1. [x] ~Name of hotel called Renaissance.
      2. [x] Under €150.
      3. [ ] Within 500m of Bimhuis.
      4. [x] Disability access.
      5. [x] Wifi.
      6. [x] Rooms above ground level.
  65. 65. Spatial awareness...
      1. Not so easy.
      2. PostGIS for PostgreSQL. [1]
      3. Possible with MySQL with MyISAM tables only. [2]
      4. Very finite; either a match or not a match.
      5. Outside the scope of this talk.
      [1] http://postgis.net
      [2] Possible with other engines in new versions of MySQL.
  66. 66. Checklist
      1. [x] ~Name of hotel called Renaissance.
      2. [x] Under €150.
      3. [x] Within 500m of Bimhuis.
      4. [x] Disability access.
      5. [x] Wifi.
      6. [x] Rooms above ground level.
  67. 67. What if...
      1. Somebody searched for "Residence Inn" as the hotel name?
      2. There was an appropriate hotel for €151?
      3. A brilliant candidate could be found 501m away from Bimhuis?
      4. Somebody cared more about having rooms above ground level than being provided Wifi?
  68. 68. ElasticSearch approach
  69. 69. Populating ElasticSearch
      POST /myapp/hotel
      {
        "name": "Renaissance Hotel Amsterdam",
        "company": "Marriott",
        "location": [52.3712561, 4.9005577],
        "floor_levels": 10,
        "features": ["disability_access", "wifi", "pool", "restaurant"]
      }
      Rinse and repeat for as many hotels as required.
  70. 70. The bool query
      {
        "bool": {
          "must": {},
          "must_not": {},
          "should": {}
        }
      }
      We specify conditions which must and must not match. Terms that should match make a document more relevant.
  71. 71. Prepare a bool query
      {
        "bool": {
          "must": [
            {"multi_match": {"query": "Renaissance", "fields": ["name^2", "company"]}},
            {"term": {"features": "disability_access"}}
          ],
          "should": [
            {"term": {"features": "wifi"}},
            {"range": {"floor_levels": {"gt": 1}}}
          ]
        }
      }
      A field boost of 2 was applied to name to increase relevance.
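Because queries and filters are plain JSON building blocks, the bool query above can be composed from small helper functions. The sketch below is one way to do that in Python; every helper name (`multi_match`, `term`, `range_gt`, `bool_query`) is hypothetical and only assembles dictionaries, it does not talk to ElasticSearch.

```python
import json


def multi_match(query, fields):
    return {"multi_match": {"query": query, "fields": fields}}


def term(field, value):
    return {"term": {field: value}}


def range_gt(field, value):
    return {"range": {field: {"gt": value}}}


def bool_query(must=(), should=(), must_not=()):
    """Combine blocks into a bool query, leaving out any empty section."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    return {"bool": {name: blocks for name, blocks in clauses.items() if blocks}}


hotel_query = bool_query(
    must=[
        multi_match("Renaissance", ["name^2", "company"]),  # boost name by 2
        term("features", "disability_access"),
    ],
    should=[
        term("features", "wifi"),
        range_gt("floor_levels", 1),
    ],
)
print(json.dumps({"query": hotel_query}, indent=2))
```

Each helper returns an ordinary dictionary, so blocks nest inside one another without any change of syntax, which is the property the Query DSL slide describes.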
  72. 72. Checklist
      1. [x] ~Name of hotel called Renaissance.
      2. [ ] Under €150.
      3. [ ] Within 500m of Bimhuis.
      4. [x] Disability access.
      5. [x] Wifi.
      6. [x] Have rooms above ground level.
  73. 73. What if...
      1. There was an appropriate hotel for €151?
      2. A brilliant candidate could be found 501m away from Bimhuis?
  74. 74. This is all possible with ElasticSearch, plus it's easy.
  75. 75. Controlling relevance
      {
        "gauss": {
          "location": {
            "origin": "52.3712561,4.9005577",
            "offset": "0.5km",
            "decay": 20
          }
        }
      }
      Set the origin to Bimhuis, allowing locations of hotels within 500m. Outside that, a steep decay of relevance occurs.
  76. 76. Controlling relevance
      {
        "gauss": {
          "price": {
            "origin": 0,
            "offset": 100,
            "decay": 20
          }
        }
      }
      Any price over €100 suffers a similar, severe relevance penalty.
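The shape of a gauss decay can be explored outside ElasticSearch. The sketch below implements the Gaussian decay formula from the ElasticSearch function score documentation, where `scale` is the distance beyond `offset` at which the score has dropped to `decay`, a value between 0 and 1. The `scale` and `decay` numbers here are illustrative assumptions, not the settings used in the talk.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Score in (0, 1]: 1 within `offset` of `origin`, `decay` at offset + scale."""
    if not 0 < decay < 1:
        raise ValueError("decay must lie strictly between 0 and 1")
    sigma_squared = -scale ** 2 / (2 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-distance ** 2 / (2 * sigma_squared))


# Hypothetical price curve: full score up to EUR 100, half score at EUR 150.
for price in (80, 100, 150, 151, 200):
    print(price, round(gauss_decay(price, origin=0, scale=50, offset=100), 3))
```

Unlike the hard `price <= 150` condition in the SQL approach, a hotel at €151 scores only marginally lower than one at €150 instead of disappearing from the results.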
  77. 77. 
      {
        "query": {
          "function_score": {
            "query": {
              "bool": {
                "must": [
                  {"multi_match": {"query": "Renaissance", "fields": ["name", "company"]}},
                  {"term": {"features": "disability_access"}}
                ],
                "should": [
                  {"term": {"features": "wifi"}},
                  {"range": {"floor_levels": {"gt": 1}}}
                ]
              }
            },
            "functions": [
              {
                "gauss": {
                  "location": {"origin": "52.3712561,4.9005577", "offset": "0.5km", "decay": 20}
                }
              },
              {
                "gauss": {
                  "price": {"origin": 0, "offset": 100, "decay": 20}
                }
              }
            ]
          }
        }
      }
  78. 78. See example in detail over at http://git.io/V4Hm6w
  79. 79. Integrating with WordPress *Joking*
  80. 80. Integrating with Laravel
  81. 81. Install via Composer
      {
        "require": {
          "elasticsearch/elasticsearch": "1.1.*"
        }
      }
  82. 82. Creating / updating documents
      $client = new Elasticsearch\Client();
      $client->index([
          'index' => 'myapp',
          'type' => 'hotel',
          'id' => '1',
          'body' => [
              'name' => 'Renaissance Hotel Amsterdam',
              'company' => 'Marriott',
              'location' => [52.3712561, 4.9005577],
              'floor_levels' => 10,
              'features' => ['disability_access', 'wifi', 'pool', 'restaurant'],
          ],
      ]);
      You can create or update in the same request.
  83. 83. Partially updating documents
      $client = new Elasticsearch\Client();
      $client->update([
          'index' => 'myapp',
          'type' => 'hotel',
          'id' => '1',
          'body' => [
              // Partial updates wrap the changed fields in a "doc" key.
              'doc' => [
                  'floor_levels' => 11,
              ],
          ],
      ]);
  84. 84. Deleting documents
      $client = new Elasticsearch\Client();
      $client->delete([
          'index' => 'myapp',
          'type' => 'hotel',
          'id' => '1',
      ]);
  85. 85. Searching documents
      $client = new Elasticsearch\Client();
      $client->search([
          'index' => 'myapp',
          'type' => 'hotel',
          'body' => [
              'query' => [
                  // A match query targets a field; here, the hotel name.
                  'match' => ['name' => 'Renaissance'],
              ],
          ],
      ]);
  86. 86. Create/update/delete Eloquent documents
      Hotel::created(function ($hotel) {
          $client = new Elasticsearch\Client();
          $client->index([
              'index' => 'myapp',
              'type' => 'hotel',
              'id' => $hotel->id,
              'body' => $hotel->toArray(),
          ]);
      });
  87. 87. Create/update/delete Eloquent documents
      Hotel::updated(function ($hotel) {
          $client = new Elasticsearch\Client();
          $client->index([
              'index' => 'myapp',
              'type' => 'hotel',
              'id' => $hotel->id,
              'body' => $hotel->toArray(),
          ]);
      });
  88. 88. Create/update/delete Eloquent documents
      Hotel::deleted(function ($hotel) {
          $client = new Elasticsearch\Client();
          $client->delete([
              'index' => 'myapp',
              'type' => 'hotel',
              'id' => $hotel->id,
          ]);
      });
  89. 89. Searching: two approaches
      1. Acceptable to have a search() method directly on Eloquent to use ElasticSearch.
      2. Better approach is to decorate a repository; decouple and remove vendor lock-in.
  90. 90. Eloquent repository
      class EloquentHotelRepository implements HotelRepository
      {
          public function create($name)
          {
              // Create and save the model
          }

          public function search($name, array $filters)
          {
              // Perform search as best you can without ElasticSearch...
          }

          // Truncated for brevity...
      }
  91. 91. Decorating the repository
      class ElasticSearchHotelRepository implements HotelRepository
      {
          protected $eloquent;

          public function __construct(EloquentHotelRepository $eloquent)
          {
              $this->eloquent = $eloquent;
          }

          public function create($name)
          {
              // Delegate persistence to the decorated Eloquent repository.
              return $this->eloquent->create($name);
          }

          public function search($name, array $filters)
          {
              // Truncated for brevity...
          }
      }
  92. 92. Decorating the repository
      class ElasticSearchHotelRepository implements HotelRepository
      {
          public function search($name, array $filters)
          {
              // Assumes $this->client holds an Elasticsearch\Client instance.
              $results = $this->client->search([
                  // ...
              ]);

              // Matching documents are listed under hits.hits in the response.
              return array_map(function ($result) {
                  $hotel = new Hotel([
                      //
                  ]);
                  $hotel->exists = true;
                  return $hotel;
              }, $results['hits']['hits']);
          }
      }
  93. 93. To dig deeper, please visit http://git.io/CRW6Mg [1]
      [1] Repository will be live soon.
  94. 94. Why I chose ElasticSearch
      1. Built for real-time search applications.
      2. Handles concurrent read/write much better than competitors.
      3. Download and execute a single binary as a bare minimum.
      4. Easy to configure; you don't need to configure anything.
      5. JSON over a RESTful API. Need I say more?
  95. 95. Why I chose ElasticSearch over "X"
      1. Solr: I dislike how you communicate with it; maybe I'm not enterprise enough for XML. I also don't like its real-time performance. [1]
      2. Sphinx: query language was peculiar, SQL-like. Was never built as a real-time search engine.
      [1] http://blog.socialcast.com/realtime-search-solr-vs-elasticsearch/
  96. 96. Things I haven't told you about
      1. Partial matching: matching partial words using ngrams.
      2. How easy and fast autocomplete can be.
      3. Fuzzy search: misspelt words.
      4. Fine-tuning analysis for specific data sets.
      5. Analytics: aggregating statistics to produce things like reports or faceted filtering, as part of a query.
  97. 97. One more thing...
  98. 98. ElasticSearch is coming soon to Laravel Homestead. Run vagrant box update to get the awesomeness.
  99. 99. Further learning
      1. http://www.elasticsearch.org
      2. http://shop.oreilly.com/product/0636920028505.do
      3. http://git.io/CRW6Mg
  100. 100. http://joind.in/11691 https://github.com/bencorlett/laracon-eu-2014
