99 Problems, But The Search Ain't One

40,381 views

Published on

ElasticSearch is the new kid on the search block. Built on top of Lucene and adhering to the best concepts of so-called NoSQL movement, ElasticSearch is a distributed, highly available, fast RESTful search engine, ready to be plugged into Web applications.

Published in: Technology
2 Comments
32 Likes
Statistics
Notes
No Downloads
Views
Total views
40,381
On SlideShare
0
From Embeds
0
Number of Embeds
20,061
Actions
Shares
0
Downloads
57
Comments
2
Likes
32
Embeds 0
No embeds

No notes for slide

99 Problems, But The Search Ain't One

  1. 1. 99 Problems, ButThe Search Ain’t OneAndrei Zmievski • PHP UK •!Feb 25, 2011
  2. 2. who am I? curl http://localhost:9200/speaker/info/andrei{“name”: “Andrei Zmievski”, “projects”: [“PHP”, “PHP-GTK”, “Smarty”, “Unicode/i18n”], “likes”: [“coding”, “beer”, “brewing”, “photography”], “twitter”: “@a”, “email”: “andrei@zmievski.org”}
  3. 3. what is elasticsearch?a search engine for the NoSQL generation domain-driven distributed RESTful Hitchhiker’s Guide to the Galaxy (no, really)
  4. 4. document modeldocument-orientedJSON-basedschema-free
  5. 5. enginebased on Lucenemulti-tenancydistributed, out of the box
  6. 6. nomenclatureindextypedocument _idnode
  7. 7. 3 easy steps
  8. 8. 1. index !"#$%&()*+%,--./00$1!2$,13-/45660!17803.92:9#0;%&<= >request %%%%?72@9?/%?A7<#9B%C@B9D3:B?E %%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7==-%)79?E %%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE %%%%?-KB--9#?/%?2?E %%%%?,9BH,-?/%;LM N= >response %%%%?1:?/-#"9 %%%%?OB7<9P?/?!178? %%%%?O-I.9?/?3.92:9#? %%%%?OB<?/?;? N
  9. 9. 2. searchrequest !"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99# >%?-11:?%/%TE %%?O3,2#<3?%/%> %%%%?-1-2$?%/%;E %%%%?3"!!9338"$?%/%;E %%%%?82B$9<?%/%6 %%NE %%?,B-3?%/%> %%%%?-1-2$?%/%;Eresponse %%%%?@2PO3!1#9?%/%6UV46LM64E %%%%?,B-3?%/%G%> %%%%%%?OB7<9P?%/%?!178?E %%%%%%?O-I.9?%/%?3.92:9#?E %%%%%%?OB<?%/%?5?E %%%%%%?O3!1#9?%/%6UV46LM64E %%%%%%?O31"#!9?%/% > %%%%?72@9?/%?A7<#9B%C@B9D3:B?E %%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E %%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE %%%%?-KB--9#?/%?2?E %%%%?,9BH,-?/%;LM N%N%J%N%N
  10. 10. 2. searchrequest !"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99# >%?-11:?%/%TE %%?O3,2#<3?%/%> %%%%?-1-2$?%/%;E %%%%?3"!!9338"$?%/%;E %%%%?82B$9<?%/%6 %%NE total number of hits %%?,B-3?%/%> !!!!"#$#%&"!!()response %%%%?@2PO3!1#9?%/%6UV46LM64E %%%%?,B-3?%/%G%> %%%%%%?OB7<9P?%/%?!178?E %%%%%%?O-I.9?%/%?3.92:9#?E %%%%%%?OB<?%/%?5?E %%%%%%?O3!1#9?%/%6UV46LM64E %%%%%%?O31"#!9?%/% > %%%%?72@9?/%?A7<#9B%C@B9D3:B?E %%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E %%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE %%%%?-KB--9#?/%?2?E %%%%?,9BH,-?/%;LM N%N%J%N%N
  11. 11. 2. searchrequest !"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99# >%?-11:?%/%TE %%?O3,2#<3?%/%> %%%%?-1-2$?%/%;E %%%%?3"!!9338"$?%/%;E %%%%?82B$9<?%/%6 %%NE %%?,B-3?%/%> %%%%?-1-2$?%/%;E the index of the docresponse %%%%?@2PO3!1#9?%/%6UV46LM64E %%%%?,B-3?%/%G%> !!!!!!"*+,-./"!!"0$,1") %%%%%%?O-I.9?%/%?3.92:9#?E %%%%%%?OB<?%/%?5?E %%%%%%?O3!1#9?%/%6UV46LM64E %%%%%%?O31"#!9?%/% > %%%%?72@9?/%?A7<#9B%C@B9D3:B?E %%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E %%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE %%%%?-KB--9#?/%?2?E %%%%?,9BH,-?/%;LM N%N%J%N%N
  12. 12. 2. searchrequest !"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99# >%?-11:?%/%TE %%?O3,2#<3?%/%> %%%%?-1-2$?%/%;E %%%%?3"!!9338"$?%/%;E %%%%?82B$9<?%/%6 %%NE %%?,B-3?%/%> %%%%?-1-2$?%/%;Eresponse %%%%?@2PO3!1#9?%/%6UV46LM64E %%%%?,B-3?%/%G%> the type of the doc %%%%%%?OB7<9P?%/%?!178?E !!!!!!"*#23."!!"43.%5.6") %%%%%%?OB<?%/%?5?E %%%%%%?O3!1#9?%/%6UV46LM64E %%%%%%?O31"#!9?%/% > %%%%?72@9?/%?A7<#9B%C@B9D3:B?E %%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E %%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE %%%%?-KB--9#?/%?2?E %%%%?,9BH,-?/%;LM N%N%J%N%N
  13. 13. 2. searchrequest !"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99# >%?-11:?%/%TE %%?O3,2#<3?%/%> %%%%?-1-2$?%/%;E %%%%?3"!!9338"$?%/%;E %%%%?82B$9<?%/%6 %%NE %%?,B-3?%/%> %%%%?-1-2$?%/%;Eresponse %%%%?@2PO3!1#9?%/%6UV46LM64E %%%%?,B-3?%/%G%> %%%%%%?OB7<9P?%/%?!178?E %%%%%%?O-I.9?%/%?3.92:9#?E !!!!!!"*+-"!!"7") the id of the doc %%%%%%?O3!1#9?%/%6UV46LM64E %%%%%%?O31"#!9?%/% > %%%%?72@9?/%?A7<#9B%C@B9D3:B?E %%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E %%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE %%%%?-KB--9#?/%?2?E %%%%?,9BH,-?/%;LM N%N%J%N%N
  14. 14. 2. searchrequest !"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99# >%?-11:?%/%TE %%?O3,2#<3?%/%> %%%%?-1-2$?%/%;E %%%%?3"!!9338"$?%/%;E %%%%?82B$9<?%/%6 %%NE %%?,B-3?%/%> %%%%?-1-2$?%/%;Eresponse %%%%?@2PO3!1#9?%/%6UV46LM64E %%%%?,B-3?%/%G%> %%%%%%?OB7<9P?%/%?!178?E %%%%%%?O-I.9?%/%?3.92:9#?E !!!!!!"*+-"!!"7") the id of the doc %%%%%%?O3!1#9?%/%6UV46LM64E %%%%%%?O31"#!9?%/% > %%%%?72@9?/%?A7<#9B%C@B9D3:B?E %%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E %%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE %%%%?-KB--9#?/%?2?E %%%%?,9BH,-?/%;LM N%N%J%N%N
  15. 15. 2. searchrequest !"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99# >%?-11:?%/%TE %%?O3,2#<3?%/%> %%%%?-1-2$?%/%;E %%%%?3"!!9338"$?%/%;E %%%%?82B$9<?%/%6 %%NE %%?,B-3?%/%> %%%%?-1-2$?%/%;Eresponse %%%%?@2PO3!1#9?%/%6UV46LM64E %%%%?,B-3?%/%G%> %%%%%%?OB7<9P?%/%?!178?E %%%%%%?O-I.9?%/%?3.92:9#?E !!!!!!"*+-"!!"7") %%%%%%?O3!1#9?%/%6UV46LM64E the hit score %%%%%%?O31"#!9?%/% > %%%%?72@9?/%?A7<#9B%C@B9D3:B?E %%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E %%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE %%%%?-KB--9#?/%?2?E %%%%?,9BH,-?/%;LM N%N%J%N%N
  16. 16. 2. searchrequest !"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99# >%?-11:?%/%TE %%?O3,2#<3?%/%> %%%%?-1-2$?%/%;E %%%%?3"!!9338"$?%/%;E %%%%?82B$9<?%/%6 %%NE %%?,B-3?%/%> %%%%?-1-2$?%/%;Eresponse %%%%?@2PO3!1#9?%/%6UV46LM64E %%%%?,B-3?%/%G%> %%%%%%?OB7<9P?%/%?!178?E %%%%%%?O-I.9?%/%?3.92:9#?E !!!!!!"*+-"!!"7") %%%%%%?O3!1#9?%/%6UV46LM64E %%%%%%?O31"#!9?%/% the original source 8 !!!!",%9."!":,-6.+!;9+.<45+") !!!!"#%&5"!"==!>6$?&.94)!?@#!#A.!B.%60A!:+,C#!D,.") !!!!"&+5.4"!E"0$-+,F")!"?..6")!"3A$#$F6%3A2"G) !!!!"#H+##.6"!"%") !!!!"A.+FA#"!(IJ K%N%J%N%N
  17. 17. 2. searchrequest !"#$%,--./00$1!2$,13-/45660!17803.92:9#0O392#!,QRSF99# >%"#$$5"!!L) %%?O3,2#<3?%/%> %%%%?-1-2$?%/%;E the execution time %%%%?3"!!9338"$?%/%;E %%%%?82B$9<?%/%6 %%NE %%?,B-3?%/%> %%%%?-1-2$?%/%;Eresponse %%%%?@2PO3!1#9?%/%6UV46LM64E %%%%?,B-3?%/%G%> %%%%%%?OB7<9P?%/%?!178?E %%%%%%?O-I.9?%/%?3.92:9#?E %%%%%%?OB<?%/%?5?E %%%%%%?O3!1#9?%/%6UV46LM64E %%%%%%?O31"#!9?%/% > %%%%?72@9?/%?A7<#9B%C@B9D3:B?E %%%%?-2$:?/%?44%(#1F$9@3E%F"-%-,9%*92#!,%AB7=-%)79?E %%%%?$B:93?/%G?!1<B7H?E%?F99#?E%?.,1-1H#2.,I?JE %%%%?-KB--9#?/%?2?E %%%%?,9BH,-?/%;LM N%N%J%N%N
  18. 18. 3. profitthat’s up to you
  19. 19. demo
  20. 20. distributed modelprovides: performance resiliency (high-availability)
  21. 21. shardsa portion of the document spaceeach one is a separate Lucene index thus, many per-index settings are availabledocument is sharded by its _id value but can be assigned (routed) to a shard deterministically
  22. 22. zero-conf discoveryzen (multicast and unicast)cloud (EC2 via API)
  23. 23. auto-routingmaster node: maintains cluster state reassigns shards if nodes leave/join clusterany node can serve as the request routerthe query is handled via scatter-gather mechanism
  24. 24. replicaseach shard can have 1 or more replicas# of replicas can be updated dynamically afterindex creationreplicas can be used for querying in parallel
  25. 25. shard allocation node 1 start with a single node
  26. 26. shard allocation node 1 person1 person2 PUT /person { “index”: { “number_of_shards”: 2, “number_of_replicas”: 1 }}
  27. 27. shard allocation node 1 node 2 person1 person1 person2 person2 start the second node
  28. 28. shard allocationnode 1 node 2 node 3 node 4person1 person1person2 person2 start 2 more nodes
  29. 29. shard allocationnode 1 node 2 node 3 node 4person1 person1 person2 person2 start 2 more nodes
  30. 30. document shardingnode 1 node 2 node 3 node 4person1 person1 person2 person2 PUT /person/info/1 {…}
  31. 31. document sharding node 1 node 2 node 3 node 4 person1 person1 person2 person2 PUT /person/info/1hashed to shard 1 {…}
  32. 32. document shardingnode 1 node 2 node 3 node 4person1 person1 person2 person2 replicated PUT /person/info/1 {…}
  33. 33. document shardingnode 1 node 2 node 3 node 4person1 person1 person2 person2 PUT /person/info/2 {…}
  34. 34. document shardingnode 1 node 2 node 3 node 4person1 person1 person2 person2hashed to shard 2 PUT /person/info/2 {…}
  35. 35. document shardingnode 1 node 2 node 3 node 4person1 person1 person2 person2 replicated PUT /person/info/2 {…}
  36. 36. scatter-gathernode 1 node 2 node 3 node 4person1 person1 person2 person2 GET /person/_search?q=name:thomas
  37. 37. shard allocationnode 1 node 2 node 3 node 4person1 person1 person2 person2 GET /person/_search?q=name:thomas
  38. 38. shard allocationnode 1 node 2 node 3 node 4person1 person1 person2 person2 GET /person/_search?q=name:thomas
  39. 39. shard allocationnode 1 node 2 node 3 node 4person1 person1 person2 person2 GET /person/_search?q=name:thomas
  40. 40. transactional modelper-document consistencyno need to commit/flushuses write-behind transaction logwrite consistency (W) can be controlled one, quorum, or all
  41. 41. (near) real-time search1 second refresh rate by default_refresh API also
  42. 42. index storagenode data considered transientcan be stored in local file system, JVM heap,native OS memory, or FS & memory combinationpersistent storage requires a gateway
  43. 43. gatewayspersistent store for cluster state and indicesasynchronous, translog-based write strategyallows full recovery if a cluster restart is neededsupported gateways: local shared FS Hadoop via HDFS S3
  44. 44. mappingdescribes document structure to the searchengineautomatically created with sensible defaultsexplicit mapping can be provided (generally, agood idea)can run into merge conflicts
  45. 45. mappingimportant meta fields: _source _all _boost
  46. 46. mapping typessimple: string, integer/long, float/double, boolean, and null)complex: array, object
  47. 47. sample mappingdocument >?"39#?/%%%%%%?<9#B!:?E %?-B-$9?/%%%%%?W17X-%(27B!?E %?-2H3?/%%%%%%G?.#18B$B7H?E%?<9F"HHB7H?E%?.,.?JE %?.13-W2-9?/%%?56;6&;5&55+;M/;Y/;5?E %?.#B1#B-I?/%%5N >?.13-?/%> %%?.#1.9#-B93?%/%>mapping %%%%?"39#?/%>?-I.9?/%?3-#B7H?E%?B7<9P?/%?71-O272$IZ9<?NE %%%%?@9332H9?/%>?-I.9?/%?3-#B7H?E%[F113-/%;UVNE %%%%?-2H3?/%>?-I.9?/%?3-#B7H?E%?B7!$"<9OB7O2$$?/%?71?NE %%%%?.13-W2-9?%/%>?-I.9?%/%?<2-9?E%[3-1#9/%[71NE %%%%?.#B1#B-I?%/%>?-I.9?%/%?B7-9H9#?N NNN
  48. 48. analyzersbreak down (tokenize) and normalize fields duringindexing and query strings at search timeanalyzer = tokenizer + token filters (0 or more)*-27<2#<%A72$IZ9#%S%%%*-27<2#<%+1:97BZ9#%]%%%%%%%*-27<2#<%+1:97%^B$-9#%]%%%%%%%_1K9#!239%+1:97%^B$-9#%]%%%%%%%*-1.%+1:97%^B$-9#
  49. 49. analyzers analyzers, tokenizers, and filters can be customizedmapping elasticsearch.yml B7<9P/ %%272$I3B3/ %%%%272$IZ9#/ %%%%%%.@&%,F/ %%%%%%%%-I.9/%!"3-1@ %%%%%%%%-1:97BZ9#/%3-27<2#< %%%%%%%%8B$-9#/%G3-27<2#<E%$1K9#!239E%3-1.E %%%%%%%%%%%%%%%%%23!BB81$<B7HE%.1#-9#*-9@J ` ?-B-$9?/%>?-I.9?/%?3-#B7H?E%?272$IZ9#?/%?9"$27H?NE `
  50. 50. API
  51. 51. API conventionsappend ?pretty=true to get readable JSONboolean values: false/0/off = false, rest is trueJSONP support via callback parameter
  52. 52. API structurehttp://host:port/[index]/[type]/[_action/id] GET http://es:9200/_status GET http://es:9200/twitter/_status POST http://es:9200/twitter/tweet/1 GET http://es:9200/twitter/tweet/1
  53. 53. API structurehttp://host:port/[index]/[type]/[_action/id] GET http://es:9200/twitter/tweet/_search GET http://es:9200/twitter/user/_search GET http://es:9200/twitter/tweet,user/_search GET http://es:9200/twitter,facebook/_search GET http://es:9200/_search
  54. 54. _cluster API structureGET /_cluster/healthGET /_cluster/health/index1,index2GET /_cluster/nodes/statsGET /_cluster/nodes/nodeId1,nodeId2/stats
  55. 55. API {core}index searchbulk querydelete from/size pagingdelete by query sortget highlightingcount selective fields
  56. 56. API {indices}create optimizedelete snapshotopen/close update settingsget/put/delete analyzemapping statusrefresh flush
  57. 57. API {cluster}healthstatenodes infonodes statsnodes shutdown
  58. 58. Query DSLterm / terms query_stringrange default_operatorprefix analyzerbool phrase_slopfuzzy etcwildcard
  59. 59. filtersshare some similar features with queries (term,range, etc)why use a filter?
  60. 60. filtersfaster than queriescached (depends on the filter) the cache is used for different queries against the same filterno scoringmore useful ones: term, terms, range, prefix, and,or, not, exists, missing, query
  61. 61. facetsprovide aggregated data based on the searchrequestterms, histogram, date histogram, range,statistical, and more
  62. 62. geo searchimplemented as filters (and a facet) geo_distance geo_bounding_box geo_polygon
  63. 63. interfacesREST including memcachedJava /!GroovyLanguage clients (REST/Thrift): pyes, PHP (standalone and symfony), Ruby, PerlFlume sink implementation
  64. 64. elasticasimilar to the other PHP ElasticSearch clientAPI naming is consistent with Zend Frameworkcan be extended for new filters, facets, etcstill under development
  65. 65. elastica $es = new Elastica_Client(vm, 9200); $index = new Elastica_Index($es, test); $index->create(array(), true); $type = new Elastica_Type($index, person); $doc = new Elastica_Document(1, array(name => Andrei Zmievski,example email => andrei@test.com, username => andrei, bills => array(2, 3, 5))); $type->addDocument($doc); $qs = new Elastica_Query_QueryString(andrei); $query = new Elastica_Query($qs); $resultSet = $type->search($query); print $resultSet->count();
  66. 66. data importES is not the primary data store (usually)to import/synchronize data: write an agent (Gearman, message queues, etc) use rivers (CouchDB, RabbitMQ, Twitter)
  67. 67. 10 more featuresversioning load balancing nodesindex aliases pluginsparent/child docs more_like_thisscripting multi_field mappingdynamic mapping percolationtemplates
  68. 68. Referenceshttp://github.com/elasticsearch/elasticsearchhttp://www.elasticsearch.org/community/forumIRC: #elasticsearch on irc.freenode.nettwitter: @elasticsearch HTTP://ZMIEVSKI.ORG/TALKS

×