99 Problems, But The Search Ain't One

ElasticSearch is the new kid on the search block. Built on top of Lucene and adhering to the best concepts of the so-called NoSQL movement, ElasticSearch is a distributed, highly available, fast, RESTful search engine, ready to be plugged into Web applications.

    99 Problems, But The Search Ain't One: Presentation Transcript

    • 99 Problems, But The Search Ain’t One. Andrei Zmievski • PHP UK • Feb 25, 2011
    • who am I?
      curl http://localhost:9200/speaker/info/andrei
      {"name": "Andrei Zmievski",
       "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
       "likes": ["coding", "beer", "brewing", "photography"],
       "twitter": "@a",
       "email": "andrei@zmievski.org"}
    • what is elasticsearch? a search engine for the NoSQL generation: domain-driven, distributed, RESTful, Hitchhiker’s Guide to the Galaxy (no, really)
    • document model: document-oriented, JSON-based, schema-free
    • engine: based on Lucene; multi-tenancy; distributed, out of the box
    • nomenclature: index, type, document, _id, node
    • 3 easy steps
    • 1. index
      request:
        curl -XPOST http://localhost:9200/conf/speaker/1 -d '
        {
          "name": "Andrei Zmievski",
          "talk": "99 Problems, but the Search Ain'\''t One",
          "likes": ["coding", "beer", "photography"],
          "twitter": "a",
          "height": 1…
        }'
      response:
        {
          "ok": true,
          "_index": "conf",
          "_type": "speaker",
          "_id": "1"
        }
    • 2. search
      request:
        curl http://localhost:9200/conf/speaker/_search?q=beer
      response:
        {
          "took" : …,                  (the execution time)
          "_shards" : {
            "total" : 1,
            "successful" : 1,
            "failed" : 0
          },
          "hits" : {
            "total" : 1,               (total number of hits)
            "max_score" : 0.…,
            "hits" : [ {
              "_index" : "conf",       (the index of the doc)
              "_type" : "speaker",     (the type of the doc)
              "_id" : "2",             (the id of the doc)
              "_score" : 0.…,          (the hit score)
              "_source" : {            (the original source)
                "name": "Andrei Zmievski",
                "talk": "99 Problems, but the Search Ain't One",
                "likes": ["coding", "beer", "photography"],
                "twitter": "a",
                "height": 1…
              }
            } ]
          }
        }
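The fields annotated on the search slide can be read out of a parsed response as follows. This is a minimal Python sketch: `summarize` is a hypothetical helper, and the `took` and score values in the sample are invented placeholders, since the slide's own numbers did not survive extraction.

```python
import json

# Sample response shaped like the one on the slide.
# The "took" and score values are made up for illustration.
RESPONSE = json.loads("""
{
  "took": 3,
  "_shards": {"total": 1, "successful": 1, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.5,
    "hits": [
      {"_index": "conf", "_type": "speaker", "_id": "2", "_score": 0.5,
       "_source": {"name": "Andrei Zmievski",
                   "likes": ["coding", "beer", "photography"]}}
    ]
  }
}
""")


def summarize(response):
    """Return (total hits, execution time, rows of (index, type, id, score, source))."""
    hits = response["hits"]
    rows = [
        (h["_index"], h["_type"], h["_id"], h["_score"], h["_source"])
        for h in hits["hits"]
    ]
    return hits["total"], response["took"], rows


total, took, rows = summarize(RESPONSE)
```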
    • 3. profit: that’s up to you
    • demo
    • distributed model provides: performance; resiliency (high availability)
    • shards: a portion of the document space; each one is a separate Lucene index, thus many per-index settings are available; a document is sharded by its _id value, but can be assigned (routed) to a shard deterministically
    • zero-conf discovery: zen (multicast and unicast); cloud (EC2 via API)
    • auto-routing: master node maintains cluster state and reassigns shards if nodes leave/join the cluster; any node can serve as the request router; the query is handled via a scatter-gather mechanism
    • replicas: each shard can have 1 or more replicas; # of replicas can be updated dynamically after index creation; replicas can be used for querying in parallel
    • shard allocation: start with a single node [diagram: node 1]
    • shard allocation: PUT /person {"index": {"number_of_shards": 2, "number_of_replicas": 1}} [diagram: node 1 holds person1 and person2]
    • shard allocation: start the second node [diagram: node 1 and node 2 each hold person1 and person2]
    • shard allocation: start 2 more nodes [diagram: the four shard copies person1, person1, person2, person2 across node 1 to node 4]
    • document sharding: PUT /person/info/1 {…} is hashed to shard 1, then replicated [diagram: person1, person1, person2, person2 across node 1 to node 4]
    • document sharding: PUT /person/info/2 {…} is hashed to shard 2, then replicated
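The "hashed to shard, then replicated" steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `shard_for` and `copies_for` are hypothetical names, and `zlib.crc32` stands in for whatever hash function elasticsearch actually uses, so the shard a given `_id` lands on here says nothing about the real engine.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Map a document _id to a 1-based shard number, deterministically."""
    # crc32 is a stand-in hash for this sketch, not elasticsearch's own function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards + 1


def copies_for(shard: int, number_of_replicas: int) -> list:
    """List the primary and replica copies that must receive the document."""
    return [f"person{shard} (primary)"] + [
        f"person{shard} (replica {r})" for r in range(1, number_of_replicas + 1)
    ]


# Index settings from the slides: 2 shards, 1 replica each.
shard = shard_for("1", number_of_shards=2)
targets = copies_for(shard, number_of_replicas=1)
```

Because the shard is a pure function of the `_id` (or of an explicit routing value), any node can compute where a document lives without asking the others.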
    • scatter-gather: GET /person/_search?q=name:thomas [diagram: the query fans out to person1 and person2 shard copies across node 1 to node 4, and the results are gathered]
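A scatter-gather search can be sketched as "ask every shard, then merge the partial results by score". The Python below is a toy model, not the engine's implementation: the shard contents, the exact-match scoring, and the function names are all assumptions made for illustration.

```python
import heapq

# Two hypothetical shards, each holding (doc_id, name) pairs.
SHARDS = {
    "person1": [("1", "thomas"), ("3", "anna")],
    "person2": [("2", "thomas"), ("4", "peter")],
}


def search_shard(docs, name):
    """Scatter step: a shard scores only its own documents (1.0 on exact match)."""
    return [(1.0, doc_id) for doc_id, doc_name in docs if doc_name == name]


def scatter_gather(shards, name, size=10):
    """Gather step: merge per-shard hits and keep the top `size` by score."""
    partial = []
    for docs in shards.values():
        partial.extend(search_shard(docs, name))
    # Highest score first; ties are broken by id to keep the order stable.
    return heapq.nsmallest(size, partial, key=lambda hit: (-hit[0], hit[1]))


hits = scatter_gather(SHARDS, "thomas")
```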
    • transactional model: per-document consistency; no need to commit/flush; uses a write-behind transaction log; write consistency (W) can be controlled: one, quorum, or all
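To make the three write consistency levels concrete, here is a small Python sketch of how many shard copies would have to be available for each level. The simple-majority formula for `quorum` is an assumption of this sketch and may differ from the exact rule the engine applies; `required_copies` is a hypothetical name.

```python
def required_copies(consistency: str, number_of_replicas: int) -> int:
    """How many copies of a shard must be available before a write proceeds."""
    total = 1 + number_of_replicas  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "quorum":
        return total // 2 + 1  # simple majority (assumed for this sketch)
    if consistency == "all":
        return total
    raise ValueError(f"unknown consistency level: {consistency}")
```

With 2 replicas (3 copies in total), `one` needs 1 copy, `quorum` needs 2, and `all` needs 3.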
    • (near) real-time search: 1 second refresh rate by default; _refresh API also
    • index storage: node data considered transient; can be stored in local file system, JVM heap, native OS memory, or FS & memory combination; persistent storage requires a gateway
    • gateways: persistent store for cluster state and indices; asynchronous, translog-based write strategy; allows full recovery if a cluster restart is needed; supported gateways: local, shared FS, Hadoop via HDFS, S3
    • mapping: describes document structure to the search engine; automatically created with sensible defaults; explicit mapping can be provided (generally, a good idea); can run into merge conflicts
    • mapping: important meta fields: _source, _all, _boost
    • mapping types: simple: string, integer/long, float/double, boolean, and null; complex: array, object
    • sample mapping
      document:
        {"user": "derick",
         "title": "Don't Panic",
         "tags": ["profiling", "debugging", "php"],
         "postDate": "2010-12-22T1…:1…:12",
         "priority": 2}
      mapping:
        {"post": {
          "properties" : {
            "user": {"type": "string", "index": "not_analyzed"},
            "message": {"type": "string", "boost": 1.…},
            "tags": {"type": "string", "include_in_all": "no"},
            "postDate" : {"type" : "date", "store": "no"},
            "priority" : {"type" : "integer"}
          }
        }}
    • analyzers: break down (tokenize) and normalize fields during indexing and query strings at search time; analyzer = tokenizer + token filters (0 or more); Standard Analyzer = Standard Tokenizer + Standard Token Filter + Lowercase Token Filter + Stop Token Filter
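The "analyzer = tokenizer + token filters" composition can be mimicked in plain Python. This is a toy sketch, not Lucene's Standard Analyzer: the regular-expression tokenizer and the tiny stop-word list are assumptions chosen for brevity.

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "of", "to"}  # tiny illustrative list


def tokenize(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]


def stop(tokens):
    """Token filter: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, filters=(lowercase, stop)):
    """Analyzer: one tokenizer followed by zero or more token filters, in order."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("The Search and the Engine")
```

Running the same `analyze` over field values at index time and over the query string at search time is what lets "Search" in a query match "search" in a document.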
    • analyzers: analyzers, tokenizers, and filters can be customized
      elasticsearch.yml:
        index:
          analysis:
            analyzer:
              eulang:
                type: custom
                tokenizer: standard
                filter: [standard, lowercase, stop, asciifolding, porterStem]
      mapping:
        …
        "title": {"type": "string", "analyzer": "eulang"},
        …
    • API
    • API conventions: append ?pretty=true to get readable JSON; boolean values: false/0/off = false, rest is true; JSONP support via callback parameter
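The boolean convention on this slide is short enough to state as code. A minimal Python sketch, with `parse_bool` as a hypothetical helper name and exact, case-sensitive matching as an assumption:

```python
FALSE_VALUES = ("false", "0", "off")


def parse_bool(value: str) -> bool:
    """false/0/off mean false; every other value is treated as true."""
    return value not in FALSE_VALUES
```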
    • API structure: http://host:port/[index]/[type]/[_action/id]
      GET http://es:9200/_status
      GET http://es:9200/twitter/_status
      POST http://es:9200/twitter/tweet/1
      GET http://es:9200/twitter/tweet/1
    • API structure: http://host:port/[index]/[type]/[_action/id]
      GET http://es:9200/twitter/tweet/_search
      GET http://es:9200/twitter/user/_search
      GET http://es:9200/twitter/tweet,user/_search
      GET http://es:9200/twitter,facebook/_search
      GET http://es:9200/_search
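The URL pattern above, with optional and comma-separated index and type segments, can be assembled mechanically. A small Python sketch; `build_url` and its parameter names are hypothetical, and the host `es` is taken from the slide examples.

```python
def build_url(host, port, indices=(), types=(), action=None):
    """Build http://host:port/[index]/[type]/[_action/id] from its optional parts."""
    parts = []
    if indices:
        parts.append(",".join(indices))  # several indices are comma-separated
    if types:
        parts.append(",".join(types))  # so are several types
    if action:
        parts.append(action)  # an _action such as "_search", or a document id
    return f"http://{host}:{port}/" + "/".join(parts)


url = build_url("es", 9200, ["twitter"], ["tweet", "user"], "_search")
```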
    • _cluster API structure: GET /_cluster/health; GET /_cluster/health/index1,index2; GET /_cluster/nodes/stats; GET /_cluster/nodes/nodeId1,nodeId2/stats
    • API {core}: index, bulk, delete, delete by query, get, count; search: query, from/size paging, sort, highlighting, selective fields
    • API {indices}: create, delete, open/close, get/put/delete mapping, refresh; optimize, snapshot, update settings, analyze, status, flush
    • API {cluster}: health, state, nodes info, nodes stats, nodes shutdown
    • Query DSL: term / terms, range, prefix, bool, fuzzy, wildcard; query_string: default_operator, analyzer, phrase_slop, etc
    • filters: share some similar features with queries (term, range, etc); why use a filter?
    • filters: faster than queries; cached (depends on the filter), and the cache is used for different queries against the same filter; no scoring; more useful ones: term, terms, range, prefix, and, or, not, exists, missing, query
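Why a cached, score-free filter is cheap can be shown with a toy model in Python: a filter only answers "which documents match?", so its result is a set of ids that can be stored once and intersected with the hits of any later query. The documents, class, and function names below are all made up for illustration.

```python
DOCS = {
    1: {"user": "andrei", "priority": 2},
    2: {"user": "derick", "priority": 5},
    3: {"user": "andrei", "priority": 7},
}


class FilterCache:
    """Caches the matching document ids of term filters (no scores involved)."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.evaluations = 0  # how many times a filter was actually computed

    def term(self, field, value):
        key = ("term", field, value)
        if key not in self.cache:
            self.evaluations += 1
            self.cache[key] = frozenset(
                doc_id for doc_id, doc in self.docs.items() if doc.get(field) == value
            )
        return self.cache[key]


def filtered(query_ids, filter_ids):
    """Restrict the ids matched by a query to those allowed by the filter."""
    return sorted(set(query_ids) & filter_ids)


cache = FilterCache(DOCS)
first = filtered([1, 2], cache.term("user", "andrei"))   # one query
second = filtered([2, 3], cache.term("user", "andrei"))  # a different query, same filter
```

The second call reuses the stored id set, so the filter is evaluated only once even though the queries differ.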
    • facets: provide aggregated data based on the search request; terms, histogram, date histogram, range, statistical, and more
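Conceptually, a facet is an aggregation computed over the documents a search matched. The Python sketch below imitates a terms facet and a histogram facet over an invented list of hits; it illustrates the idea only and is not how the engine computes facets.

```python
from collections import Counter

# Hypothetical documents matched by some search request.
HITS = [
    {"tags": ["php", "search"], "priority": 2},
    {"tags": ["php"], "priority": 7},
    {"tags": ["search", "lucene"], "priority": 12},
]


def terms_facet(docs, field):
    """Count how often each term of `field` occurs across the matched docs."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        counts.update(value if isinstance(value, list) else [value])
    return dict(counts)


def histogram_facet(docs, field, interval):
    """Count docs per bucket, where a bucket starts at a multiple of `interval`."""
    counts = Counter((doc[field] // interval) * interval for doc in docs if field in doc)
    return sorted(counts.items())


tag_counts = terms_facet(HITS, "tags")
priority_buckets = histogram_facet(HITS, "priority", interval=5)
```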
    • geo search: implemented as filters (and a facet): geo_distance, geo_bounding_box, geo_polygon
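What the geo_distance and geo_bounding_box filters decide for each document can be sketched in Python. The haversine formula and the 6371 km earth radius are choices made for this sketch, the box check ignores boxes that cross the date line, and the places and function names are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius, assumed for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_distance_filter(docs, center, distance_km):
    """Keep docs within `distance_km` of `center`, given as (lat, lon)."""
    lat, lon = center
    return [d for d in docs if haversine_km(lat, lon, d["lat"], d["lon"]) <= distance_km]


def geo_bounding_box_filter(docs, top_left, bottom_right):
    """Keep docs inside the box given by its (lat, lon) corners."""
    top, left = top_left
    bottom, right = bottom_right
    return [d for d in docs if bottom <= d["lat"] <= top and left <= d["lon"] <= right]


PLACES = [
    {"name": "near", "lat": 0.0, "lon": 0.5},
    {"name": "far", "lat": 0.0, "lon": 2.0},
]
nearby = geo_distance_filter(PLACES, center=(0.0, 0.0), distance_km=100)
boxed = geo_bounding_box_filter(PLACES, top_left=(1.0, -1.0), bottom_right=(-1.0, 1.0))
```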
    • interfaces: REST, including memcached; Java / Groovy; language clients (REST/Thrift): pyes, PHP (standalone and symfony), Ruby, Perl; Flume sink implementation
    • elastica: similar to the other PHP ElasticSearch client; API naming is consistent with Zend Framework; can be extended for new filters, facets, etc; still under development
    • elastica example:
      $es = new Elastica_Client('vm', 9200);
      $index = new Elastica_Index($es, 'test');
      $index->create(array(), true);
      $type = new Elastica_Type($index, 'person');
      $doc = new Elastica_Document(1, array(
          'name' => 'Andrei Zmievski',
          'email' => 'andrei@test.com',
          'username' => 'andrei',
          'bills' => array(2, 3, 5)));
      $type->addDocument($doc);
      $qs = new Elastica_Query_QueryString('andrei');
      $query = new Elastica_Query($qs);
      $resultSet = $type->search($query);
      print $resultSet->count();
    • data import: ES is not the primary data store (usually); to import/synchronize data: write an agent (Gearman, message queues, etc), or use rivers (CouchDB, RabbitMQ, Twitter)
    • 10 more features: versioning, index aliases, parent/child docs, scripting, dynamic mapping templates, load balancing nodes, plugins, more_like_this, multi_field mapping, percolation
    • References: http://github.com/elasticsearch/elasticsearch; http://www.elasticsearch.org/community/forum; IRC: #elasticsearch on irc.freenode.net; twitter: @elasticsearch; HTTP://ZMIEVSKI.ORG/TALKS