Quick intro to ElasticSearch


Published in: Technology

  1. ElasticSearch
     Introduction and quick startup
     medcl 9-29
  2. Introduction
     ElasticSearch, a distributed search solution:
     domain driven
     schema free
     anything pluggable
     open source, distributed, RESTful
     Author: Shay Banon (expert in search and analytics)
       Compass
       GigaSpaces
     Current version: 0.11.0
  3. Features
     Reliable, asynchronous write-behind for long-term persistency.
     (Near) real-time search.
     Built on top of Lucene.
       Each shard is a fully functional Lucene index.
       All the power of Lucene easily exposed through simple configuration / plugins.
     Per-operation consistency.
       Single-document-level operations are atomic, consistent, isolated and durable.
     Open source under the Apache 2 License.
  4. Distributed and highly available
     Each index is fully sharded with a configurable number of shards.
     Each shard can have zero or more replicas.
     Read / search operations can be performed on any replica of a shard.
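
To make the shard and replica layout concrete, here is a minimal Python sketch. It assumes a document is routed to a shard by hashing its id modulo the number of shards. The CRC32 hash, the function names and the copy labels are illustrative assumptions, not ElasticSearch's actual implementation.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Pick a shard by hashing the document id (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def copies_of(shard: int, number_of_replicas: int) -> list:
    """List every copy of one shard: the primary plus its replicas."""
    replicas = [f"shard{shard}-replica{r}" for r in range(1, number_of_replicas + 1)]
    return [f"shard{shard}-primary"] + replicas


if __name__ == "__main__":
    # Hypothetical index with 5 shards and 1 replica per shard.
    shard = shard_for("kimchy", number_of_shards=5)
    print(shard, copies_of(shard, number_of_replicas=1))
```

In this toy model a write always lands on the same shard for a given id, while a read could be served by any entry of the list returned by `copies_of`.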
  5. Multi-tenant with multiple types
     Support for more than one index.
     Support for more than one type per index.
     Index-level configuration (number of shards, index storage, ...).
  6. Document oriented
     No need for upfront schema definition.
     Schema can be defined per type for customization of the indexing process.
  7. A varied set of APIs
     HTTP RESTful API.
     Native Java API.
     Third-party clients:
       Perl, Python, PHP, Ruby, Groovy, Erlang, .NET
     All APIs perform automatic node operation rerouting.
  8. Up and running
  9. Install
     Zero configuration
  10. Index
      $ curl -XPUT http://localhost:9200/twitter/user/kimchy -d '{ "name" : "Shay Banon" }'
      $ curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{
          "user": "kimchy",
          "post_date": "2009-11-15T13:12:00",
          "message": "Trying out Elastic Search, so far so good?"
      }'
      $ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "You know, for Search"
      }'
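
The same index call can be sketched from Python with only the standard library. The helper below builds the PUT request without sending it, so it runs without a node. `BASE_URL` and `index_request` are names made up for this example, and the URL layout simply mirrors the curl commands above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def index_request(index: str, doc_type: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that indexes one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


if __name__ == "__main__":
    req = index_request("twitter", "tweet", "1", {
        "user": "kimchy",
        "post_date": "2009-11-15T13:12:00",
        "message": "Trying out Elastic Search, so far so good?",
    })
    print(req.get_method(), req.full_url)
    # To send it to a running node:
    #   with urllib.request.urlopen(req) as resp:
    #       print(resp.read().decode("utf-8"))
```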
  11. Schema mapping
      $ curl -XPUT http://localhost:9200/twitter
      $ curl -XPUT http://localhost:9200/twitter/user/_mapping -d '{
          "properties" : {
              "name" : { "type" : "string" }
          }
      }'
  12. GET
      $ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "You know, for Search"
      }'
      $ curl -XGET http://localhost:9200/twitter/tweet/2
  13. Search
      $ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "You know, for Search"
      }'
      $ curl -XGET 'http://localhost:9200/twitter/tweet/_search?q=user:kimchy'
      $ curl -XGET http://localhost:9200/twitter/tweet/_search -d '{
          "query" : { "term" : { "user": "kimchy" } }
      }'
      $ curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -d '{
          "query" : {
              "range" : {
                  "post_date" : {
                      "from" : "2009-11-15T13:00:00",
                      "to" : "2009-11-15T14:30:00"
                  }
              }
          }
      }'
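
As a rough illustration of what the range query asks for, the sketch below builds the same query body in Python and checks documents against it locally. The `matches` helper is a hypothetical stand-in for the server, and it assumes both bounds are inclusive.

```python
import json
from datetime import datetime


def range_query(field: str, start: str, end: str) -> dict:
    """Build the body of a range query in the from/to form used above."""
    return {"query": {"range": {field: {"from": start, "to": end}}}}


def matches(doc: dict, query: dict) -> bool:
    """Local stand-in for the server: inclusive range check on one field."""
    (field, bounds), = query["query"]["range"].items()
    value = datetime.fromisoformat(doc[field])
    lower = datetime.fromisoformat(bounds["from"])
    upper = datetime.fromisoformat(bounds["to"])
    return lower <= value <= upper


if __name__ == "__main__":
    query = range_query("post_date", "2009-11-15T13:00:00", "2009-11-15T14:30:00")
    print(json.dumps(query, indent=2))
    tweets = [
        {"user": "kimchy", "post_date": "2009-11-15T13:12:00"},
        {"user": "kimchy", "post_date": "2009-11-15T14:12:12"},
    ]
    print([matches(tweet, query) for tweet in tweets])
```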
  14. Multi-tenancy
      $ curl -XPUT http://localhost:9200/kimchy
      $ curl -XPUT http://localhost:9200/elasticsearch
      $ curl -XPUT http://localhost:9200/elasticsearch/tweet/1 -d '{
          "post_date": "2009-11-15T14:12:12",
          "message": "Zug Zug",
          "tag": "warcraft"
      }'
      $ curl -XPUT http://localhost:9200/kimchy/tweet/1 -d '{
          "post_date": "2009-11-15T14:12:12",
          "message": "Whatyouwant?",
          "tag": "warcraft"
      }'
      $ curl -XGET 'http://localhost:9200/kimchy,elasticsearch/tweet/_search?q=tag:warcraft'
      $ curl -XGET 'http://localhost:9200/_all/tweet/_search?q=tag:warcraft'
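
The multi-index URL pattern above is easy to generate. This small Python sketch joins index names with commas and falls back to `_all` when none are given. `search_url` and `BASE_URL` are hypothetical names for this example only.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def search_url(indices: list, doc_type: str, q: str) -> str:
    """Build a search URL over several indices, or over all of them if none are named."""
    index_part = ",".join(indices) if indices else "_all"
    # safe=":" keeps the field:value separator readable, as in the curl examples.
    return f"{BASE_URL}/{index_part}/{doc_type}/_search?{urlencode({'q': q}, safe=':')}"


if __name__ == "__main__":
    print(search_url(["kimchy", "elasticsearch"], "tweet", "tag:warcraft"))
    print(search_url([], "tweet", "tag:warcraft"))
```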
  15. Settings
      $ curl -XPUT http://localhost:9200/kimchy/ -d '
      index :
          store:
              type: memory
      '
      $ curl -XPUT http://localhost:9200/elasticsearch/ -d '{
          "index" : {
              "number_of_shards" : 2,
              "number_of_replicas" : 3
          }
      }'
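
A quick piece of arithmetic shows what the second settings call implies. The Python sketch below counts shard copies for a given configuration. The minimum-node figure assumes that two copies of the same shard are never placed on the same node, and both function names are made up for illustration.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Each shard exists once as a primary plus once per replica."""
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_full_allocation(number_of_replicas: int) -> int:
    """Assumption: copies of the same shard never share a node."""
    return 1 + number_of_replicas


if __name__ == "__main__":
    # Values from the settings example: 2 shards, 3 replicas each.
    print(total_shard_copies(2, 3))
    print(min_nodes_for_full_allocation(3))
```

With 2 shards and 3 replicas this gives 2 × (1 + 3) = 8 shard copies, spread over at least 4 nodes under the stated assumption.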
  16. Behind ElasticSearch
  17. Modules
  18. Zen Discovery
      Zen is used for both discovery and master election. A master in elasticsearch is responsible for handling nodes coming and going and for the allocation of shards. The master is not a single point of failure: if it fails, another node is elected as master.
      Note that nodes do not need to communicate with the master on each request, so it is not a bottleneck either.
      Node readiness is determined by the shard allocation algorithm. A shard allocated to a node is considered "ready" to receive requests only once it has fully initialized.
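
The failover behaviour described for the master can be pictured with a toy election. In this Python sketch the live node with the smallest id becomes master. That rule is an assumption chosen for illustration and is not claimed to be how Zen actually elects a master; the node ids are hypothetical.

```python
def elect_master(live_nodes: set) -> str:
    """Toy election rule: the live node with the smallest id wins (an assumption)."""
    if not live_nodes:
        raise ValueError("no live nodes to elect from")
    return min(live_nodes)


if __name__ == "__main__":
    nodes = {"node-a", "node-b", "node-c"}  # hypothetical node ids
    master = elect_master(nodes)
    print("master:", master)
    nodes.discard(master)  # simulate the master failing
    print("new master:", elect_master(nodes))
```

Because every node can apply the same deterministic rule to the same set of live nodes, the cluster agrees on a new master without any node being special.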
  19. Scalability
      There are two kinds of nodes: nodes that can hold data, and nodes that do not.
      There is no need for a load balancer in elasticsearch: each node can receive a request, and if it can't handle it, it automatically delegates it to the appropriate node(s).
      If you want to scale out search, you can simply add more replicas per shard.
  20. Automatic shard allocation
      From: http://www.slideshare.net/elasticsearch/elasticsearch-at-berlinbuzzwords-2010#
  21. BASE support
      Each document you index is there once the index operation is done.
      There is no need to commit, or do anything similar, to get everything persisted.
      A shard can have one or more replicas for high availability.
      Gateway persistency is done in the background, asynchronously.
  22. The River
      A river is a pluggable service running within an elasticsearch cluster, pulling data (or being pushed data) that is then indexed into the cluster.
  23. Geo location and search
      1. Make your data geo enabled
         {
             "pin" : {
                 "location" : {
                     "lat" : 40.12,
                     "lon" : -71.34
                 },
                 "tag" : ["food", "family"],
                 "text" : "my favorite family restaurant"
             }
         }
      Find by location
      Sorting
      Faceting … …
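
To show what "find by location" and distance sorting amount to, here is a small Python sketch that filters geo-enabled documents of the shape above by great-circle distance. It uses the haversine formula with a mean Earth radius of 6371 km. The `within` helper is hypothetical and runs locally; it is not the ElasticSearch geo API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(docs: list, lat: float, lon: float, radius_km: float) -> list:
    """Return the documents inside the radius, nearest first."""
    scored = []
    for doc in docs:
        loc = doc["pin"]["location"]
        scored.append((haversine_km(lat, lon, loc["lat"], loc["lon"]), doc))
    scored.sort(key=lambda pair: pair[0])
    return [doc for dist, doc in scored if dist <= radius_km]


if __name__ == "__main__":
    docs = [{"pin": {"location": {"lat": 40.12, "lon": -71.34},
                     "tag": ["food", "family"],
                     "text": "my favorite family restaurant"}}]
    for doc in within(docs, 40.0, -71.0, radius_km=50):
        print(doc["pin"]["text"])
```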
  24. More details at http://www.elasticsearch.com/docs/
  25. Comparison
  26. Compare with Solr
      It supports a dynamic schema, but clumsily, through field-name patterns:
        *_i: name_i, age_i, …
      Distribution amounts to running many replicas in a master-slave setup, plus a dirty query like this:
        http://localhost:9080/solr/select/?q=xxx:xxx&shards=localhost:8080/solr,localhost:9080/solr
        WTF!
      Is it really RESTful? Anyway, it doesn't matter.
  27. Compare with Katta
      Features
        Makes serving large or high-load indices easy
        Serves very large Lucene or Hadoop MapFile indices as index shards on many servers
        Replicates shards on different servers for performance and fault tolerance
        Supports pluggable network topologies
        Master fail-over
        Fast, lightweight, easy to integrate
        Plays well with Hadoop clusters
      May be too heavy for us (or maybe not)
      Master/node architecture: complex, and ops will kill us. Can't it be a little easier?
      Lacks clients and documentation
      Inactive community
      Lacks some search features
  28. Resources
  29. Links:
      http://www.elasticsearch.com
      http://www.elasticsearch.com/blog
      http://www.elasticsearch.com/docs/
      http://www.elasticsearch.com/community/mailinglist/user/
      http://github.com/elasticsearch
      References:
      http://highscalability.com/blog/2010/2/10/elasticsearch-open-source-distributed-restful-search-engine.html
      http://blog.sematext.com/2010/05/03/elastic-search-distributed-lucene/
      http://mail-archives.apache.org/mod_mbox/hbase-user/201006.mbox/%3C149150.78881.qm@web50304.mail.re2.yahoo.com%3E
      http://www.slideshare.net/elasticsearch/elasticsearch-at-berlinbuzzwords-2010#
  30. Thanks