ElasticSearch: Introduction and Quick Start
medcl, 9-29
Introduction
  • ElasticSearch: a distributed search solution
  • Domain driven
  • Schema free
  • Anything pluggable
  • Open source, distributed, RESTful
  • Author: Shay Banon (expert in search and analytics); Compass, GigaSpaces
  • Current version: 0.11.0
Features
  • Reliable, asynchronous write-behind for long-term persistency.
  • (Near) real-time search.
  • Built on top of Lucene.
      - Each shard is a fully functional Lucene index.
      - All the power of Lucene is easily exposed through simple configuration / plugins.
  • Per-operation consistency.
      - Single document level operations are atomic, consistent, isolated and durable.
  • Open source under the Apache 2 License.
Distributed and Highly Available
  • Each index is fully sharded with a configurable number of shards.
  • Each shard can have zero or more replicas.
  • Read / search operations can be performed on any of the replica shards.
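A minimal sketch of how an index might spread documents over its shards and replicas. The hash-modulo routing rule and the names `shard_for` and `copies_of` are assumptions made for illustration; the slides do not say how ElasticSearch picks a shard.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Pick a shard for a document id (hypothetical hash-modulo routing)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def copies_of(shard: int, number_of_replicas: int) -> list:
    """List the primary copy plus every replica copy of one shard."""
    replicas = [f"shard{shard}-replica{r}" for r in range(1, number_of_replicas + 1)]
    return [f"shard{shard}-primary"] + replicas
```

With `number_of_replicas` set to 0 a shard has only its primary copy; each extra replica adds one more copy that can serve reads and searches.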
Multi Tenant with Multi Types
  • Support for more than one index.
  • Support for more than one type per index.
  • Index level configuration (number of shards, index storage, ...).
Document Oriented
  • No need for upfront schema definition.
  • Schema can be defined per type for customization of the indexing process.
A Varied Set of APIs
  • HTTP RESTful API.
  • Native Java API.
  • Third-party clients: Perl, Python, PHP, Ruby, Groovy, Erlang, .NET.
  • All APIs perform automatic node operation rerouting.
Up and Running
Install
  • Zero configuration
Index
    $ curl -XPUT http://localhost:9200/twitter/user/kimchy -d '{ "name" : "Shay Banon" }'
    $ curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{
        "user": "kimchy",
        "post_date": "2009-11-15T13:12:00",
        "message": "Trying out Elastic Search, so far so good?"
    }'
    $ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
        "user": "kimchy",
        "post_date": "2009-11-15T14:12:12",
        "message": "You know, for Search"
    }'
Schema Mapping
    $ curl -XPUT http://localhost:9200/twitter
    $ curl -XPUT http://localhost:9200/twitter/user/_mapping -d '{
        "properties" : {
            "name" : { "type" : "string" }
        }
    }'
GET
    $ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
        "user": "kimchy",
        "post_date": "2009-11-15T14:12:12",
        "message": "You know, for Search"
    }'
    $ curl -XGET http://localhost:9200/twitter/tweet/2
Search
    $ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
        "user": "kimchy",
        "post_date": "2009-11-15T14:12:12",
        "message": "You know, for Search"
    }'
    $ curl -XGET 'http://localhost:9200/twitter/tweet/_search?q=user:kimchy'
    $ curl -XGET http://localhost:9200/twitter/tweet/_search -d '{
        "query" : {
            "term" : { "user": "kimchy" }
        }
    }'
    $ curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -d '{
        "query" : {
            "range" : {
                "post_date" : {
                    "from" : "2009-11-15T13:00:00",
                    "to" : "2009-11-15T14:30:00"
                }
            }
        }
    }'
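The same searches could be prepared from a program instead of curl. The sketch below only builds the requests with Python's standard library; the helper names `uri_search`, `body_search` and `range_query` are hypothetical, and the host, paths and query bodies are copied from the curl commands above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # same host and port as the curl examples


def uri_search(index, doc_type, query):
    """Build the GET request for a query-string search (?q=...)."""
    url = f"{BASE_URL}/{index}/{doc_type}/_search?{urlencode({'q': query})}"
    return Request(url, method="GET")


def range_query(field, start, end):
    """Build the body of a range query between two values."""
    return {"range": {field: {"from": start, "to": end}}}


def body_search(index, query, doc_type=None):
    """Build a GET request that carries the query as a JSON body, like curl -d."""
    path = f"{index}/{doc_type}/_search" if doc_type else f"{index}/_search"
    data = json.dumps({"query": query}).encode("utf-8")
    return Request(f"{BASE_URL}/{path}", data=data, method="GET")
```

Nothing is sent until a request is passed to `urllib.request.urlopen`, so the helpers can be inspected without a running node. Note that `urlencode` percent-encodes the colon in `user:kimchy`.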
Multi-tenancy
    $ curl -XPUT http://localhost:9200/kimchy
    $ curl -XPUT http://localhost:9200/elasticsearch
    $ curl -XPUT http://localhost:9200/elasticsearch/tweet/1 -d '{
        "post_date": "2009-11-15T14:12:12",
        "message": "Zug Zug",
        "tag": "warcraft"
    }'
    $ curl -XPUT http://localhost:9200/kimchy/tweet/1 -d '{
        "post_date": "2009-11-15T14:12:12",
        "message": "Whatyouwant?",
        "tag": "warcraft"
    }'
    $ curl -XGET 'http://localhost:9200/kimchy,elasticsearch/tweet/_search?q=tag:warcraft'
    $ curl -XGET 'http://localhost:9200/_all/tweet/_search?q=tag:warcraft'
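The two search URLs differ only in the index part of the path: a comma-separated list of index names, or `_all`. A small sketch of that rule, where the function name `search_url` and its parameters are made up for illustration:

```python
def search_url(indices, doc_type, query, base_url="http://localhost:9200"):
    """Build a query-string search URL over several indices.

    An empty list of indices falls back to _all, as in the last curl command.
    """
    index_part = ",".join(indices) if indices else "_all"
    return f"{base_url}/{index_part}/{doc_type}/_search?q={query}"
```

For example, `search_url(["kimchy", "elasticsearch"], "tweet", "tag:warcraft")` rebuilds the first search URL shown above.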
Settings
    $ curl -XPUT http://localhost:9200/kimchy/ -d '
    index :
        store:
            type: memory
    '
    $ curl -XPUT http://localhost:9200/elasticsearch/ -d '{
        "index" : {
            "number_of_shards" : 2,
            "number_of_replicas" : 3
        }
    }'
Behind ElasticSearch
Modules
Zen Discovery
  • Zen is used for both discovery and master election.
  • The master in elasticsearch is responsible for handling nodes joining and leaving and for allocating shards.
  • The master is not a single point of failure: if it fails, another node is elected as master.
  • Nodes do not need to communicate with the master on each request, so it is not a bottleneck either.
  • Node readiness is handled by the shard allocation algorithm: a shard allocated to a node is considered "ready" to receive requests only once it has fully initialized.
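A toy sketch of the fail-over behaviour described for the master. The election rule used here (the smallest node id among live nodes wins) is an assumption chosen for illustration, not Zen's actual algorithm, and the `Cluster` class is hypothetical.

```python
class Cluster:
    """Tracks live nodes and re-elects a master when the current one leaves."""

    def __init__(self, node_ids):
        self.live = set(node_ids)
        self.master = self._elect()

    def _elect(self):
        # Hypothetical rule: the smallest id among the live nodes becomes master.
        return min(self.live) if self.live else None

    def node_left(self, node_id):
        self.live.discard(node_id)
        if node_id == self.master:
            # The master is gone, so another node takes over.
            self.master = self._elect()

    def node_joined(self, node_id):
        self.live.add(node_id)
        if self.master is None:
            self.master = self._elect()
```

Losing a non-master node leaves the master unchanged; losing the master triggers a new election among the remaining nodes.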
Scalability
  • There are nodes that can hold data, and nodes that do not.
  • There is no need for a load balancer in elasticsearch: each node can receive a request, and if it can't handle it, it automatically delegates it to the appropriate node(s).
  • To scale out search, simply add more replicas per shard.
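The delegation idea can be sketched in a few lines: any node accepts the request, serves it if it holds the right shard, and otherwise forwards it to the node that does. The `Node` class, the shared shard table and the hash-modulo routing are all assumptions for illustration.

```python
import zlib


class Node:
    """A node that serves its own shards and forwards everything else."""

    def __init__(self, name, shards, shard_table):
        self.name = name
        self.shards = set(shards)       # empty for a node that holds no data
        self.shard_table = shard_table  # shared dict: shard number -> owning Node
        for shard in self.shards:
            shard_table[shard] = self

    def get(self, doc_id, number_of_shards):
        # Hypothetical routing rule: hash of the id modulo the shard count.
        shard = zlib.crc32(doc_id.encode("utf-8")) % number_of_shards
        if shard in self.shards:
            return f"{self.name} served {doc_id} from shard {shard}"
        # Not ours: delegate to the node that owns the shard.
        return self.shard_table[shard].get(doc_id, number_of_shards)
```

A client can therefore send a request to any node, including one that holds no data, and get the same answer.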
Automatic Shard Allocation
From: http://www.slideshare.net/elasticsearch/elasticsearch-at-berlinbuzzwords-2010#
BASE Support
  • Each document you index is there once the index operation is done. There is no need to commit or do anything similar to get everything persisted.
  • A shard can have 1 or more replicas for HA.
  • Gateway persistency is done in the background in an async manner.
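A rough sketch of the write-behind idea: the index operation makes the document available immediately, while a background thread copies it to long-term storage. The `WriteBehindIndex` class and its two dictionaries are hypothetical stand-ins for a shard and the gateway, not ElasticSearch internals.

```python
import queue
import threading


class WriteBehindIndex:
    """Documents are visible at once; the gateway copy is written asynchronously."""

    def __init__(self):
        self.live = {}     # what searches see right after index() returns
        self.gateway = {}  # long-term store, filled in the background
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def index(self, doc_id, doc):
        self.live[doc_id] = doc           # no commit step needed
        self._pending.put((doc_id, doc))  # persistence happens later

    def _drain(self):
        while True:
            doc_id, doc = self._pending.get()
            self.gateway[doc_id] = doc
            self._pending.task_done()

    def wait_for_gateway(self):
        """Block until every pending write has reached the gateway."""
        self._pending.join()
```

`wait_for_gateway` exists only to make the sketch easy to check; callers of `index` never have to wait for it.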
The River
  • A river is a pluggable service running within the elasticsearch cluster, pulling data (or being pushed data) that is then indexed into the cluster.
Geo Location and Search
1. Make your data geo enabled:
    {
        "pin" : {
            "location" : {
                "lat" : 40.12,
                "lon" : -71.34
            },
            "tag" : ["food", "family"],
            "text" : "my favorite family restaurant"
        }
    }
  • Find by location
  • Sorting
  • Faceting
  • … …
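To make "find by location" and "sorting" concrete, here is a plain-Python sketch that filters documents shaped like the one above by great-circle distance and sorts them nearest first. It uses the haversine formula with an assumed mean Earth radius of 6371 km; it illustrates the idea only and is not the ElasticSearch geo API. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_by_location(docs, lat, lon, max_km):
    """Return the docs whose pin lies within max_km, nearest first."""
    hits = []
    for doc in docs:
        loc = doc["pin"]["location"]
        d = distance_km(lat, lon, loc["lat"], loc["lon"])
        if d <= max_km:
            hits.append((d, doc))
    hits.sort(key=lambda hit: hit[0])
    return [doc for _, doc in hits]
```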
More details at http://www.elasticsearch.com/docs/
Comparison
Compared with Solr
  • It supports dynamic schema, but it sucks: *_i, name_i, age_i, ...
  • Distribution: just make many replicas, master-slave, and use a dirty query like this:
    http://localhost:9080/solr/select/?q=xxx:xxx&shards=localhost:8080/solr,localhost:9080/solr
  • WTF! Is it really RESTful? Anyway, it doesn't matter.
Compared with Katta
Features:
  • Makes serving large or high load indices easy
  • Serves very large Lucene or Hadoop Mapfile indices as index shards on many servers
  • Replicates shards on different servers for performance and fault-tolerance
  • Supports pluggable network topologies
  • Master fail-over
  • Fast, lightweight, easy to integrate
  • Plays well with Hadoop clusters
Concerns:
  • May be too heavy for us (maybe not)
  • Master-node architecture: complex, and ops will kill us? Can't it be a little easier?
  • Lack of clients and documentation
  • Inactive community
  • Lack of some search features
Resources
Links:
  • http://www.elasticsearch.com
  • http://www.elasticsearch.com/blog
  • http://www.elasticsearch.com/docs/
  • http://www.elasticsearch.com/community/mailinglist/user/
  • http://github.com/elasticsearch
References:
  • http://highscalability.com/blog/2010/2/10/elasticsearch-open-source-distributed-restful-search-engine.html
  • http://blog.sematext.com/2010/05/03/elastic-search-distributed-lucene/
  • http://mail-archives.apache.org/mod_mbox/hbase-user/201006.mbox/%3C149150.78881.qm@web50304.mail.re2.yahoo.com%3E
  • http://www.slideshare.net/elasticsearch/elasticsearch-at-berlinbuzzwords-2010#
