ElasticSearch is the new kid on the search block. Built on top of Lucene and adhering to the best concepts of so-called NoSQL movement, ElasticSearch is a distributed, highly available, fast RESTful ...
ElasticSearch is the new kid on the search block. Built on top of Lucene and adhering to the best concepts of so-called NoSQL movement, ElasticSearch is a distributed, highly available, fast RESTful search engine, ready to be plugged into Web applications.
shardsa portion of the document spaceeach one is a separate Lucene index thus, many per-index settings are availabledocument is sharded by its _id value but can be assigned (routed) to a shard deterministically
zero-conf discoveryzen (multicast and unicast)cloud (EC2 via API)
auto-routingmaster node: maintains cluster state reassigns shards if nodes leave/join clusterany node can serve as the request routerthe query is handled via scatter-gather mechanism
replicaseach shard can have 1 or more replicas# of replicas can be updated dynamically afterindex creationreplicas can be used for querying in parallel
transactional modelper-document consistencyno need to commit/flushuses write-behind transaction logwrite consistency (W) can be controlled one, quorum, or all
(near) real-time search1 second refresh rate by default_refresh API also
index storagenode data considered transientcan be stored in local file system, JVM heap,native OS memory, or FS & memory combinationpersistent storage requires a gateway
gatewayspersistent store for cluster state and indicesasynchronous, translog-based write strategyallows full recovery if a cluster restart is neededsupported gateways: local shared FS Hadoop via HDFS S3
mappingdescribes document structure to the searchengineautomatically created with sensible defaultsexplicit mapping can be provided (generally, agood idea)can run into merge conflicts
mappingimportant meta fields: _source _all _boost
mapping typessimple: string, integer/long, float/double, boolean, and null)complex: array, object
analyzersbreak down (tokenize) and normalize fields duringindexing and query strings at search timeanalyzer = tokenizer + token filters (0 or more)*-27<2#<%A72$IZ9#%S%%%*-27<2#<%+1:97BZ9#%]%%%%%%%*-27<2#<%+1:97%^B$-9#%]%%%%%%%_1K9#!239%+1:97%^B$-9#%]%%%%%%%*-1.%+1:97%^B$-9#
analyzers analyzers, tokenizers, and filters can be customizedmapping elasticsearch.yml B7<9P/ %%272$I3B3/ %%%%272$IZ9#/ %%%%%%.@&%,F/ %%%%%%%%-I.9/%!"3-1@ %%%%%%%%-1:97BZ9#/%3-27<2#< %%%%%%%%8B$-9#/%G3-27<2#<E%$1K9#!239E%3-1.E %%%%%%%%%%%%%%%%%23!BB81$<B7HE%.1#-9#*-9@J ` ?-B-$9?/%>?-I.9?/%?3-#B7H?E%?272$IZ9#?/%?9"$27H?NE `
API
API conventionsappend ?pretty=true to get readable JSONboolean values: false/0/off = false, rest is trueJSONP support via callback parameter
API structurehttp://host:port/[index]/[type]/[_action/id] GET http://es:9200/_status GET http://es:9200/twitter/_status POST http://es:9200/twitter/tweet/1 GET http://es:9200/twitter/tweet/1
API structurehttp://host:port/[index]/[type]/[_action/id] GET http://es:9200/twitter/tweet/_search GET http://es:9200/twitter/user/_search GET http://es:9200/twitter/tweet,user/_search GET http://es:9200/twitter,facebook/_search GET http://es:9200/_search
_cluster API structureGET /_cluster/healthGET /_cluster/health/index1,index2GET /_cluster/nodes/statsGET /_cluster/nodes/nodeId1,nodeId2/stats
API {core}index searchbulk querydelete from/size pagingdelete by query sortget highlightingcount selective fields
API {indices}create optimizedelete snapshotopen/close update settingsget/put/delete analyzemapping statusrefresh flush
API {cluster}healthstatenodes infonodes statsnodes shutdown
filtersshare some similar features with queries (term,range, etc)why use a filter?
filtersfaster than queriescached (depends on the filter) the cache is used for different queries against the same filterno scoringmore useful ones: term, terms, range, prefix, and,or, not, exists, missing, query
facetsprovide aggregated data based on the searchrequestterms, histogram, date histogram, range,statistical, and more
geo searchimplemented as filters (and a facet) geo_distance geo_bounding_box geo_polygon
interfacesREST including memcachedJava /!GroovyLanguage clients (REST/Thrift): pyes, PHP (standalone and symfony), Ruby, PerlFlume sink implementation
elasticasimilar to the other PHP ElasticSearch clientAPI naming is consistent with Zend Frameworkcan be extended for new filters, facets, etcstill under development
elastica $es = new Elastica_Client(vm, 9200); $index = new Elastica_Index($es, test); $index->create(array(), true); $type = new Elastica_Type($index, person); $doc = new Elastica_Document(1, array(name => Andrei Zmievski,example email => andrei@test.com, username => andrei, bills => array(2, 3, 5))); $type->addDocument($doc); $qs = new Elastica_Query_QueryString(andrei); $query = new Elastica_Query($qs); $resultSet = $type->search($query); print $resultSet->count();
data importES is not the primary data store (usually)to import/synchronize data: write an agent (Gearman, message queues, etc) use rivers (CouchDB, RabbitMQ, Twitter)
1–2 of 2 previous next