Elastic Search

My talk about Elastic Search, February 18th, RivieraJUG 2011

  1. You know, for search. February 18th, 2011, RivieraJUG
  2. About me
     ● Lukáš Vlček (@lukasvlcek)
     ● Java developer since 2001
     ● Joined Red Hat (JBoss division) in 2010
     ● Member of JBoss.org team, focusing on search
  3. In the beginning there was...
  4. Zzzzz... Shay Banon is not sleeping but dreaming!
  5. Highly-available, {JSON}, RESTful, Search Engine, Distributed, CLOUD, Asynchronous, Open Sourced (ASL2) ... buzzword dreaming!
  6. ... and @ElasticSearch was born! :-)
  7. Dreams do come true
     ● https://github.com/elasticsearch/elasticsearch
     ● http://www.elasticsearch.org
  8. What is ElasticSearch?
     ● Distributed
     ● Highly-available
     ● RESTful search engine (on top of Lucene)
     ● Designed to speak JSON (JSON in, JSON out)
     ● and more...
     ... but first, let's check some simple examples.
  9. Demo #1: RESTful JSON teaser
  10. RESTful
      ● Network interface for data indexing, searching and administration.

        curl -XGET 'http://localhost:9200/index1,index2/typeA,typeB/_search' -d '{
          "query" : { "match_all" : {} }
        }'

      ● You can query one or more indices.
      ● Indices can have aliases; you can also use _all for all indices.
      ● Each index has one or more types, roughly comparable to tables in a database.
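
The URL layout on this slide (comma-separated indices, then comma-separated types, then `_search`) can be sketched in Python. This is only an illustration of how such a request could be assembled; the helper name `build_search_request` is hypothetical, the host is the one from the slide, and nothing is actually sent over the network.

```python
import json
from urllib.parse import quote


def build_search_request(indices, types, query, host="http://localhost:9200"):
    """Return (url, body) for a search across several indices and types.

    Hypothetical helper: it only mirrors the URL shape shown on the slide.
    """
    # An empty index list falls back to _all, as mentioned on the slide.
    index_part = ",".join(quote(name, safe="") for name in indices) if indices else "_all"
    path = [index_part]
    if types:
        path.append(",".join(quote(name, safe="") for name in types))
    path.append("_search")
    url = host + "/" + "/".join(path)
    body = json.dumps({"query": query})  # JSON in, JSON out
    return url, body


url, body = build_search_request(
    ["index1", "index2"], ["typeA", "typeB"], {"match_all": {}}
)
all_url, _ = build_search_request([], [], {"match_all": {}})
```

The resulting `url` and `body` could be handed to any HTTP client, which is essentially what the `curl` command above does.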
  11. Highly available
      ● For each index you can specify:
        ● Number of shards
          – Each index has a fixed number of shards
        ● Number of replicas
          – Each shard can have zero or more replicas; the replica count can be changed dynamically
  12. Distributed
      ● Check the next slides...
  13. ZEN Discovery (diagram)
      ● Nodes: Node 1, Node 2, Node 3, Node 4
      ● Indices:
        – A: { shards: 3, replicas: 2 }
        – B: { shards: 2, replicas: 3 }
        – C: { shards: 1, replicas: 0 }
      ● Shard copies spread across the nodes: A1 x3, A2 x3, A3 x3, B1 x4, B2 x4, C1 x1
      ● Gateway (long-term persistence of cluster data & metadata)
  14. ZEN Discovery (diagram)
      ● Nodes: Node 1, Node 2, Node 3, Node 4
      ● Indices:
        – A: { shards: 3, replicas: 2 }
        – B: { shards: 2, replicas: 3 }
        – C: { shards: 1, replicas: 4 }
      ● Shard copies requested: A1 x3, A2 x3, A3 x3, B1 x4, B2 x4, C1 x5
      ● Cannot allocate all replicas! Check the Health API.
      ● Gateway (long-term persistence of cluster data & metadata)
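
Why index C ends up with an unallocated replica can be checked with a little arithmetic. The sketch below assumes a single placement rule, namely that two copies of the same shard never share a node; the function name `unassigned_copies` is made up for illustration and real allocation takes more factors into account.

```python
def unassigned_copies(shards, replicas, node_count):
    """Count shard copies that cannot be placed on distinct nodes.

    Toy model: the only rule is that copies of one shard must sit on
    different nodes, so at most node_count copies per shard fit.
    """
    copies_per_shard = replicas + 1  # the primary plus its replicas
    placeable = min(copies_per_shard, node_count)
    return shards * (copies_per_shard - placeable)


# Index settings from the slide: name -> (shards, replicas), on 4 nodes.
indices = {"A": (3, 2), "B": (2, 3), "C": (1, 4)}
report = {
    name: unassigned_copies(shards, replicas, node_count=4)
    for name, (shards, replicas) in indices.items()
}
```

Under this model only index C has a leftover copy: its single shard wants five copies but there are just four nodes, which is the situation the Health API reports.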
  15. Talking to the cluster
      ● Native client in Java and Groovy
      ● Client type:
        ● Node client
        ● Transport client
      (Diagram: Client talking to Node 1, Node 2, Node 3)
  16. Talking to the cluster
      ● REST client
      ● Many clients built on top of the REST API
        ● Perl, PHP, Python, Ruby, Erlang, etc.
      (Diagram: Client talking to Node 1, Node 2, Node 3)
  17. Nodes do not have to be equal
      ● Can be a master
      ● Can be a data node
      ● Can allow for REST transport interface
        ● HTTP, memcached, Thrift
      ● Index store (file, memory)
      ● JMX enabled
      ● Thread pool type
      ● ...
      (Diagram: Node 1, Node 2, Node 3)
  18. Gateway
      ● Long-term persistence allows for whole (and partial) cluster backup and recovery.
      ● Types:
        ● Local (default)
        ● NFS
        ● HDFS
        ● AWS: S3
  19. Distributed queries
      ● You can control the search type for each search request:
        ● Query and Fetch
        ● Query then Fetch
        ● Dfs, Query and Fetch
        ● Dfs, Query then Fetch
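
As a rough intuition for "Query then Fetch": every shard first returns only lightweight hits (ids and scores), the coordinating node merges them, and full documents are then fetched just for the winners. The sketch below is a toy in-memory model of that idea, not Elasticsearch code; the shard contents, scores and function names are invented for illustration.

```python
import heapq

# Hypothetical shard contents: doc id -> (precomputed score, stored document).
SHARDS = [
    {"a": (0.9, {"title": "alpha"}), "b": (0.2, {"title": "beta"})},
    {"c": (0.7, {"title": "gamma"}), "d": (0.4, {"title": "delta"})},
]


def query_phase(shard_no, size):
    """Phase 1: a shard returns its best (score, id, shard) triples, no documents."""
    shard = SHARDS[shard_no]
    hits = [(score, doc_id, shard_no) for doc_id, (score, _) in shard.items()]
    return heapq.nlargest(size, hits)


def query_then_fetch(size):
    """Merge per-shard hits, then fetch documents only for the global top hits."""
    candidates = []
    for shard_no in range(len(SHARDS)):
        candidates.extend(query_phase(shard_no, size))
    winners = heapq.nlargest(size, candidates)
    # Phase 2: fetch the stored documents from the shards that own the winners.
    return [SHARDS[shard_no][doc_id][1] for _, doc_id, shard_no in winners]


top_two = query_then_fetch(2)
```

In this model "Query and Fetch" would correspond to each shard returning full documents in a single phase, and the "Dfs" variants to an extra round that gathers term statistics from all shards before scoring.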
  20. Demo #2: Dynamic allocation of indices, shards, replicas and Health API
  21. Admin API
      ● Indices
        – Status
        – CRUD operations
        – Mapping, Open/Close, Update settings
        – Flush, Refresh, Snapshot, Optimize
      ● Cluster
        – Health
        – State
        – Node info and stats
        – Shutdown
  22. Demo #3: Admin API: getting JVM and OS stats
  23. Rich query API
      ● There is a rich Query DSL for search, which includes:
        ● Queries
          – Boolean, Fuzzy, MLT, Prefix, DisMax, ...
        ● Filters
          – And/Or/Not, Boolean, Geo, Missing, Exists, ...
        ● Highlighting
        ● Sort
        ● Facets
          – on the next slide...
  24. Facets
      ● Facets provide aggregated data alongside the results of a search request.
        ● query
        ● filter
        ● terms
        ● range
        ● (date) histogram
        ● statistical
        ● geo distance
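
To make "aggregated data" concrete, here is a small local model of what a terms facet computes: the most frequent values of a field across the matching documents. It runs entirely in memory and is not the Elasticsearch API; the documents, the `tags` field and the function name `terms_facet` are assumptions made for the example.

```python
from collections import Counter


def terms_facet(docs, field, size):
    """Return the `size` most frequent values of `field` with their counts."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        # Accept both a single value and a list of values per document.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts.most_common(size)


# Hypothetical search hits.
hits = [
    {"title": "Intro", "tags": ["java", "search"]},
    {"title": "Scaling", "tags": ["java", "cloud"]},
    {"title": "Lucene", "tags": ["java", "search"]},
]
top_tags = terms_facet(hits, "tags", size=2)
```

The other facet types listed above follow the same pattern with a different grouping rule, for example numeric buckets for range and histogram facets.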
  25. Scripting support
      ● Scripting languages can be used in many places (for example for custom scoring, script fields, script keys in facets, ...)
        ● mvel (default)
        ● JS
        ● Groovy
        ● Python
  26. Demo #4
      ● Java API: indexing data
      ● REST API: faceted search, highlighting
  27. Parent / Child
      ● Parent/child support lets you define a relationship from a child type to a parent type.
        ● has_child (query, filter)
        ● top_children (query)
  28. River
      ● Lets you listen to a stream of changes and index the data...
        ● CouchDB
        ● RabbitMQ
        ● Twitter
        ● Wikipedia
  29. Versioning (new in 0.15)
      ● "Update if current" functionality
      ● i.e. I can get a document, change it and then put it back (referencing the version ID I fetched), and the request will either index the document or fail (if the document has been modified in the interim)
      ● Completely real-time
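
The "update if current" behaviour is a form of optimistic concurrency control. The sketch below models it with a tiny in-memory store; the class and method names (`VersionedStore`, `index`, `VersionConflict`) are hypothetical and only illustrate the get, change, put-back cycle described on the slide.

```python
class VersionConflict(Exception):
    """Raised when the caller's version no longer matches the stored one."""


class VersionedStore:
    """Toy document store that bumps a version number on every write."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Only enforce the check when the caller supplies a version.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


store = VersionedStore()
store.index("1", {"user": "lukas"})
version, source = store.get("1")

# First writer puts the changed document back with the version it fetched.
store.index("1", {"user": "lukas", "lang": "java"}, expected_version=version)

# A second writer still holding the old version is rejected.
try:
    store.index("1", {"user": "someone else"}, expected_version=version)
    conflict = False
except VersionConflict:
    conflict = True
```

The rejected writer would normally re-fetch the document, reapply its change and retry with the new version.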
  30. Percolator (new in 0.15)
      ● The percolator API lets you register queries against an index and then send a percolate request containing a document; the response lists which of the registered queries match that document.
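
Percolation turns search around: queries are stored, and a document is matched against them. A minimal in-memory model of that idea follows, with plain Python predicates standing in for registered queries; `ToyPercolator`, the query names and the sample document are all invented for illustration and do not reflect the actual API.

```python
class ToyPercolator:
    """Stores named queries (here: predicates) and matches documents against them."""

    def __init__(self):
        self._queries = {}  # query name -> callable(doc) -> bool

    def register(self, name, predicate):
        self._queries[name] = predicate

    def percolate(self, doc):
        """Return the names of all registered queries that match the document."""
        return sorted(
            name for name, matches in self._queries.items() if matches(doc)
        )


percolator = ToyPercolator()
percolator.register("mentions-search", lambda doc: "search" in doc["message"].lower())
percolator.register("from-lukas", lambda doc: doc.get("user") == "lukas")
percolator.register("mentions-hadoop", lambda doc: "hadoop" in doc["message"].lower())

matched = percolator.percolate({"user": "lukas", "message": "You know, for Search"})
```

A typical use of this pattern is alerting: each subscriber registers a query once, and every incoming document is percolated to find out who should be notified.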
  31. Q&A
  32. Thank you!
