You know, for search  February 18th, 2011, RivieraJUG
About me●   Lukáš Vlček ( @lukasvlcek )●   Java developer since 2001●   Joined Red Hat (JBoss division) in 2010●   Member ...
In the beginning there was...
Zzzzz......Shay Banon is not sleeping but dreaming!
Highly-available                {JSON}   RESTfulSearch Engine                                     Distributed             ...
!                              *                                 :-)*        ... and @ElasticSearch                     wa...
Dreams do come truehttps://github.com/elasticsearch/elasticsearch        http://www.elasticsearch.org
What is ElasticSearch ?●   Distributed●   Highly-available●   RESTful search engine (on top of Lucene)●   Designed to spea...
Demo #1RESTful JSON teaser
RESTful●   Network interface for data indexing, searching    and administration.curl ­XGET http://localhost:9200/index1,in...
Highly available●   For each index you can specify:    ●   Number of shards        –   Each index has fixed number of shar...
Distributed●   Check next slides...
ZEN Discovery      Node 1                    Node 2              Node 3              Node 4                      A        ...
ZEN Discovery      Node 1                    Node 2              Node 3              Node 4                      A        ...
Talking to the cluster●   Native client in Java and Groovy          Client                    Node 1   Node 2    Node 3●  ...
Talking to the cluster●   REST client             Client                       Node 1      Node 2      Node 3●   Many clie...
Nodes do not have to be equal●   Can be a master●   Can be a data node●   Can allow for REST transport interface    ●     ...
Gateway●   Long time persistency allows for whole (and    partial) cluster backup and recovery.    Types:    ●   Local (de...
Distributed queries●   You can control the type of the search query    per search request:    ●   Query and Fetch    ●   Q...
Demo #2 Dynamic allocation of indices,shards, replicas and Health API
Admin API●   Indices       –   Status       –   CRUD operation       –   Mapping, Open/Close, Update settings       –   Fl...
Demo #3Admin API: getting JVM and OS stats
Rich query API●   There is rich Query DSL for search, includes:    ●   Queries         –   Boolean, Fuzzy, MLT, Prefix, Di...
Facets●   Facets allows to provide aggregated data on    the search request.    ●   query    ●   filter    ●   terms    ● ...
Scripting support●   There is a support for using scripting languages    in many places (for example for custom scoring,  ...
Demo #4      Java API: Indexing dataREST API: Faceted search, Highlighting
Parent / Child●   The parent/child support allows to define a    parent relationship from a child to a parent type.    ●  ...
River●   Lets listen on stream of changes and index the    data...    ●   CouchDB    ●   RabbitMQ    ●   Twitter    ●   Wi...
Versioning (new in 0.15)●   “update if current” functionality●   ie: I can get a document, change it and then put    it ba...
Percolator (new in 0.15)●   The percolator API allows to register queries    against an index, and then send a percolate  ...
Q&A
Thank you!
Elastic Search
Upcoming SlideShare
Loading in...5
×

Elastic Search

5,534

Published on

My talk about Elastic Search, February 18th, RivieraJUG 2011

Elastic Search

  1. 1. You know, for search February 18th, 2011, RivieraJUG
  2. 2. About me● Lukáš Vlček ( @lukasvlcek )● Java developer since 2001● Joined Red Hat (JBoss division) in 2010● Member of JBoss.org team, focusing on search
  3. 3. In the beginning there was...
  4. 4. Zzzzz......Shay Banon is not sleeping but dreaming!
  5. 5. Highly-available {JSON} RESTfulSearch Engine Distributed CLOUD Asynchronous Open Sourced ... buzzword dreaming! (ASL2)
  6. 6. ! * :-)* ... and @ElasticSearch was born!
  7. 7. Dreams do come truehttps://github.com/elasticsearch/elasticsearch http://www.elasticsearch.org
  8. 8. What is ElasticSearch ?● Distributed● Highly-available● RESTful search engine (on top of Lucene)● Designed to speak JSON (JSON in, JSON out)● and more... ... but first, lets check simple examples.
  9. 9. Demo #1RESTful JSON teaser
  10. 10. RESTful● Network interface for data indexing, searching and administration.curl ­XGET http://localhost:9200/index1,index2/typeA,typeB/_search ­d {  “query“ : { “match_all“ : {} }}You can query one or more indices.Indices can have aliases, you can alsouse _all for all indices.Each index have one or more types, somethinglike columns in DB table.
  11. 11. Highly available● For each index you can specify: ● Number of shards – Each index has fixed number of shards ● Number of replicas – Each shard can have 0-many replicas, can be changed dynamically
  12. 12. Distributed● Check next slides...
  13. 13. ZEN Discovery Node 1 Node 2 Node 3 Node 4 A B CA: { shards: 3, replicas: 2 } B: { shards: 2, replicas: 3 } C: { shards: 1, replicas: 0 } A1 A1 A1 B1 B1 B1 B1 A2 A2 A2 C1 B2 B2 B2 B2 A3 A3 A3 Gateway (longterm persistency of cluster data & metadata)
  14. 14. ZEN Discovery Node 1 Node 2 Node 3 Node 4 A B CA: { shards: 3, replicas: 2 } B: { shards: 2, replicas: 3 } C: { shards: 1, replicas: 4 } A1 A1 A1 B1 B1 B1 B1 A2 A2 A2 C1 C1 C1 C1 C1 B2 B2 B2 B2 A3 A3 A3 Can not allocate all replicas! Check Health API Gateway (longterm persistency of cluster data & metadata)
  15. 15. Talking to the cluster● Native client in Java and Groovy Client Node 1 Node 2 Node 3● Client type: ● Node client ● Transport Client
  16. 16. Talking to the cluster● REST client Client Node 1 Node 2 Node 3● Many clients built on top of REST API ● Perl, PHP, Python, Ruby, Erlang, ... etc
  17. 17. Nodes do not have to be equal● Can be a master● Can be a data node● Can allow for REST transport interface ● Http, memcached, thrift● Index store (file, memory) Node 1● JMX enabled● Thread pool type● ... Node 2 Node 3
  18. 18. Gateway● Long time persistency allows for whole (and partial) cluster backup and recovery. Types: ● Local (default) ● NFS ● HDFS ● AWS: S3
  19. 19. Distributed queries● You can control the type of the search query per search request: ● Query and Fetch ● Query then Fetch ● Dfs, Query and Fetch ● Dfs, Query then Fetch
  20. 20. Demo #2 Dynamic allocation of indices,shards, replicas and Health API
  21. 21. Admin API● Indices – Status – CRUD operation – Mapping, Open/Close, Update settings – Flush, Refresh, Snapshot, Optimize● Cluster – Health – State – Node Info and stats – Shutdown
  22. 22. Demo #3Admin API: getting JVM and OS stats
  23. 23. Rich query API● There is rich Query DSL for search, includes: ● Queries – Boolean, Fuzzy, MLT, Prefix, DisMax, ... ● Filters – And/Or/Not, Boolean, Geo, Missing, Exists, ... ● Highlighting ● Sort ● Facets – on a next slide...
  24. 24. Facets● Facets allows to provide aggregated data on the search request. ● query ● filter ● terms ● range ● (date) histogram ● statistical ● geo distance
  25. 25. Scripting support● There is a support for using scripting languages in many places (for example for custom scoring, script fields, script key in facets ...) ● mvel (default) ● JS ● Groovy ● Python
  26. 26. Demo #4 Java API: Indexing dataREST API: Faceted search, Highlighting
  27. 27. Parent / Child● The parent/child support allows to define a parent relationship from a child to a parent type. ● has_child (query, filter) ● top_children (filter)
  28. 28. River● Lets listen on stream of changes and index the data... ● CouchDB ● RabbitMQ ● Twitter ● Wikipedia
  29. 29. Versioning (new in 0.15)● “update if current” functionality● ie: I can get a document, change it and then put it back in (referencing the version ID I fetched) and it will either index or fail (if the document has been modified in the interim)● Completely real-time
  30. 30. Percolator (new in 0.15)● The percolator API allows to register queries against an index, and then send a percolate request which includes a document, and getting back the queries that match on that document out of set of registered queries.
  31. 31. Q&A
  32. 32. Thank you!
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×