Your SlideShare is downloading. ×

Elastic Search

5,443
views

Published on

My talk about Elastic Search, February 18th, RivieraJUG 2011

My talk about Elastic Search, February 18th, RivieraJUG 2011


1 Comment
21 Likes
Statistics
Notes
  • funny and cool ~~
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
5,443
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
168
Comments
1
Likes
21
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. You know, for search February 18th, 2011, RivieraJUG
  • 2. About me● Lukáš Vlček ( @lukasvlcek )● Java developer since 2001● Joined Red Hat (JBoss division) in 2010● Member of JBoss.org team, focusing on search
  • 3. In the beginning there was...
  • 4. Zzzzz......Shay Banon is not sleeping but dreaming!
  • 5. Highly-available {JSON} RESTfulSearch Engine Distributed CLOUD Asynchronous Open Sourced ... buzzword dreaming! (ASL2)
  • 6. ! * :-)* ... and @ElasticSearch was born!
  • 7. Dreams do come truehttps://github.com/elasticsearch/elasticsearch http://www.elasticsearch.org
  • 8. What is ElasticSearch ?● Distributed● Highly-available● RESTful search engine (on top of Lucene)● Designed to speak JSON (JSON in, JSON out)● and more... ... but first, lets check simple examples.
  • 9. Demo #1RESTful JSON teaser
  • 10. RESTful● Network interface for data indexing, searching and administration.curl ­XGET http://localhost:9200/index1,index2/typeA,typeB/_search ­d {  “query“ : { “match_all“ : {} }}You can query one or more indices.Indices can have aliases, you can alsouse _all for all indices.Each index have one or more types, somethinglike columns in DB table.
  • 11. Highly available● For each index you can specify: ● Number of shards – Each index has fixed number of shards ● Number of replicas – Each shard can have 0-many replicas, can be changed dynamically
  • 12. Distributed● Check next slides...
  • 13. ZEN Discovery Node 1 Node 2 Node 3 Node 4 A B CA: { shards: 3, replicas: 2 } B: { shards: 2, replicas: 3 } C: { shards: 1, replicas: 0 } A1 A1 A1 B1 B1 B1 B1 A2 A2 A2 C1 B2 B2 B2 B2 A3 A3 A3 Gateway (longterm persistency of cluster data & metadata)
  • 14. ZEN Discovery Node 1 Node 2 Node 3 Node 4 A B CA: { shards: 3, replicas: 2 } B: { shards: 2, replicas: 3 } C: { shards: 1, replicas: 4 } A1 A1 A1 B1 B1 B1 B1 A2 A2 A2 C1 C1 C1 C1 C1 B2 B2 B2 B2 A3 A3 A3 Can not allocate all replicas! Check Health API Gateway (longterm persistency of cluster data & metadata)
  • 15. Talking to the cluster● Native client in Java and Groovy Client Node 1 Node 2 Node 3● Client type: ● Node client ● Transport Client
  • 16. Talking to the cluster● REST client Client Node 1 Node 2 Node 3● Many clients built on top of REST API ● Perl, PHP, Python, Ruby, Erlang, ... etc
  • 17. Nodes do not have to be equal● Can be a master● Can be a data node● Can allow for REST transport interface ● Http, memcached, thrift● Index store (file, memory) Node 1● JMX enabled● Thread pool type● ... Node 2 Node 3
  • 18. Gateway● Long time persistency allows for whole (and partial) cluster backup and recovery. Types: ● Local (default) ● NFS ● HDFS ● AWS: S3
  • 19. Distributed queries● You can control the type of the search query per search request: ● Query and Fetch ● Query then Fetch ● Dfs, Query and Fetch ● Dfs, Query then Fetch
  • 20. Demo #2 Dynamic allocation of indices,shards, replicas and Health API
  • 21. Admin API● Indices – Status – CRUD operation – Mapping, Open/Close, Update settings – Flush, Refresh, Snapshot, Optimize● Cluster – Health – State – Node Info and stats – Shutdown
  • 22. Demo #3Admin API: getting JVM and OS stats
  • 23. Rich query API● There is rich Query DSL for search, includes: ● Queries – Boolean, Fuzzy, MLT, Prefix, DisMax, ... ● Filters – And/Or/Not, Boolean, Geo, Missing, Exists, ... ● Highlighting ● Sort ● Facets – on a next slide...
  • 24. Facets● Facets allows to provide aggregated data on the search request. ● query ● filter ● terms ● range ● (date) histogram ● statistical ● geo distance
  • 25. Scripting support● There is a support for using scripting languages in many places (for example for custom scoring, script fields, script key in facets ...) ● mvel (default) ● JS ● Groovy ● Python
  • 26. Demo #4 Java API: Indexing dataREST API: Faceted search, Highlighting
  • 27. Parent / Child● The parent/child support allows to define a parent relationship from a child to a parent type. ● has_child (query, filter) ● top_children (filter)
  • 28. River● Lets listen on stream of changes and index the data... ● CouchDB ● RabbitMQ ● Twitter ● Wikipedia
  • 29. Versioning (new in 0.15)● “update if current” functionality● ie: I can get a document, change it and then put it back in (referencing the version ID I fetched) and it will either index or fail (if the document has been modified in the interim)● Completely real-time
  • 30. Percolator (new in 0.15)● The percolator API allows to register queries against an index, and then send a percolate request which includes a document, and getting back the queries that match on that document out of set of registered queries.
  • 31. Q&A
  • 32. Thank you!