Elasticsearch Introduction at BigData meetup

Global introduction to Elasticsearch presented at BigData meetup.
Use cases, getting started, REST CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...

Published in: Technology


Transcript

  • 1. Introduction to Elasticsearch 27th May 2014 - BigData Meetup Eric Rodriguez @wavyx
  • 2. About Me Eric Rodriguez Founder of data.be • Web entrepreneur • Data addict • Multi-Language: PHP, Java/Groovy/Grails, .Net, … be.linkedin.com/in/erodriguez - github.com/wavyx - @wavyx
  • 3. Elasticsearch - Company • Founded in 2012 => http://www.elasticsearch.com • Professional services • Training • Consultancy / Development support • Production support subscription (3 levels of SLAs)
  • 4. Enterprises using Elasticsearch
  • 5. (M)ELK Stack • Elasticsearch - Search server based on Lucene • Logstash - Tool for managing events and logs • Kibana - Visualize logs and time-stamped data • Marvel - Monitor your cluster’s heartbeat You Know, for Search…
  • 6. Logstash • Collect, parse, index, and search logs
  • 7. Kibana • A versatile dashboard to see and interact with your data
  • 8. Marvel • Monitor the health of your cluster: cluster-wide metrics, overview of all nodes and indices, and events (master election, new nodes)
  • 9. real time, search and analytics engine, open-source, Lucene, JSON, schema free, document store, RESTful API, documentation, scalability, high availability, distributed, multi tenancy, per-operation persistence
  • 10. Use Cases • Full-Text Search • Data Store • Analytics • Alerts • Ads • …
  • 11. Copyright 2014 Elasticsearch Inc / Elasticsearch BV. All rights reserved. Content used with permission from Elasticsearch.
  • 18. Elasticsearch core • Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java • Elasticsearch added value: “Simple is best” • Simple API (with documentation) • JSON & RESTful • Sharding & Replication • Extensibility: plugins and scripts • Interoperability: clients and integrations
  • 19. Terms for DBAs (Elasticsearch => RDBMS) • Index => Database • Type => Table • Document => Row • Field => Column • Mapping => Schema
  • 20. Plug & Play • Zero configuration • 4 LoC to get started ;)
  • 21. Alive ! => http://localhost:9200/?pretty
  • 22. REST • Check your cluster, node, and index health, status, and statistics • Administer your cluster, node, and index data and metadata • Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes • Execute advanced search operations such as paging, sorting, filtering, scripting, faceting, aggregations, and many others
  • 23. Basic Operations 1/3 • Add a document • Create index
  • 24. Basic Operations 2/3 • Modify/Replace a document • Delete a document • Delete index
  • 25. Basic Operations 3/3 • Update a document
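The basic operations listed on slides 23 to 25 can be sketched as plain HTTP requests. This is a minimal sketch, assuming the Elasticsearch 1.x REST conventions (`/{index}/{type}/{id}` paths and the `_update` endpoint); the helper names and the `meetup`/`talk` index and type are hypothetical, and nothing is sent over the network: each helper only returns the method, path, and JSON body you would send to http://localhost:9200.

```python
import json


def create_index(index, shards=5, replicas=1):
    """Create index: PUT /{index} with optional shard/replica settings."""
    body = {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}
    return ("PUT", f"/{index}", json.dumps(body))


def index_document(index, doc_type, doc_id, source):
    """Add or fully replace a document: PUT /{index}/{type}/{id}."""
    return ("PUT", f"/{index}/{doc_type}/{doc_id}", json.dumps(source))


def update_document(index, doc_type, doc_id, partial):
    """Partial update: POST /{index}/{type}/{id}/_update with a "doc" body."""
    return ("POST", f"/{index}/{doc_type}/{doc_id}/_update",
            json.dumps({"doc": partial}))


def delete_document(index, doc_type, doc_id):
    """Delete a document: DELETE /{index}/{type}/{id}."""
    return ("DELETE", f"/{index}/{doc_type}/{doc_id}", None)


def delete_index(index):
    """Delete an index and all of its documents: DELETE /{index}."""
    return ("DELETE", f"/{index}", None)


# Hypothetical index "meetup" with type "talk", for illustration only.
requests_to_send = [
    create_index("meetup"),
    index_document("meetup", "talk", 1, {"title": "Intro to Elasticsearch"}),
    update_document("meetup", "talk", 1, {"views": 24509}),
    delete_document("meetup", "talk", 1),
    delete_index("meetup"),
]
for method, path, body in requests_to_send:
    print(method, path, body or "")
```

Replacing a document and adding one use the same PUT request; only the partial update goes through `_update`.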
  • 26. Mapping 1/2 • Define how a document should be mapped (similar to schema): searchable fields, tokenization, storage, .. • Explicit mapping is defined on an index/type level • A default mapping is automatically created
  • 27. Mapping 2/2 • Core types: string, integer/long, float/double, boolean, and null • Other types: Array, Object, Nested, IP, GeoPoint, GeoShape, Attachment • Example
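The example on the original slide was an image and did not survive in the transcript. As a stand-in, here is a minimal sketch of an explicit mapping, assuming the Elasticsearch 1.x body format (`mappings` > type > `properties`); the `talk` type, its field names, and the `build_mapping` helper are hypothetical.

```python
import json


def build_mapping(doc_type, properties):
    """Wrap field definitions in the 1.x index-creation body for one type."""
    return {"mappings": {doc_type: {"properties": properties}}}


# Hypothetical fields for a "talk" type, using types named on the slide.
talk_properties = {
    # Full-text field: tokenized by the standard analyzer.
    "title": {"type": "string", "analyzer": "standard"},
    # Exact-value field: stored as a single term, not tokenized.
    "speaker": {"type": "string", "index": "not_analyzed"},
    "attendees": {"type": "integer"},
    "recorded": {"type": "boolean"},
    "venue": {"type": "geo_point"},
}

mapping_body = json.dumps(build_mapping("talk", talk_properties), indent=2)
print(mapping_body)  # body for: PUT /meetup (hypothetical index name)
```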
  • 28. Search API 1/2 • Multi-index, Multi-type • URI search - Google-like: operators (AND/OR), fields, sort, paging, wildcards, …
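A URI search across several indices and types can be sketched by building the request URL. This is a minimal sketch, assuming the 1.x `_search` endpoint with its `q`, `sort`, `from`, and `size` query-string parameters; the `uri_search` helper and the index, type, and field names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def uri_search(indices, types, q, sort=None, start=0, size=10):
    """Build a multi-index, multi-type URI search path with paging and sort."""
    path = "/" + ",".join(indices)
    if types:
        path += "/" + ",".join(types)
    params = {"q": q, "from": start, "size": size}
    if sort:
        params["sort"] = sort
    # urlencode escapes spaces and colons so the query survives the URL.
    return f"{path}/_search?{urlencode(params)}"


url = uri_search(
    indices=["meetup", "blog"],
    types=["talk"],
    q="title:elasticsearch AND speaker:eric",
    sort="date:desc",
    start=20,
    size=10,
)
print(url)
# Decoding the query string gives back the original parameters.
print(parse_qs(urlsplit(url).query))
```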
  • 29. Search API 2/2 • Paging & Sort • Fields: selection, scripts • Post filter • Highlighting • Rescoring • Explain • …
  • 30. Query DSL • “SQL” for Elasticsearch • Queries should be used: for full-text search; where the result depends on a relevance score • Filters should be used: for binary yes/no searches; for queries on exact values
  • 31. Basic Queries
  • 32. Basic Filters
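The query and filter examples on slides 31 and 32 were images. As a stand-in, here is a minimal sketch that combines both, assuming the 1.x `filtered` query: a `match` query scores the full-text part, and a `term` filter applies the yes/no test on an exact value. The helper and the field names are hypothetical.

```python
import json


def filtered_search(text_field, text, filter_field, exact_value):
    """Full-text match (scored) narrowed by an exact-value filter (unscored)."""
    return {
        "query": {
            "filtered": {
                # Query part: contributes to the relevance score.
                "query": {"match": {text_field: text}},
                # Filter part: binary yes/no on an exact value.
                "filter": {"term": {filter_field: exact_value}},
            }
        }
    }


search_body = filtered_search("title", "introduction elasticsearch",
                              "speaker", "wavyx")
print(json.dumps(search_body, indent=2))  # body for: POST /meetup/_search
```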
  • 33. Analysis 1/2 • Analysis is extracting “terms” from a given text • Processing natural language to make it computer searchable • Configurable registry of Analyzers that can be used • to break down indexed (analyzed) fields into terms when a document is indexed • to process query strings
  • 34. Analysis 2/2 • Analyzers are composed of • a single Tokenizer (may be preceded by one or more CharFilters) • zero or more TokenFilters • Default Analyzers
 standard, pattern, whitespace, language, snowball
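The analyzer pipeline described above (CharFilters, then one Tokenizer, then TokenFilters) can be sketched in pure Python. This is a toy model for illustration, not the Lucene implementation; every function name and the stopword list are hypothetical.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "of", "is"}  # tiny illustrative list


def html_strip(text):
    """CharFilter: replace HTML tags with spaces before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the character stream on whitespace."""
    return text.split()


def lowercase(tokens):
    """TokenFilter: normalize case."""
    return [t.lower() for t in tokens]


def stop(tokens):
    """TokenFilter: drop stopwords."""
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run zero or more CharFilters, one Tokenizer, zero or more TokenFilters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze("<b>The</b> Quick brown Fox",
                [html_strip], whitespace_tokenizer, [lowercase, stop])
print(terms)  # ['quick', 'brown', 'fox']
```

The same pipeline would be applied to query strings, so that the terms searched for match the terms stored at index time.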
  • 36. Analytics • Aggregation of information: similar to “group by” • Facets • Aggregated data based on a search query • One-dimensional results • Ex: “terms facets” return facet counts for various values of a specific field (think color, tag, category, …) • Aggregations (ES 1.0+) • Nested Facets • Basic Stats: mean, min, max, std dev, term counts • Significant Terms, Percentiles, Cardinality estimations
  • 37. Facets • not yet deprecated, but use aggregations! • Various Facets
 terms, range, histogram, date, statistical, geo distance, …
  • 38. Aggregations • A generic powerful framework that can be divided into 2 main families: • Bucketing
 Each bucket is associated with a key and a document criterion
 The aggregation process provides a list of buckets - each one with a set of documents that "belong" to it. • Metric
 Aggregations that keep track and compute metrics over a set of documents. • Aggregations can be nested !
  • 39. Bucket Aggregators • global • filter • missing • terms • range • date range • ip range • histogram • date histogram • geo distance • geohash grid • nested • reverse nested • top hits (version 1.3)
  • 40. Metrics Aggregators • count • stats • extended stats • cardinality • percentiles • min • max • sum • avg
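The two aggregation families can be sketched together: a `terms` bucket aggregation with an `avg` metric nested inside it. This is a minimal sketch, assuming the 1.x `aggs` request syntax for `REQUEST_BODY`; the pure-Python functions below it are a toy model of what such a request computes, not how Elasticsearch computes it, and the field names and sample documents are made up for illustration.

```python
from collections import defaultdict

# Assumed 1.x request body: bucket by "color", then average "price" per bucket.
REQUEST_BODY = {
    "aggs": {
        "by_color": {
            "terms": {"field": "color"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    }
}


def terms_buckets(docs, field):
    """Bucketing: one bucket per distinct value, holding the matching docs."""
    buckets = defaultdict(list)
    for doc in docs:
        if field in doc:
            buckets[doc[field]].append(doc)
    return dict(buckets)


def avg_metric(docs, field):
    """Metric: average of a numeric field over a set of documents."""
    values = [doc[field] for doc in docs if field in doc]
    return sum(values) / len(values) if values else None


def terms_with_avg(docs, bucket_field, metric_field):
    """Nested aggregation: compute the metric inside each bucket."""
    return {
        key: {"doc_count": len(members),
              "avg": avg_metric(members, metric_field)}
        for key, members in terms_buckets(docs, bucket_field).items()
    }


sample_docs = [
    {"color": "red", "price": 10},
    {"color": "red", "price": 20},
    {"color": "blue", "price": 5},
]
result = terms_with_avg(sample_docs, "color", "price")
print(result)
```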
  • 41. Search for end users • Suggesters - “Did you mean”
 Terms, Phrases, Completion, Context • “More like this”
 Find documents that are "like" provided text by running it against one or more fields
  • 42. Percolator • Classic ES 1. Add & Index documents 2. Search with queries 3. Retrieve matching documents • Percolator 1. Add & Index queries 2. Percolate documents 3. Retrieve matching queries
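The inversion described above (store queries, then match incoming documents against them) can be sketched with a toy percolator. This is an illustration of the idea only, not the Elasticsearch percolator API: queries are reduced to sets of required terms, and the class and query names are hypothetical.

```python
class ToyPercolator:
    """Stores queries, then reports which of them match a given document."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, required_terms):
        """Step 1: add and index a query (here: a set of required terms)."""
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, document_text):
        """Steps 2 and 3: run a document against all stored queries and
        return the ids of the queries that match."""
        tokens = set(document_text.lower().split())
        return sorted(query_id
                      for query_id, terms in self.queries.items()
                      if terms <= tokens)


percolator = ToyPercolator()
percolator.register("es-alert", ["elasticsearch"])
percolator.register("meetup-alert", ["bigdata", "meetup"])

matches = percolator.percolate("Elasticsearch talk at the BigData meetup")
print(matches)  # ['es-alert', 'meetup-alert']
```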
  • 43. Why Percolate ?! • Alerts: social media mentions, weather forecast, news alerts • Automatic Monitoring: price monitoring, stock alerts, logs • Ads: display targeted ads based on user’s search queries • Enrich: percolate new documents, then add query matches as document tags
  • 44. High Availability 1/2 • Sharding - Write Scalability • Split logical data over multiple machines & Control data flows • Each index has a fixed number of shards • Improve indexing performance • Replication - Read Scalability • Each shard can have 0-many replicas (dynamic setup) • Removing SPOF (Single Point Of Failure) • Improve search performance
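Why the number of shards is fixed can be sketched with a toy routing function: each document goes to `hash(routing key) % number of primary shards`, so changing the shard count would send existing documents to different shards. This is a minimal sketch of that idea; `zlib.crc32` is a stand-in chosen for illustration and is not the hash function Elasticsearch uses, and the document ids are made up.

```python
import zlib


def shard_for(routing_key, number_of_primary_shards):
    """Pick a primary shard from a stable hash of the routing key."""
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % number_of_primary_shards


doc_ids = [f"doc-{n}" for n in range(100)]

# With a fixed shard count, every lookup lands on the same shard as the write.
placement_5 = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}

# Changing the shard count changes the modulo, so documents would "move".
placement_6 = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}
moved = [d for d in doc_ids if placement_5[d] != placement_6[d]]

print(f"{len(moved)} of {len(doc_ids)} documents would land on a different shard")
```

Replicas do not affect this routing: they are copies of a primary shard, which is why their number can be changed on a live index.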
  • 45. High Availability 2/2 • Zen Discovery • Automatic discovery of nodes within a cluster and electing a master node • Useful for failover and replication • Specific modules: Amazon EC2, Microsoft Azure, Google Compute Engine • Snapshot & Restore module
  • 46. Cluster Management • Marvel - http://www.elasticsearch.org/overview/marvel/ • BigDesk - http://bigdesk.org/ • Paramedic - https://github.com/karmi/elasticsearch-paramedic • KOPF - https://github.com/lmenezes/elasticsearch-kopf/ • Elastic HQ - http://www.elastichq.org/
  • 47. Clients & Integration • Ecosystem: Kibana, Logstash, Marvel, Hadoop integration • API Clients: Java, Javascript, Groovy, PHP, Perl, Python, .Net, Ruby, Scala, Clojure, Go, Erlang, … • Integrations: Grails, Django, Play!, Symfony2, Carrot2, Spring, Drupal, Wordpress, … • Rivers: CouchDB, JDBC, MongoDB, Neo4j, Redis, RabbitMQ, ActiveMQ, Amazon SQS, File System, Twitter, Wikipedia, RSS, …
  • 48. Fast & Furious Evolution • Version 1.0
 Feb 12, 2014 • Aggregations • Snapshot & Restore • Distributed Percolator • Cat API • Federated search • Doc values • Circuit breaker • Version 1.1
 March 25, 2014 • Cardinality Agg • Percentiles Agg • Significant Terms Agg • Search Templates • Cross fields search • Alias for indices & templates • Version 1.2
 May 22, 2014 • Java 7 • Indexing & Merging performance • Aggregations performance • Context suggester • Deep scrolling • Field value factor • Benchmark API coming in 1.3
  • 49. Resources • http://www.elasticsearch.org/guide/ • http://www.elasticsearch.org/videos/ • http://www.elasticsearchtutorial.com/ • http://exploringelasticsearch.com/ • http://joelabrahamsson.com/elasticsearch-101/ • http://belczyk.com/2014/01/elasticsearch-recomended-learning-materials/ • http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-plugins.html
  • 50. Books • Elasticsearch Server
 http://www.packtpub.com/elasticsearch-server-2e/book • Elasticsearch in Action
 http://www.manning.com/hinman/
  • 51. Books • Elasticsearch Cookbook
 http://www.packtpub.com/elasticsearch-cookbook/book • Mastering Elasticsearch
 http://www.packtpub.com/mastering-elasticsearch-querying-and-data-handling/book
  • 52. Books • Elasticsearch - The Definitive Guide
 http://www.elasticsearch.org/blog/elasticsearch-definitive-guide/
  • 53. Thank you! eric@data.be - @wavyx be.linkedin.com/in/erodriguez - github.com/wavyx http://www.meetup.com/ElasticSearch-User-Group-Belux-Belgium-Luxembourg/