Elasticsearch: Introduction to Data Model, Search & Aggregations

1. Elasticsearch (Zalando), by Alaa Elhadba

2. Table of Contents

3. Why Elasticsearch

4. Why Elasticsearch

5. Elasticsearch at scale

6. Index / Type - An index is a collection of documents that should be grouped together for a common reason. - A type is a collection of documents that all share an identical (or very similar) schema.

7. Sharding
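A minimal sketch of how a document is routed to a primary shard. It assumes the documented rule shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. Elasticsearch itself uses a Murmur3 hash; `zlib.crc32` is only a stand-in so the sketch runs with the standard library, and `route_to_shard` is a hypothetical helper name.

```python
import zlib


def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard: hash(_routing) % number_of_primary_shards.

    zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # crc32 returns an unsigned 32-bit int, so the result is never negative.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ["sku-1", "sku-2", "sku-3", "sku-4"]:
        # With no custom routing, the document _id is the routing value.
        print(doc_id, "-> shard", route_to_shard(doc_id, 5))
```

Because the result depends on the shard count, changing the number of primary shards would send existing documents to different shards, which is why that number is fixed when the index is created.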

8. Talking to data

9. Distribution: a single Elasticsearch node (cluster health: yellow)

10. Scaling: cluster (cluster health: yellow)

11. Replication: cluster (cluster health: green)

14. Replication: cluster (cluster health: red)
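The colours on the distribution and replication slides follow the cluster health rule: green when every primary and replica shard is allocated, yellow when all primaries are allocated but some replicas are not, red when at least one primary is unallocated. The sketch below encodes that rule over a hypothetical list of shard copies (`ShardCopy` and `cluster_health` are illustrative names). It is simplified: it looks only at the final allocation state and ignores the promotion of a replica when its primary is lost.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    shard: int       # shard number within the index
    primary: bool    # True for the primary copy, False for a replica
    assigned: bool   # True if the copy is allocated to a node


def cluster_health(copies: list[ShardCopy]) -> str:
    """Simplified green/yellow/red rule over the allocation state of shard copies."""
    if any(c.primary and not c.assigned for c in copies):
        return "red"      # some data is unavailable: a primary is unallocated
    if any(not c.primary and not c.assigned for c in copies):
        return "yellow"   # all data is available, but some replicas are missing
    return "green"        # every primary and replica is allocated


if __name__ == "__main__":
    # One node, one replica configured: a replica is never placed on the same
    # node as its primary, so it stays unassigned and health is yellow.
    single_node = [ShardCopy(0, True, True), ShardCopy(0, False, False)]
    print(cluster_health(single_node))
```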

15. Data Modeling

16. Schema (mapping parameters): Type, Index, Doc_values
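A sketch of a mapping that exercises the three parameters named on the slide. The field names are hypothetical, and the layout is the typeless syntax of Elasticsearch 7 and later. Versions from the era of this deck nested `properties` under a type name and used `"type": "string"` with `"index": "analyzed"`, `"not_analyzed"`, or `"no"` instead of `text`/`keyword`.

```python
import json

# Hypothetical product mapping; every field name here is illustrative only.
product_mapping = {
    "mappings": {
        "properties": {
            # Full-text field: analyzed, searchable by individual words.
            "title": {"type": "text"},
            # Exact-value field: not analyzed, suited to filters and aggregations.
            "brand": {"type": "keyword"},
            # Kept in _source but not searchable.
            "internal_note": {"type": "keyword", "index": False},
            # Searchable, but without columnar doc_values: no sorting or aggregating.
            "sku": {"type": "keyword", "doc_values": False},
            "price": {"type": "float"},
        }
    }
}

if __name__ == "__main__":
    # Request body for creating an index (PUT /<index-name>).
    print(json.dumps(product_mapping, indent=2))
```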

17. Relationships ● Application Side Joins ● Parent-Child ● Nested objects

19. Relationships: parent-child queries can be 5 to 10 times slower than the equivalent nested query!
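Why nested objects exist: a plain array of objects is flattened into one multi-valued list per sub-field, so the boundaries between objects are lost, whereas Elasticsearch indexes each nested object as a separate hidden document. The sketch below imitates both behaviours locally on a hypothetical blog post; the helper names are illustrative and this is not how Lucene stores the data internally.

```python
# A hypothetical blog post with an array of comment objects.
post = {
    "title": "Nest eggs",
    "comments": [
        {"name": "John Smith", "age": 28},
        {"name": "Alice White", "age": 31},
    ],
}


def flatten(doc: dict, field: str) -> dict:
    """How a plain object array is indexed: one list of values per sub-field."""
    flat: dict = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(doc: dict, field: str, name: str, age: int) -> bool:
    # Each condition is checked against the flattened lists independently,
    # so values from different objects can satisfy the query together.
    flat = flatten(doc, field)
    return name in flat[f"{field}.name"] and age in flat[f"{field}.age"]


def matches_nested(doc: dict, field: str, name: str, age: int) -> bool:
    # A nested query requires all conditions to hold inside the same object.
    return any(obj["name"] == name and obj["age"] == age for obj in doc[field])


if __name__ == "__main__":
    print(matches_flattened(post, "comments", "Alice White", 28))  # spurious match
    print(matches_nested(post, "comments", "Alice White", 28))     # no match
```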

21. Searching

22. Searching
STRUCTURED SEARCH: A filter asks a yes/no question of every document and is used for fields that contain exact values.
- Is the date within the range 2012 to 2015?
- Is the status “Approved”?
- Is the language code “DE”?
UNSTRUCTURED SEARCH: A query calculates how relevant each document is to the query and assigns it a relevance score, which is later used to sort the matching documents by relevance.
- Find documents containing the word run, but maybe also matching runs, running, jog, or sprint.
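The two kinds of search can be combined in a single `bool` query: clauses under `filter` answer yes/no and do not affect the score, while clauses under `must` are scored. This is a sketch of the Query DSL body built as a Python dict. The field names `description`, `published`, `status`, and `language` are hypothetical, and `status` and `language` are assumed to be `keyword` fields, since a `term` query does not analyze its input.

```python
import json


def build_search(text: str, status: str, language: str) -> dict:
    """Scored full-text matching plus yes/no filters in one bool query."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to _score (unstructured search).
                "must": [{"match": {"description": text}}],
                # Filter context: yes/no only, no scoring (structured search).
                "filter": [
                    {"range": {"published": {"gte": "2012-01-01", "lte": "2015-12-31"}}},
                    {"term": {"status": status}},
                    {"term": {"language": language}},
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search("run", "Approved", "DE"), indent=2))
```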

24. Terms Query Example

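25. Unstructured Search (Full Text)
Analysis turns the text of a field into the terms stored in the inverted index:
- Original text: Quick brown foxes leap over lazy dogs in summer
- Tokenize: Quick, brown, foxes, leap, over, lazy, dogs, in, summer
- Remove stop words: Quick, brown, foxes, leap, lazy, dogs, summer
- Stem: Quick, brown, fox, leap, lazy, dog, summer
- Apply synonyms: fast, brown, fox, jump, lazy, dog, summer
- Fuzzy matching tolerates typos: tsar -> star
- Result: Inverted Index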
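A toy version of the analysis chain above, followed by building an inverted index. The stop-word list, synonym map, and plural-stripping "stemmer" are assumptions chosen to reproduce the slide's example; real analyzers are configured per field, and this sketch lowercases during tokenization as most analyzers do. The `tsar -> star` step is illustrated with an edit distance that counts an adjacent transposition as one edit (the optimal-string-alignment form of Damerau-Levenshtein), which is the kind of distance fuzzy queries tolerate.

```python
from collections import defaultdict

# Toy analysis chain; every list and rule here is an illustrative assumption.
STOP_WORDS = {"over", "in", "the", "a"}
SYNONYMS = {"quick": "fast", "leap": "jump"}


def toy_stem(token: str) -> str:
    """Naive plural stripping standing in for a real stemmer."""
    if token.endswith("xes"):
        return token[:-2]                      # foxes -> fox
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]                      # dogs -> dog
    return token


def analyze(text: str) -> list[str]:
    tokens = text.lower().split()                        # tokenize and lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    tokens = [toy_stem(t) for t in tokens]               # stem
    return [SYNONYMS.get(t, t) for t in tokens]          # apply synonyms


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each analyzed term to the set of document ids containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


def osa_distance(a: str, b: str) -> int:
    """Edit distance where an adjacent transposition counts as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


if __name__ == "__main__":
    docs = {1: "Quick brown foxes leap over lazy dogs in summer",
            2: "Lazy dogs sleep"}
    print(analyze(docs[1]))   # ['fast', 'brown', 'fox', 'jump', 'lazy', 'dog', 'summer']
    print(build_inverted_index(docs)["dog"])  # {1, 2}
    print(osa_distance("tsar", "star"))       # 1
```

Because queries run through the same analysis, a search for "quick fox" becomes the terms fast and fox, both of which point to document 1 in the index.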

26. Relevance

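27. Scoring & Relevance in Full-Text Search
Relevance scoring calculates how similar the contents of a field are to a query. The TF/IDF model combines three factors:
- Term frequency: How often does the term appear in the field? The more often, the higher the score.
- Inverse document frequency: How often does each term appear in the index? The more often, the less weight the term carries.
- Field-length norm: How long is the field? The longer the field, the less weight each of its terms carries.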
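A simplified sketch of the three factors, using the classic Lucene formulas that Elasticsearch used by default before it switched to BM25 in 5.0: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / sqrt(numTerms). The full classic scoring function also includes query normalization, coordination, and boosts (and squares idf); those are deliberately omitted here, and the corpus is made up.

```python
import math


def tf(freq: int) -> float:
    return math.sqrt(freq)                            # more occurrences, diminishing returns


def idf(doc_freq: int, num_docs: int) -> float:
    return 1.0 + math.log(num_docs / (doc_freq + 1))  # rarer terms weigh more


def field_norm(num_terms: int) -> float:
    return 1.0 / math.sqrt(num_terms)                 # shorter fields weigh more


def term_weight(term: str, field_terms: list[str], corpus: list[list[str]]) -> float:
    """Simplified weight of one term in one field: tf * idf * norm."""
    doc_freq = sum(1 for doc in corpus if term in doc)
    return (tf(field_terms.count(term))
            * idf(doc_freq, len(corpus))
            * field_norm(len(field_terms)))


# Hypothetical corpus: each entry is the analyzed terms of one field.
corpus = [
    ["quick", "brown", "fox"],
    ["quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
    ["lazy", "dog"],
    ["brown", "dog"],
]

if __name__ == "__main__":
    for term in ["fox", "brown"]:
        print(term, [round(term_weight(term, doc, corpus), 3) for doc in corpus])
```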

28. Vector Space Model The vector space model provides a way of comparing a multi-term query against a document. - The model represents both the document and the query as vectors.

29. Vector Space Model
Example documents:
1. I am happy in summer.
2. After Christmas I’m a hippopotamus.
3. The happy hippopotamus helped Harry.
- By measuring the angle between the query vector and each document vector, it is possible to assign a relevance score to each document.
- If the angle between a document and the query is large, the document is of low relevance.
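A sketch of the angle comparison using cosine similarity (a smaller angle gives a cosine closer to 1). The query "happy hippopotamus" and the per-term weights (2 for happy, 5 for the rarer hippopotamus) are assumptions for illustration; in practice the weights would come from TF/IDF.

```python
import math
import re

DOCS = {
    1: "I am happy in summer.",
    2: "After Christmas I'm a hippopotamus.",
    3: "The happy hippopotamus helped Harry.",
}
QUERY_TERMS = ["happy", "hippopotamus"]          # assumed query
WEIGHTS = {"happy": 2.0, "hippopotamus": 5.0}    # assumed term weights


def vector(text: str) -> list[float]:
    """One dimension per query term: the term weight if present, else 0."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return [WEIGHTS[t] if t in tokens else 0.0 for t in QUERY_TERMS]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


query_vec = [WEIGHTS[t] for t in QUERY_TERMS]
scores = {doc_id: cosine(query_vec, vector(text)) for doc_id, text in DOCS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    for doc_id in ranking:
        print(doc_id, round(scores[doc_id], 3))
```

Document 3 points in exactly the same direction as the query, so it ranks first; document 2 beats document 1 because it contains the more heavily weighted term.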

30. Constant Score

31. Field Value Factor
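A sketch of a `function_score` query that uses `field_value_factor` to boost popular documents. The field names `title` and `votes` and the factor 1.2 are hypothetical. With the default `boost_mode` of `multiply`, the final score is the query score times modifier(factor * field value); note that Elasticsearch's `log1p` modifier is the base-10 logarithm of 1 + x, not Python's natural-log `math.log1p`.

```python
import math


def popularity_query(text: str) -> dict:
    """function_score body boosting matches on `title` by the `votes` field."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},     # hypothetical text field
                "field_value_factor": {
                    "field": "votes",                    # hypothetical numeric field
                    "factor": 1.2,
                    "modifier": "log1p",
                    "missing": 0,                        # value used when the field is absent
                },
                "boost_mode": "multiply",                # default: query score * function value
            }
        }
    }


def boosted_score(query_score: float, votes: float, factor: float = 1.2) -> float:
    """Local imitation of the body above: score * log10(1 + factor * votes)."""
    return query_score * math.log10(1 + factor * votes)


if __name__ == "__main__":
    for votes in [0, 10, 100, 1000]:
        print(votes, round(boosted_score(2.0, votes), 3))
```

The logarithm dampens the effect of very large vote counts, but with `multiply` a document with zero votes ends up with a score of 0.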

32. Field Value Factor

33. Script Scoring

34. Aggregations

35. Aggregations: Search vs. Analytics
- Business requirement: Search: “Help me find the best documents.” Analytics: “What do these documents tell me about my business?”
- Enablers: Search: matching, relevance, filtering, auto-completion, ... Analytics: summaries, patterns, trends, outliers, predictions, visualization.
- Aggregations help build complex summaries and analytics of the indexed data.

36. Aggregations: Terms, Significant Terms

37. Bucket Aggregations

38. Nested Aggregations
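Assuming "nested aggregations" here means sub-aggregations placed inside buckets (Elasticsearch also has a separate `nested` aggregation for nested objects), this sketch shows a `terms` bucket aggregation with an `avg` metric inside each bucket, plus a local imitation on hypothetical data. The field names `brand` and `price` are illustrative. Buckets are ordered by document count, descending, with ties broken by key; on a real multi-shard index, terms counts can be approximate.

```python
from collections import defaultdict

# Query DSL body: one bucket per brand, with an avg metric nested in each bucket.
agg_body = {
    "size": 0,                                   # aggregation results only, no hits
    "aggs": {
        "by_brand": {
            "terms": {"field": "brand", "size": 10},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}


def terms_with_avg(docs: list[dict], bucket_field: str, metric_field: str, size: int = 10) -> list[dict]:
    """Group docs by bucket_field and average metric_field within each bucket."""
    buckets: dict = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    ordered = sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [{"key": key, "doc_count": len(vals), "avg": sum(vals) / len(vals)}
            for key, vals in ordered[:size]]


if __name__ == "__main__":
    docs = [{"brand": "A", "price": 10.0},
            {"brand": "B", "price": 30.0},
            {"brand": "A", "price": 20.0}]
    print(terms_with_avg(docs, "brand", "price"))
```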

39. Metrics Aggregations ● Extended Stats Aggregation ● Geo Bounds Aggregation ● Geo Centroid Aggregation ● Percentiles Aggregation ● Stats Aggregation ● Value Count Aggregation ● Avg, Sum, Min, Max Aggregations
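A local sketch of what the `stats` and `extended_stats` metrics compute over one numeric field: `stats` returns count, min, max, avg, and sum, and `extended_stats` adds the sum of squares, the (population) variance, and the standard deviation. The function names are illustrative and assume a non-empty list; `value_count` corresponds to `count`. Percentiles are left out because Elasticsearch computes them approximately (with the TDigest algorithm) rather than exactly.

```python
import math


def stats(values: list[float]) -> dict:
    """count, min, max, avg and sum of a non-empty list of values."""
    count = len(values)
    total = sum(values)
    return {"count": count, "min": min(values), "max": max(values),
            "avg": total / count, "sum": total}


def extended_stats(values: list[float]) -> dict:
    """stats plus sum of squares, population variance and standard deviation."""
    s = stats(values)
    sum_sq = sum(v * v for v in values)
    variance = sum_sq / s["count"] - s["avg"] ** 2
    return {**s, "sum_of_squares": sum_sq, "variance": variance,
            "std_deviation": math.sqrt(variance)}


if __name__ == "__main__":
    print(extended_stats([10, 20, 30, 40]))
```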

46. Significant Terms

50. What’s uncommonly common about this subgroup?

51. Significant Terms - The significant_terms aggregation analyzes your data and finds terms that appear with a frequency that is statistically anomalous compared to the background data. - It can uncover surprisingly sophisticated trends and correlations in your data. - It is useful for discovering anomalies.
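A sketch of the default significance heuristic of `significant_terms`, JLH, which multiplies the absolute change in a term's frequency (foreground minus background) by the relative change (foreground divided by background), and scores 0 when the term is not over-represented. The data is made up: each document is the set of products one person likes, the foreground is a hypothetical subgroup, and the background is the whole index (including that subgroup).

```python
def jlh_score(subset_freq: int, subset_size: int, superset_freq: int, superset_size: int) -> float:
    """JLH: (fg% - bg%) * (fg% / bg%), or 0 when the term is not over-represented."""
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    fg = subset_freq / subset_size
    bg = superset_freq / superset_size
    if fg <= bg:
        return 0.0
    return (fg - bg) * (fg / bg)


def significant_terms(subset_docs: list[set], all_docs: list[set]) -> list[str]:
    """Terms of the subset ranked by JLH score against the background."""
    scored = {}
    for term in set().union(*subset_docs):
        fg = sum(term in d for d in subset_docs)
        bg = sum(term in d for d in all_docs)
        score = jlh_score(fg, len(subset_docs), bg, len(all_docs))
        if score > 0:
            scored[term] = score
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical data: the first three people form the subgroup of interest.
all_docs = [
    {"sneakers", "jeans"}, {"sneakers", "vintage-jacket"},
    {"sneakers", "vintage-jacket", "jeans"},
    {"jeans"}, {"jeans", "t-shirt"}, {"jeans"}, {"t-shirt"}, {"jeans", "t-shirt"},
]
subset_docs = all_docs[:3]

if __name__ == "__main__":
    print(significant_terms(subset_docs, all_docs))
```

Jeans are popular inside the subgroup but even more popular overall, so they are common without being uncommonly common and do not appear in the result.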

52. Significant Terms: find all people who like these products, then summarise how their style differs from everyone else's.

53. Significant Terms

54. Kibana: Data Visualization

55. Kibana

56. Contact
