
eDirectory Elastic Search


A helpful development manual for using the new Elasticsearch feature recently added to edirectory.com, an online directory software platform.

Published in: Software

Developer Manual: Elastic Search

What is Elasticsearch?
- Powerful search server
- Stores data, although it is not ideal as a primary data store
- Quick, flexible and scalable
- Allows complex search queries through easily understandable JSON structures
- Accessible and manageable through a RESTful API

How does it work?
Data is sent to the server as JSON documents through an API. During indexing, these documents are processed (analyzed), and an inverted index is built from the resulting tokens. The documents can then be searched through the API: the text of a full-text query is processed in the same way and reduced to tokens, which are used to look up the stored documents.
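The whole cycle (JSON in, analysis, inverted index, analyzed query, lookup) can be sketched in a few lines of Python. This is a toy model under clear assumptions, not how Elasticsearch is implemented: `TinyIndex` and `analyze` are hypothetical names, the "analyzer" only lowercases and splits on word characters, and a hit on any query token counts as a match.

```python
import json
import re
from collections import defaultdict

# Hypothetical, simplified analyzer: lowercase, then keep runs of word characters.
def analyze(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

class TinyIndex:
    """In-memory stand-in for an index: stored documents plus an inverted index."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}
        self.inverted: dict[str, set[str]] = defaultdict(set)

    def index_document(self, doc_id: str, json_body: str) -> None:
        doc = json.loads(json_body)          # documents arrive as JSON
        self.docs[doc_id] = doc
        for value in doc.values():
            if isinstance(value, str):       # only text fields are analyzed here
                for token in analyze(value):
                    self.inverted[token].add(doc_id)

    def search(self, query: str) -> list[str]:
        # The query goes through the same analyzer; each token is looked up.
        hits: set[str] = set()
        for token in analyze(query):
            hits |= self.inverted.get(token, set())
        return sorted(hits)

index = TinyIndex()
index.index_document("1", '{"title": "Pizza Place Downtown"}')
index.index_document("2", '{"title": "Downtown Car Wash"}')
print(index.search("downtown PIZZA"))  # ['1', '2']
```

Because the query is analyzed with the same rules as the documents, "PIZZA" still matches the stored "Pizza".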

Documents, Types & Indexes
- Index: equivalent to a “database” in MySQL. Each index contains various…
- Types: equivalent to a MySQL “table”. Every type has its own mapping, which can be created automatically when documents are inserted (not recommended) or defined explicitly. Each type contains various…
- Documents: equivalent to a MySQL “row”. This is where data is actually stored. Documents can be inserted, deleted and “modified” through the API.
(Mapping types were phased out starting with Elasticsearch 6.0 and removed in 8.0; in current versions each index has a single mapping.)
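To make "inserted, deleted and modified through the API" concrete, the sketch below only describes the REST calls as `(method, path, body)` tuples; nothing is sent. It assumes the typeless endpoints of Elasticsearch 7+ (`/_doc/`, `/_update/`); older versions such as the one these slides describe use the type name in the path instead. The helper names and the `edirectory` index are hypothetical.

```python
import json
from typing import Optional, Tuple

Call = Tuple[str, str, Optional[str]]  # (HTTP method, path, JSON body or None)

def insert_doc(index: str, doc_id: str, doc: dict) -> Call:
    # Create the document, or fully replace it if the ID already exists.
    return ("PUT", f"/{index}/_doc/{doc_id}", json.dumps(doc))

def update_doc(index: str, doc_id: str, changes: dict) -> Call:
    # Partial update: the fields under "doc" are merged into the stored source,
    # and the whole document is re-indexed (documents are immutable internally,
    # which is why "modified" is in quotes above).
    return ("POST", f"/{index}/_update/{doc_id}", json.dumps({"doc": changes}))

def delete_doc(index: str, doc_id: str) -> Call:
    return ("DELETE", f"/{index}/_doc/{doc_id}", None)

print(insert_doc("edirectory", "42", {"title": "Pizza Place"}))
# ('PUT', '/edirectory/_doc/42', '{"title": "Pizza Place"}')
```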

Tokens
In Elasticsearch, each searchable field of a document is processed according to the rules defined in its analyzer. This produces tokens, which are simply fragments of the text, usually individual words (not to be confused with the index shards described later). Each token works as an index key (like an array's key) that can be used to locate documents.
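A token is usually stored with more than its text. The sketch below roughly imitates the shape of the token list returned by Elasticsearch's `_analyze` API (`token`, `start_offset`, `end_offset`, `position`); the splitting rule (lowercased runs of word characters) is a simplification, not the real standard analyzer, and `tokens_with_offsets` is a hypothetical name.

```python
import re

def tokens_with_offsets(text: str) -> list[dict]:
    """Split text into lowercased tokens, keeping character offsets and positions."""
    result = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        result.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return result

for t in tokens_with_offsets("Best Pizza in Town"):
    print(t)
```

Positions are what make phrase queries possible ("pizza" directly followed by "in"), and offsets let search results highlight the original text.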

Inverted Index
Tokens become the keys that point to documents. MySQL locates a row through columns declared as an index, typically with a B-tree lookup (logarithmic rather than O(1) time). Elasticsearch takes a different approach: every token is an index entry that returns the documents containing it, and those documents may belong to different types.
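One consequence of this design is that a multi-word query becomes a set operation over "postings" (the sorted document IDs stored under each token). The sketch below shows a classic merge-style intersection for an AND query; the postings data is made up, and Lucene's real structures (compressed postings, skip data) are considerably more elaborate.

```python
# Hypothetical inverted index: token -> sorted list of document IDs ("postings").
postings = {
    "pizza": [1, 4, 7, 9],
    "downtown": [2, 4, 9, 12],
    "delivery": [4, 5, 9],
}

def intersect(a: list[int], b: list[int]) -> list[int]:
    """Merge-style intersection of two sorted lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def search_all(tokens: list[str]) -> list[int]:
    """Documents containing every token (an AND query)."""
    # Start from the shortest postings list to keep intermediate results small.
    lists = sorted((postings.get(t, []) for t in tokens), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result

print(search_all(["pizza", "downtown", "delivery"]))  # [4, 9]
```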

Analyzers
- An analyzer is a set of rules for tokenizing and filtering text in Elasticsearch: it combines one tokenizer with zero or more token filters, optionally preceded by character filters. There are many different tokenizers and filters, and countless ways of combining them.
- It can be the difference between a poor query and the perfect search.
- Each analyzer should be designed with the language and the application's requirements in mind.
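The analyzer pipeline (character filters, then one tokenizer, then token filters, each stage feeding the next) can be modelled as plain function composition. This is a sketch: `make_analyzer` and the four components are hypothetical, simplified stand-ins for Elasticsearch's built-ins, not their actual behavior.

```python
import re
from typing import Callable

def make_analyzer(
    char_filters: list[Callable[[str], str]],
    tokenizer: Callable[[str], list[str]],
    token_filters: list[Callable[[list[str]], list[str]]],
) -> Callable[[str], list[str]]:
    """Chain character filters -> one tokenizer -> token filters, in that order."""
    def analyze(text: str) -> list[str]:
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze

# Hypothetical, simplified stand-ins for built-in components.
def strip_html(text: str) -> str:
    return re.sub(r"<[^>]+>", " ", text)

def whitespace(text: str) -> list[str]:
    return text.split()

def lowercase(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]

def stop(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in {"the", "a", "of"}]

listing_analyzer = make_analyzer([strip_html], whitespace, [lowercase, stop])
print(listing_analyzer("<b>The</b> Best Pizza of Town"))  # ['best', 'pizza', 'town']
```

Filter order matters: with `[stop, lowercase]` this case-sensitive stop filter would see "The" before it is lowercased and let it through.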

Tokenizers
Tokenizers are responsible for “breaking the phrases into important shards”, that is, splitting text into tokens. Built-in tokenizers include:
- Standard Tokenizer
- Edge NGram Tokenizer
- Keyword Tokenizer
- Letter Tokenizer
- Lowercase Tokenizer
- NGram Tokenizer
- Whitespace Tokenizer
- Pattern Tokenizer
- UAX URL Email Tokenizer
- Path Hierarchy Tokenizer
- Classic Tokenizer
- Thai Tokenizer
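To see how much the choice of tokenizer changes what gets indexed, compare a whitespace split with the idea behind the Edge NGram Tokenizer, often used for search-as-you-type. The sketch borrows the `min_gram`/`max_gram` parameter names of Elasticsearch's `edge_ngram` tokenizer, but it is an assumption-laden approximation: it splits on whitespace and lowercases, which the real tokenizer only does when configured with `token_chars` and followed by a lowercase filter.

```python
def edge_ngrams(text: str, min_gram: int = 2, max_gram: int = 5) -> list[str]:
    """Emit the leading prefixes of each lowercased word, min_gram..max_gram long."""
    grams = []
    for word in text.lower().split():
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:n])
    return grams

print("Pizza Place".lower().split())  # ['pizza', 'place']
print(edge_ngrams("Pizza Place"))
# ['pi', 'piz', 'pizz', 'pizza', 'pl', 'pla', 'plac', 'place']
```

With the prefixes indexed, a user who has typed only "piz" already matches "Pizza Place".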

Filters
Token filters polish the tokens, making them more precise without making them irrelevant. Built-in token filters include:
- Standard Token Filter
- ASCII Folding Token Filter
- Length Token Filter
- Lowercase Token Filter
- Uppercase Token Filter
- NGram Token Filter
- Edge NGram Token Filter
- Porter Stem Token Filter
- Shingle Token Filter
- Stop Token Filter
- Word Delimiter Token Filter
- Stemmer Token Filter
- Stemmer Override Token Filter
- Keyword Marker Token Filter
- Keyword Repeat Token Filter
- KStem Token Filter
- Snowball Token Filter
- Phonetic Token Filter
- Synonym Token Filter
- Compound Word Token Filter
- Reverse Token Filter
- Elision Token Filter
- Truncate Token Filter
- Unique Token Filter
- Pattern Capture Token Filter
- Pattern Replace Token Filter
- Trim Token Filter
And more.
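Here are simplified stand-ins for three filters from the list (ASCII Folding, Length and Unique), to show what "polishing" tokens means in practice. They are sketches, not the real implementations: the NFKD-based folding only strips combining accents and misses characters such as "ø" that Elasticsearch's ASCII folding also maps, and the function and parameter names are hypothetical.

```python
import unicodedata

def ascii_folding(tokens: list[str]) -> list[str]:
    # Decompose accented characters (é -> e + combining accent), then drop the accents.
    return ["".join(c for c in unicodedata.normalize("NFKD", t) if not unicodedata.combining(c))
            for t in tokens]

def length_filter(tokens: list[str], min_len: int = 2, max_len: int = 20) -> list[str]:
    # Discard tokens that are too short or too long to be useful.
    return [t for t in tokens if min_len <= len(t) <= max_len]

def unique(tokens: list[str]) -> list[str]:
    # Keep only the first occurrence of each token, preserving order.
    return list(dict.fromkeys(tokens))

tokens = ["café", "crème", "x", "café", "brûlée"]
print(unique(length_filter(ascii_folding(tokens))))  # ['cafe', 'creme', 'brulee']
```

After folding, a search for "creme" finds documents that were written with "crème".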

Do I have to know all of this?
No. Analyzers are configured only once, when the index is built.

Creating an Index: MAPPING
- Before creating your index, it's worth taking some time to build the mapping.
- The mapping defines all the fields of all your types, as well as the analyzer used for indexing and/or searching on each field.
- The index creation request also defines the number of SHARDs the index will have (this is an index setting, sent alongside the mapping).
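As an illustration of what the mapping and the shard setting look like together, here is a hypothetical create-index body written as a Python dict. It assumes the typeless format of Elasticsearch 7+ (`mappings.properties` directly, no type name); the field names, the `listing_text` analyzer and the shard counts are made up for the example. `analyzer` applies at indexing time, and `search_analyzer`, when set, applies to queries on that field.

```python
import json

create_index_body = {
    "settings": {
        "number_of_shards": 3,        # fixed when the index is created
        "number_of_replicas": 1,
        "analysis": {
            "analyzer": {
                "listing_text": {     # hypothetical custom analyzer
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        },
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "listing_text"},
            "description": {
                "type": "text",
                "analyzer": "listing_text",   # used when indexing
                "search_analyzer": "standard",  # used when querying
            },
            "category": {"type": "keyword"},  # exact values, not analyzed
            "price": {"type": "float"},
        }
    },
}

print(json.dumps(create_index_body, indent=2))
```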

Huh? Shards?
Shards are LITERALLY ‘shards’ (pieces) of your index, which can be allocated to different nodes and have replicas on other machines. Shards are important for data safety (through their replicas) and are what gives Elasticsearch its distributed nature. Each shard is a self-contained index that works independently of the others.
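How does Elasticsearch know which shard holds a given document? In simplified form, it hashes a routing value (the document `_id` by default) and takes the result modulo the number of primary shards. In the sketch below, `zlib.crc32` is only a deterministic stand-in for the Murmur3 hash Elasticsearch actually uses, and extra details of the real formula are omitted. `shard_for` is a hypothetical name.

```python
import zlib

def shard_for(doc_id: str, number_of_primary_shards: int) -> int:
    """Simplified routing: hash(routing key) % number of primary shards."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

ids = [f"listing-{n}" for n in range(10)]
print({i: shard_for(i, 3) for i in ids})
```

The same ID always lands on the same shard. This is also why the number of primary shards cannot simply be changed later: a different divisor would send existing IDs to different shards, so Elasticsearch relies on reindexing (or its split/shrink APIs) instead.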
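
Nodes?
Machines. Nodes are the machines running the Elasticsearch server (strictly speaking, a node is one running Elasticsearch instance, usually one per machine). Nodes can be configured to discover and interact with one another, forming a cluster. One of the nodes is elected “Master”, while the remaining nodes act as regular nodes. The master manages the cluster-wide state, such as creating or deleting indexes and deciding which shards are allocated to which nodes; individual indexing and search requests are coordinated by whichever node receives them.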
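One of the master's jobs, deciding where shards live, can be illustrated with a toy round-robin allocator that spreads primaries across nodes and never puts a replica on the same node as its primary. This is a sketch under simplifying assumptions: real allocation also considers disk usage, awareness rules and more. When there are not enough nodes, Elasticsearch leaves the extra replicas unassigned (a "yellow" cluster) instead of failing, whereas this sketch simply raises. `allocate` and the `P`/`R` labels are hypothetical.

```python
def allocate(num_primaries: int, num_replicas: int, nodes: list[str]) -> dict[str, list[str]]:
    """Round-robin placement: copy k of shard s goes to node (s + k) % len(nodes)."""
    if num_replicas >= len(nodes):
        raise ValueError("each copy of a shard needs its own node")
    layout: dict[str, list[str]] = {node: [] for node in nodes}
    for shard in range(num_primaries):
        # Copy 0 is the primary (P); copies 1..num_replicas are replicas (R).
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            layout[node].append(f"P{shard}" if copy == 0 else f"R{shard}")
    return layout

print(allocate(3, 1, ["node-a", "node-b", "node-c"]))
# {'node-a': ['P0', 'R2'], 'node-b': ['R0', 'P1'], 'node-c': ['R1', 'P2']}
```

If node-a goes down, copies of shards 0 and 2 still exist on node-b and node-c, which is where the "info safety" of replicas comes from.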

Aggregations
An aggregation is extra information computed from a query's results. For example, an aggregation can return the average price of all deals found by a query on Deals filtered by region. It could also collect all the categories present in the results of a query on all Listings.
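Both examples from the slide can be reproduced locally on made-up data, next to the request bodies that would ask Elasticsearch for the same thing. The `avg` and `terms` aggregation syntax is standard; the field names (`region`, `price`, `category`) and the sample documents are assumptions about the Deals and Listings mappings, and `region`/`category` are assumed to be keyword fields.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample data standing in for Deals and Listings documents.
deals = [
    {"title": "2x1 Pizza", "region": "north", "price": 10.0},
    {"title": "Car Wash", "region": "north", "price": 25.0},
    {"title": "Spa Day", "region": "south", "price": 80.0},
]
listings = [
    {"name": "Luigi's", "category": "restaurant"},
    {"name": "Bubbles", "category": "car-wash"},
    {"name": "Mario's", "category": "restaurant"},
]

# Local equivalent of: filter deals by region, then average their price.
north = [d for d in deals if d["region"] == "north"]
print(mean(d["price"] for d in north))  # 17.5

# Local equivalent of a terms aggregation: each category with its document count.
print(Counter(item["category"] for item in listings).most_common())
# [('restaurant', 2), ('car-wash', 1)]

# Request bodies asking Elasticsearch for the same results.
deals_search_body = {
    "query": {"term": {"region": "north"}},
    "aggs": {"avg_price": {"avg": {"field": "price"}}},
    "size": 0,  # only the aggregation is needed, not the matching documents
}
listings_search_body = {
    "aggs": {"categories": {"terms": {"field": "category"}}},
    "size": 0,
}
```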

How to build an Index
There are many ways. You can create the index and then the mapping, create both together, or create both separately, one at a time, and then link them, among others. The recommended approach, though, is to create everything at once, by sending a PUT request to the Elasticsearch server.
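The "all at once" approach is a single `PUT /<index>` request whose JSON body carries both the settings and the mappings. The sketch below builds that request with Python's standard library but does not send it; the host, the `listings` index name and the tiny body are hypothetical placeholders, and `create_index_request` is an illustrative helper, not part of any Elasticsearch client.

```python
import json
import urllib.request

def create_index_request(host: str, index: str, body: dict) -> urllib.request.Request:
    """Build (without sending) a PUT request that creates an index with its settings and mappings."""
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = create_index_request(
    "http://localhost:9200",
    "listings",
    {"settings": {"number_of_shards": 1},
     "mappings": {"properties": {"title": {"type": "text"}}}},
)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/listings

# To actually send it to a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```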
