Wanna search? Piece of cake!

Fast, scalable and easy-to-set-up search engine for your data.

Usage rights: CC Attribution-NonCommercial-ShareAlike License

    Presentation Transcript

    • Wanna search? Piece of cake! Fast, scalable and easy-to-set-up search engine for your data. by Alexey Kursov http://www.linkedin.com/in/kursov
    • WTF? ElasticSearch is a distributed, RESTful, free/open-source search server based on Apache Lucene. It is developed by Shay Banon (@kimchy) and released under the terms of the Apache License. ElasticSearch is written in Java. http://elasticsearch.org/ http://elasticsearch.com/
    • Lucene? Apache Lucene is a free/open-source information retrieval software library, originally written in Java, supported by the Apache Software Foundation and released under the Apache Software License. While it suits any application that requires full-text indexing and search, Lucene is widely recognized for its usefulness in implementing Internet search engines and local, single-site search. http://lucene.apache.org/core/
    • Basic Concepts Indexing. ElasticSearch achieves fast search responses because, instead of searching the text directly, it searches an index. This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) into a keyword-centric data structure (word->pages). ElasticSearch uses Apache Lucene to create and manage this inverted index.
    • Inverted index In computer science, an inverted index is an index data structure that stores a mapping from content, such as words or numbers, to its locations in a database file, a document, or a set of documents. Its purpose is to allow fast full-text search, at the cost of extra processing when a document is added to the database. A simple example, given the texts:
      T[0] = "it is what it is"
      T[1] = "what is it"
      T[2] = "it is a banana"
      we get the following inverted index (the integers in braces are the indexes, or keys, of the texts T[0], T[1], etc.):
      "a": {2}
      "banana": {2}
      "is": {0, 1, 2}
      "it": {0, 1, 2}
      "what": {0, 1}
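The inverted-index example above can be reproduced in a few lines. This is a toy Python sketch, not how Lucene actually stores its index (Lucene also records term frequencies and positions and runs text through analyzers); the naive tokenizer and the function names build_inverted_index and search_all are illustrative assumptions.

```python
import re
from collections import defaultdict


def build_inverted_index(texts):
    """Map each word to the set of keys of the texts that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(texts):
        # Naive tokenizer: lowercase, then take runs of word characters.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return dict(index)


def search_all(index, words):
    """AND query: keys of the texts that contain every word in `words`."""
    postings = [index.get(word, set()) for word in words]
    if not postings:
        return set()
    return set.intersection(*postings)


T = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(T)
for word in sorted(index):
    ids = ", ".join(str(i) for i in sorted(index[word]))
    print(f'"{word}": {{{ids}}}')  # same notation as the slide, e.g. "a": {2}

print(search_all(index, ["what", "is"]))  # prints {0, 1}
```

Note that search_all never scans the texts themselves: answering the query only needs the posting sets for "what" and "is". The price is paid up front, since every new text must be tokenized and added to the index before it can be found.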
    • Basic Concepts Data representation. In ElasticSearch, a Document is the unit of search and indexing. An index consists of one or more Documents, and a Document consists of one or more Fields (in database terminology, a Document corresponds to a table row, and a Field to a table column). A schema (called a mapping in ElasticSearch) declares:
      - what fields there are
      - which field should be used as the unique/primary key
      - which fields are required
      - how to index and search each field
      - etc.
      An index may store documents of different "mapping types", and each mapping type can have its own mapping definition. A mapping type is a way of separating the documents in an index into logical groups.
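As a sketch of what such a schema looked like in the ElasticSearch versions of the time (0.90/1.x), the snippet below builds, without sending, the HTTP request that creates an index with a mapping for one mapping type. The local node address, the index name "blog", the type "post" and its fields are hypothetical; the "string" field type was later replaced by "text"/"keyword", and mapping types were phased out in later ElasticSearch releases.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port.
ES_URL = "http://localhost:9200"

# Hypothetical mapping: one mapping type "post" with four fields.
mapping = {
    "mappings": {
        "post": {                          # mapping type: a logical group of documents
            "properties": {                # the fields and how each one is indexed
                "title": {"type": "string"},                             # analyzed full text
                "author": {"type": "string", "index": "not_analyzed"},  # one exact value
                "published": {"type": "date"},
                "views": {"type": "integer"},
            }
        }
    }
}


def create_index_request(name, body):
    """Build (but do not send) the request that creates an index with a mapping."""
    return urllib.request.Request(
        url=f"{ES_URL}/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("blog", mapping)
print(req.get_method(), req.full_url)
print(json.dumps(mapping, indent=2))
```

Against a running node of that era, urllib.request.urlopen(req) would send it. The "author" field is marked not_analyzed so it is indexed as a single exact value instead of being split into words.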
    • Competitors? Solr (http://lucene.apache.org/solr/) and Sphinx (http://sphinxsearch.com/)
    • What's the same? (ElasticSearch VS Solr) Both are built on Lucene, and their query, facet and indexing functionality is very similar, though each side has its own differences and nuances. There is plenty of information about this online; see, for example, this series of articles: http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
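To make "very similar, but with nuances" concrete, here is the same facet request (document counts per value of a field) written for both servers: Solr takes it as URL parameters, ElasticSearch as a JSON body. This is a sketch assuming default local ports, the old /solr/select handler of a default Solr core, and a hypothetical index "blog" with a field "tags"; ElasticSearch facets belong to the API of that era and were later replaced by aggregations.

```python
import json
from urllib.parse import urlencode

# Assumptions: default local ports, a default Solr core, a hypothetical "blog" index.
SOLR_SELECT = "http://localhost:8983/solr/select"
ES_SEARCH = "http://localhost:9200/blog/_search"


def solr_facet_url(field):
    """Solr: count documents per value of `field`, expressed as URL parameters."""
    params = {"q": "*:*", "rows": 0, "facet": "true", "facet.field": field, "wt": "json"}
    return f"{SOLR_SELECT}?{urlencode(params)}"


def es_facet_body(field):
    """ElasticSearch (old facets API): the same request as a JSON body."""
    return {
        "query": {"match_all": {}},
        "size": 0,  # like rows=0: return only the counts, no hits
        "facets": {field: {"terms": {"field": field}}},
    }


print(solr_facet_url("tags"))
print(ES_SEARCH, json.dumps(es_facet_body("tags")))
```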
    • What's the difference? (ElasticSearch VS Solr) ElasticSearch main advantages (IMHO):
      1. Low barrier to entry. ElasticSearch is a more "intuitive, accessible" system (significantly less configuration, thanks to dynamic schema building over HTTP and sensible defaults)
      2. The JSON-based API is cleaner and easier to use
      3. Replication and sharding are much simpler to configure
      4. Complex (nested) documents
      5. Multiple document types per index
      6. Joins (parent/child relationships)
      7. Online schema changes
      8. Self-contained cluster
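A sketch of points 1 and 2: with a dynamic schema, a document can be indexed over HTTP before any mapping exists (ElasticSearch infers the field types from the first document), and searches are plain JSON bodies. The requests are only built here, not sent; the node address, index "blog", type "post", document ID and field values are hypothetical, and the /index/type/id URL form comes from the ElasticSearch versions of this deck's era.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node


def json_request(method, path, body):
    """Build (not send) a JSON request against the ElasticSearch REST API."""
    return urllib.request.Request(
        url=f"{ES_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# 1. Index a document with no schema defined up front (dynamic mapping).
index_req = json_request("PUT", "/blog/post/1", {
    "title": "Wanna search? Piece of cake!",
    "tags": ["search", "lucene"],
    "views": 42,
})

# 2. Full-text search expressed in the JSON query DSL.
search_req = json_request("POST", "/blog/_search", {
    "query": {"match": {"title": "search"}},
    "from": 0,
    "size": 10,
})

for req in (index_req, search_req):
    print(req.get_method(), req.full_url)
```

With a node running, each request could be sent with urllib.request.urlopen(req); the point of the sketch is that no separate schema file or restart is involved.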
    • What's the difference? (ElasticSearch VS Solr) Solr main advantages (IMHO):
      1. Solr has a bigger, more mature user, developer and contributor community
      2. Solr is more mature and possibly more stable
      3. Solr has more response formats (XML, CSV, JSON)
      4. Better third-party product integration
      5. Pivot facets
      6. More customizable
    • Who wins? (ElasticSearch VS Solr) We all do!
    • ES clients and "river" plugins. There are clients for the following languages and platforms (from the official site): Java, .NET, Perl, Python, Ruby, PHP, JavaScript, Scala, Clojure, Go, Erlang, EventMachine, OCaml, Smalltalk. There are "river" (data import) plugins for: JDBC, CouchDB, Wikipedia, Twitter, RabbitMQ, RSS, MongoDB, Open Archives Initiative (OAI), St9, Sofa, Amazon SQS, LDAP, Dropbox, ActiveMQ, Solr, CSV, JMS.
    • Who uses it?
    • How to connect from my code? NEST (from the guys at stackoverflow.com; I think it is the best .NET client for ElasticSearch). NEST aims to be a .NET client with a very concise API (http://github.com/Mpdreamz/NEST). Its main goal is to provide a solid, strongly typed Elasticsearch client. It also has string/dynamic overloads for more dynamic use cases. Why NEST?
      ● Fluent. Looks like: ElasticClient.Search<Foo>(s => s.From(0).Size(10).SortAscending(f => f.Name).Query(...
      ● JSON serializer/deserializer: Newtonsoft Json.NET, with all its advantages
      ● Strongly typed
      ● Useful attributes for configuration
      ● Actively improved and developed
      ● Open source
      ● Clear and beautiful source code
      ● Available on NuGet
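The fluent style on this slide boils down to a builder whose methods each fill in one part of the JSON search body and return the builder itself, so calls can be chained. The Python class below is a hypothetical illustration of that design, not NEST's API (NEST is a C# library); the class and method names are assumptions chosen to mirror the From/Size/SortAscending/Query chain.

```python
import json


class SearchDescriptor:
    """Hypothetical fluent builder (NOT NEST's API): each method records one part
    of the search request body and returns self so that calls can be chained."""

    def __init__(self):
        self._body = {}

    def from_(self, offset):  # "from" is a Python keyword, hence the underscore
        self._body["from"] = offset
        return self

    def size(self, n):
        self._body["size"] = n
        return self

    def sort_ascending(self, field):
        self._body.setdefault("sort", []).append({field: "asc"})
        return self

    def query_match(self, field, text):
        self._body["query"] = {"match": {field: text}}
        return self

    def to_body(self):
        return dict(self._body)


body = (SearchDescriptor()
        .from_(0)
        .size(10)
        .sort_ascending("name")
        .query_match("name", "cake")
        .to_body())
print(json.dumps(body))
```

The chain produces the JSON body a client would POST to _search. A strongly typed client such as NEST can additionally check field references like f => f.Name at compile time, which a string-based builder like this one cannot.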
    • Practice