Satish Mohan
Your Data,
Your Search
Tuesday, 12 March 13
elasticsearch

Search-oriented architecture using elasticsearch

Transcript of "elasticsearch"

  1. Satish Mohan. Your Data, Your Search. Tuesday, 12 March 13
  2. Enterprises today are collecting, and have access to, more data points in their ecosystem than ever.
  3. File Store Example
     Regular File Store:
     • File / Folder Navigation
     • Integration - Mount Points
     • Limited Metadata
     • Hierarchical Structure
  4. File Store Example (Regular File Store, as on slide 3), with example queries:
     • Find a document from December 2011 about transfer containing proposal and David
     • Find the document received from John containing David and transfer
     • Find the revisions of the transfer document
  5. File Store Example (same queries as slide 4), comparing two stores
     Regular File Store:
     • File / Folder Navigation
     • Integration - Mount Points
     • Limited Metadata
     • Hierarchical Structure
     Intelligent File Store:
     • Collections / Documents
     • Local / Distributed Integrations
     • Semantic Metadata
     • Declarative Queries
     • Automatic Indexing
     • Provenance
     • Automatic Organization
     • Virtual Collections
  6. ElasticSearch is an open source, scalable, distributed, cloud-ready, highly available full-text search engine and database with powerful aggregation features, communicating via JSON over RESTful HTTP and based on Apache Lucene.
  7. Playing with ElasticSearch [architecture diagram; labels: Capture & Curate, Index, Streams, Analyse, Search; Structured Data, Unstructured Data, Data Refinery, Message Queues; Inverted index, Transaction Log, Versioning, Source Document; Memory, Shared FS, FS + Memory, Local FS; Document Store]
  8. Playing with ElasticSearch: Rivers
     • Data flows from sources using Rivers
     • Continues to add data as it flows
     • Can be added, removed, configured dynamically
  9. Playing with ElasticSearch: Rivers (same bullets as slide 8)
     [Diagram labels: ES Node, Data Source (x3), River (x3), ES Index]
  11. Playing with ElasticSearch: River Modules
      • CouchDB • JDBC
      • MongoDB • Solr
      • Wikipedia • Jira
      • Twitter • CSV
      • ActiveMQ • FileSystem
      • RabbitMQ • SysInfo
      • NSQ • Logs
      • RSS • LDAP
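In releases of this era, a river was registered by storing a `_meta` document under the special `_river` index. The sketch below only builds the request path and body for a CouchDB river and sends nothing. The river name `my_couch_river`, the database name `my_db`, and the helper `river_registration` are illustrative assumptions, not taken from the slides. Rivers were later deprecated and removed from Elasticsearch, so this does not apply to current versions.

```python
import json


def river_registration(name, river_type, settings):
    """Return (method, path, body) for registering a river. Nothing is sent."""
    path = "/_river/{}/_meta".format(name)
    # The river type doubles as the key that holds its source settings.
    body = {"type": river_type, river_type: settings}
    return "PUT", path, json.dumps(body, sort_keys=True)


# Hypothetical CouchDB source; host, port and db are placeholders.
method, path, body = river_registration(
    "my_couch_river", "couchdb", {"host": "localhost", "port": 5984, "db": "my_db"}
)
print(method, path)
print(body)
```

Removing a river worked the same way in reverse (deleting the river's entry), which is what made rivers configurable at runtime.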
  12. Playing with ElasticSearch: Index
      • Describes document structure to the search engine
      • Automatically created with sensible defaults
      • Explicit mapping can be provided (generally, a good idea)
      • Simple types: string, integer/long, float/double, boolean, and null
      • Complex types: array, object
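As a rough illustration of an explicit mapping, the sketch below describes an `article` type as a Python dict and flattens it to show which type each field gets. The field names are borrowed from the article document used later in the deck. The mapping itself is an assumption, written in the typed, pre-5.x style (`string` fields) that matches the era of this talk; later versions replaced `string` with `text`/`keyword` and removed types. Arrays need no special mapping: `tags` is an array of strings and is mapped as `string`.

```python
import json

# Hypothetical explicit mapping for an "article" type (2013-era syntax).
ARTICLE_MAPPING = {
    "article": {
        "properties": {
            "title": {"type": "string"},
            "body": {"type": "string"},
            "published_on": {"type": "date", "format": "yyyy/MM/dd HH:mm:ss"},
            "tags": {"type": "string", "index": "not_analyzed"},
            "author": {
                "type": "object",
                "properties": {
                    "first_name": {"type": "string"},
                    "last_name": {"type": "string"},
                    "email": {"type": "string", "index": "not_analyzed"},
                },
            },
        }
    }
}


def field_types(properties, prefix=""):
    """Flatten nested properties into {dotted.path: type}."""
    flat = {}
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            flat.update(field_types(spec["properties"], path + "."))
        else:
            flat[path] = spec["type"]
    return flat


flat = field_types(ARTICLE_MAPPING["article"]["properties"])
print(json.dumps(flat, indent=2, sort_keys=True))
```

The dotted paths produced here (for example `author.first_name`) are the same names used to address nested fields in queries.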
  13. Playing with ElasticSearch [architecture diagram from slide 7, adding: Distributed; Shards, Replication, Load Balancing, Nodes]
  14. Playing with ElasticSearch: Distributed Model
      • Number of shards is the scaling unit [ #shards > #nodes ]
      • Each shard is a separate Lucene index; thus, many per-index settings are available
      • Moving shards around is faster than splitting them (no reindex)
      • Replicas also serve reads, allowing search to scale
      • Number of replicas can be updated dynamically after index creation
      [Diagram: Node 1: user (0), user (1); Node 2: user1 (0), user (1); Node 3: user (0), user2 (0). Labels: Automatic Discovery Protocol, Replica, Shard]
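A minimal sketch of why the shard count is the scaling unit: a document is placed on a shard by hashing its routing value (the document id by default) modulo the number of primary shards. Here `zlib.crc32` is only a stand-in for Elasticsearch's internal hash function, and the document ids and shard counts are made up for illustration.

```python
import zlib


def shard_for(routing_key, num_primary_shards):
    """Pick a shard as hash(routing) % number_of_primary_shards."""
    # crc32 is an assumption standing in for the real hash function.
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % num_primary_shards


doc_ids = [str(i) for i in range(1, 21)]
with_five = {d: shard_for(d, 5) for d in doc_ids}
with_six = {d: shard_for(d, 6) for d in doc_ids}

# Documents whose shard changes if the primary shard count changes.
moved = [d for d in doc_ids if with_five[d] != with_six[d]]
print("placement with 5 shards:", with_five)
print("documents that would move with 6 shards:", len(moved))
```

Because placement depends on the shard count, changing it means re-placing documents, whereas relocating a whole shard to another node leaves the placement untouched. That is the reason for creating more shards than nodes up front.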
  15. Playing with ElasticSearch: Index Aliases
      curl -X POST 'http://localhost:9200/_aliases' -d '{
        "actions" : [
          { "add" : {
              "index" : "users",
              "alias" : "user_1",
              "filter" : { "term" : { "user" : "1" } },
              "routing" : "1"
          } }
        ]
      }'
      Indexing and search happen on the alias, with automatic use of routing and filtering.
  16. Playing with ElasticSearch: Index Aliases
      curl -X POST 'http://localhost:9200/_aliases' -d '{
        "actions" : [
          { "add" : { "index" : "user_1", "alias" : "users" } },
          { "add" : { "index" : "user_2", "alias" : "users" } }
        ]
      }'
      [Diagram labels: users, user_1, user_2]
      curl -X GET "http://localhost:9200/users/_search?q=..."
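The two alias requests on slides 15 and 16 share one body shape, which the sketch below builds programmatically. The helper names `add_alias` and `aliases_body` are hypothetical; the code only produces the JSON that would be POSTed to `/_aliases` and does not contact a cluster.

```python
import json


def add_alias(index, alias, filter_=None, routing=None):
    """Build one "add" action for the _aliases endpoint."""
    action = {"index": index, "alias": alias}
    if filter_ is not None:
        action["filter"] = filter_
    if routing is not None:
        action["routing"] = routing
    return {"add": action}


def aliases_body(actions):
    """Wrap actions in the envelope expected by POST /_aliases."""
    return json.dumps({"actions": list(actions)})


# Slide 15: a filtered, routed alias over a single shared index.
per_user = aliases_body(
    [add_alias("users", "user_1", filter_={"term": {"user": "1"}}, routing="1")]
)

# Slide 16: one alias spanning two per-user indices.
combined = aliases_body(
    [add_alias("user_1", "users"), add_alias("user_2", "users")]
)

print(per_user)
print(combined)
```

The two uses are mirror images: the first makes one large index look like a private per-user index, the second makes several indices searchable under one name.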
  17. Playing with ElasticSearch [architecture diagram from slide 13, adding: Transport; HTTP, WebSockets, Thrift, ZeroMQ, memcached, TCP]
  18. Playing with ElasticSearch [architecture diagram from slide 17, adding: Modules, Extend; Data Sources, Tokenisers, Retrieval Models, Structured Results, Language Bindings, Transport]
  19. Playing with ElasticSearch [architecture diagram from slide 18, adding: Discovery; Zen, EC2]
  20. Playing with ElasticSearch [architecture diagram from slide 19, adding: Script; mvel, Python, Groovy, Javascript]
  21. Playing with ElasticSearch [architecture diagram from slide 20, adding: Monitor]
  22. Playing with ElasticSearch [architecture diagram from slide 21, adding: RESTful]
  23. Playing with ElasticSearch
      REST API: http://host:port/[index]/[type]/[_action/id]
      HTTP Methods: GET, POST, PUT, DELETE
  24. Playing with ElasticSearch
      REST API: http://host:port/[index]/[type]/[_action/id]
      HTTP Methods: GET, POST, PUT, DELETE
      Some definitions:
      • index -> like a database
      • type -> like a table
      • id -> like a row in a table
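To make the URL pattern concrete, here is a small sketch that assembles `http://host:port/[index]/[type]/[_action/id]` from its optional parts. The function `es_url` and its parameter names are hypothetical; the final segment is either a document id or an underscore-prefixed action such as `_search`.

```python
from urllib.parse import quote


def es_url(host, port, index=None, doc_type=None, tail=None):
    """Join the optional [index]/[type]/[_action or id] segments."""
    segments = [s for s in (index, doc_type, tail) if s is not None]
    # Percent-encode each segment so ids with special characters stay intact.
    path = "/".join(quote(str(s), safe="") for s in segments)
    return "http://{}:{}/{}".format(host, port, path)


# A single document: index "articles", type "article", id 1.
doc_url = es_url("localhost", 9200, "articles", "article", 1)

# An action on a whole index, with no type segment.
search_url = es_url("localhost", 9200, "articles", tail="_search")

print(doc_url)
print(search_url)
```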
  25. Playing with ElasticSearch
      Request:
      curl -X POST "http://localhost:9200/articles/article/1" -d '
      {
        "title" : "ElasticSearch Understands JSON!",
        "body" : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
        "published_on" : "2013/02/06 10:00:00",
        "tags" : ["search", "json"],
        "author" : {
          "first_name" : "Bruce",
          "last_name" : "Croft",
          "email" : "bruce@croft.org"
        }
      }'
  26. Playing with ElasticSearch
      Request: as on slide 25.
      Response:
      { "ok":true, "_index":"articles", "_type":"article", "_id":"1", "_version":1 }
  27. Playing with ElasticSearch
      Request:
      curl -X GET "http://localhost:9200/articles/_search?q=author.first_name:BRUCE"
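The query above searches for `BRUCE` although the indexed value is `Bruce`. It still matches because text is analysed (tokenised and lowercased) both at index time and at query time, assuming the field uses default analysis. The toy inverted index below illustrates the idea; it is a simplified stand-in, not Lucene's implementation, and the second document is invented for contrast.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyser: lowercase, then split into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every analysed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Values of author.first_name per document id (id "2" is made up).
first_names = {"1": "Bruce", "2": "David"}
index = build_index(first_names)
print(search(index, "BRUCE"))
```

A field mapped as not analysed would behave differently: the exact string `Bruce` would be the only matching term.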
  28. Playing with ElasticSearch
      Request: as on slide 27.
      Response:
      {
        "took":1,
        "timed_out":false,
        "_shards":{"total":5,"successful":5,"failed":0},
        "hits":{
          "total":1,
          "max_score":0.30685282,
          "hits":[{
            "_index":"articles",
            "_type":"article",
            "_id":"1",
            "_score":0.30685282,
            "_source" : {
              "title" : "ElasticSearch Understands JSON!",
              "body" : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
              "published_on" : "2013/02/06 10:00:00",
              "tags" : ["search", "json"],
              "author" : {
                "first_name" : "Bruce",
                "last_name" : "Croft",
                "email" : "bruce@croft.org"
              }
            }
          }]
        }
      }
  29. Playing with ElasticSearch
      Same request and response as slide 28, annotated:
      • Total number of documents ("hits.total")
      • Location & ID ("_index", "_type", "_id")
      • Document Source ("_source")
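The three annotated parts of the response can be pulled out with a few lines of code. The sketch below parses a trimmed copy of the response shown above (the `_source` is shortened to two fields) and uses a hypothetical helper `summarize`. It assumes the response shape of this era, where `hits.total` is a plain number; newer Elasticsearch versions changed that field to an object.

```python
import json

# Trimmed copy of the search response from the slide.
RESPONSE = """
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [{
      "_index": "articles",
      "_type": "article",
      "_id": "1",
      "_score": 0.30685282,
      "_source": {
        "title": "ElasticSearch Understands JSON!",
        "author": {"first_name": "Bruce", "last_name": "Croft"}
      }
    }]
  }
}
"""


def summarize(result):
    """Return (total, locations, sources) from a search response."""
    hits = result["hits"]
    # Location & ID: where each matching document lives.
    locations = [(h["_index"], h["_type"], h["_id"]) for h in hits["hits"]]
    # Document Source: the JSON that was originally indexed.
    sources = [h["_source"] for h in hits["hits"]]
    return hits["total"], locations, sources


total, locations, sources = summarize(json.loads(RESPONSE))
print(total, locations, sources[0]["title"])
```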
  30. Playing with ElasticSearch [architecture diagram from slide 22, adding: Micro Apps]
  31. Playing with ElasticSearch [architecture diagram from slide 30, adding: HTML5/CSS3, Javascript]
  32. Micro Applications
      Rich, interactive single-page web applications powered by JavaScript, HTML and CSS.
  33. Micro Applications
  37. Micro Applications
      Rich, interactive single-page web applications powered by JavaScript, HTML and CSS.
      • A self-described framework for ambitious applications
      • Rails-inspired “convention over configuration” approach
      • High-level abstractions, two-way binding and auto-updating templates
      [Diagram labels: Data, Router, Model (x3), Controller (x2), View (x4)]
  38. Micro Applications (same text and diagram as slide 37), adding:
      • Ember Data
      • Client-side storage adapter
      • Provides a common interface to persist application data
      • RESTful HTTP service - primary endpoint
      • Browser’s localStorage
      • Emerging web databases such as IndexedDB
  39. Playing with ElasticSearch: More Features
      • document oriented • load balancing
      • versioning • plugins
      • parent/child docs • more_like_this
      • scripting • multi_field mapping
      • dynamic mapping templates • percolation
      • bulk indexing • facets
      • geo location • index aliases
      • auto-complete • ngrams & edge-ngrams
      • histograms • rivers
  40. [Architecture diagram, complete: same labels as slide 31]
      An alternative that would allow scientists or even casual users to perform analysis of distributed data regardless of where the data resides.
  41. Search is the primary interface for getting information today. Let’s build on it.
      [Diagram labels: Search, Discover, Analyse]
  43. Data Management Tools - Challenges
      • Interactive queries, data exploration or iterative query refinement pose significant challenges for current methods
      • Building and running jobs and queries requires deep understanding of cluster size and structure, job performance, etc.
      • Time-consuming to set up, deploy and use