Your Data, Your Search!
问志光
2016-06-27
Outline
• Information retrieval
• Indexing & searching
• Elasticsearch
Information retrieval
• Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
• A search engine is a software system designed to search for information. It is one implementation of IR.
What is a search engine?
• A search engine is
– an indexing engine for documents
– a search engine over those indexes
• A search engine is powerful at searching because it is designed for it!
Search Engine Architecture
Problems
• How to store the data?
• How to index the data?
• How to search the data?
How to store the data?
Inverted list
How to INDEX the data?
Consider the following two files
• File 1: Students should be allowed to go out with their friends, but not allowed to drink beer.
• File 2: My friend Jerry went to school to see his students but found them drunk which is not allowed.
Step 1: Tokenizer
• Split the document into words
• Remove punctuation
• Remove stop words (the, a, this, that, etc.)
Tokens:
"Students", "allowed", "go", "their", "friends", "allowed", "drink", "beer", "My", "friend", "Jerry", "went", "school", "see", "his", "students", "found", "them", "drunk", "allowed"
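The tokenizer step can be sketched in a few lines of Python. This is only an illustration: the stop-word set below is an assumption, chosen so that the two example files produce the token list shown on the slide, and the names `STOP_WORDS`, `FILE1`, `FILE2` and `tokenize` are made up for this sketch.

```python
import re

# Assumed stop-word list: just enough to reproduce the slide's token list.
STOP_WORDS = {"the", "a", "this", "that", "should", "be", "to", "out",
              "with", "but", "not", "which", "is"}

FILE1 = ("Students should be allowed to go out with their friends, "
         "but not allowed to drink beer.")
FILE2 = ("My friend Jerry went to school to see his students "
         "but found them drunk which is not allowed.")


def tokenize(text):
    """Split text into words, dropping punctuation and stop words."""
    # Matching runs of letters discards punctuation and whitespace.
    words = re.findall(r"[A-Za-z]+", text)
    # Stop words are compared case-insensitively; original case is kept.
    return [w for w in words if w.lower() not in STOP_WORDS]


tokens1 = tokenize(FILE1)
tokens2 = tokenize(FILE2)
print(tokens1)
print(tokens2)
```

Case is deliberately preserved here, since lowercasing belongs to the next step.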
Step 2: Linguistic processor
• Lowercase
• Stemming: cars -> car, etc.
• Lemmatization: drove -> drive, etc.
Terms:
"student", "allow", "go", "their", "friend", "allow", "drink", "beer", "my", "friend", "jerry", "go", "school", "see", "his", "student", "find", "them", "drink", "allow"
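A toy version of the linguistic processor, assuming a hand-written lemma table and two suffix rules that cover only the words in the example files. Real analyzers use proper stemmers and dictionaries; `LEMMAS`, `SUFFIXES` and `normalize` are hypothetical names for this sketch.

```python
# Hypothetical lemma table: irregular forms found in the two example files.
LEMMAS = {"went": "go", "found": "find", "drunk": "drink"}
# Toy stemming rules, not a real stemmer.
SUFFIXES = ("ed", "s")


def normalize(token):
    """Lowercase a token, then lemmatize or stem it into a term."""
    word = token.lower()
    if word in LEMMAS:
        return LEMMAS[word]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words like "his" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = ["Students", "allowed", "go", "their", "friends", "allowed",
          "drink", "beer", "My", "friend", "Jerry", "went", "school",
          "see", "his", "students", "found", "them", "drunk", "allowed"]
terms = [normalize(t) for t in tokens]
print(terms)
```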
Step 3: Index
Term      Document ID
student   1
allow     1
go        1
their     1
friend    1
allow     1
…         …
• Build a dictionary of terms
• Sort the terms
• Merge the document IDs of each term into a posting list
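The dictionary, sort and posting-list steps can be sketched as follows, assuming the terms of the two example files are already available as Python lists. `build_inverted_index` is a made-up helper, and a real index would also store term frequencies and positions.

```python
from collections import defaultdict

# Terms per document ID, as produced by the linguistic processor step.
docs = {
    1: ["student", "allow", "go", "their", "friend", "allow", "drink", "beer"],
    2: ["my", "friend", "jerry", "go", "school", "see", "his",
        "student", "find", "them", "drink", "allow"],
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)  # a set removes duplicate entries
    # Sort the dictionary by term and every posting list by document ID.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


index = build_inverted_index(docs)
print(index["student"])  # [1, 2]
print(index["beer"])     # [1]
```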
How to SEARCH the data?
Step 1: User search query
• Suppose you have the following query:
lucene AND learned NOT hadoop
Step 2: Lexical & syntax analysis
• Identify words and keywords
– Words: lucene, learned, hadoop
– Keywords: AND, NOT
• Build a syntax tree
NOT(AND(lucene, learned), hadoop)
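A minimal sketch of this analysis, assuming a flat query of the form `word (KEYWORD word)*` evaluated left to right, with NOT treated as a binary "and not" operator. Real query parsers handle precedence, parentheses and phrases; `lex` and `parse` are hypothetical helpers.

```python
KEYWORDS = {"AND", "NOT"}


def lex(query):
    """Lexical analysis: separate plain words from keywords."""
    tokens = query.split()
    words = [t for t in tokens if t not in KEYWORDS]
    keywords = [t for t in tokens if t in KEYWORDS]
    return words, keywords


def parse(query):
    """Syntax analysis: fold the tokens into a nested (op, left, right) tree."""
    tokens = query.split()
    if len(tokens) % 2 == 0:
        raise ValueError("expected: word (KEYWORD word)*")
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        op, operand = tokens[i], tokens[i + 1]
        if op not in KEYWORDS:
            raise ValueError(f"expected AND or NOT, got {op!r}")
        tree = (op, tree, operand)
    return tree


words, keywords = lex("lucene AND learned NOT hadoop")
tree = parse("lucene AND learned NOT hadoop")
print(tree)  # ('NOT', ('AND', 'lucene', 'learned'), 'hadoop')
```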
Step 3: Search
• Look up each term in the inverted list
• Combine the posting lists: sort, conjunction, disjunction
• Score the results (scorer)
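One way to picture the search step is to evaluate the syntax tree against posting lists with set operations. The index contents below are invented for illustration, scoring is left out, and `evaluate` is a hypothetical helper.

```python
# Hypothetical inverted index: term -> sorted posting list of document IDs.
index = {
    "lucene": [1, 2, 5, 7],
    "learned": [2, 3, 5],
    "hadoop": [5, 9],
}


def evaluate(tree, index):
    """Return the set of document IDs matching a (op, left, right) tree."""
    if isinstance(tree, str):
        # Leaf node: look the term up in the inverted list.
        return set(index.get(tree, []))
    op, left, right = tree
    left_ids = evaluate(left, index)
    right_ids = evaluate(right, index)
    if op == "AND":
        return left_ids & right_ids   # conjunction: intersection
    if op == "NOT":
        return left_ids - right_ids   # exclusion: set difference
    raise ValueError(f"unknown operator {op!r}")


tree = ("NOT", ("AND", "lucene", "learned"), "hadoop")
hits = sorted(evaluate(tree, index))
print(hits)  # [2]
```

Documents 2 and 5 contain both "lucene" and "learned"; document 5 is dropped because it also contains "hadoop".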
Elasticsearch
• Full-text search
• Real-time search and analytics engine
• Open source
• High availability
• Schema free
• JSON over HTTP
• Lucene based
• Distributed
• RESTful API
Elasticsearch
• Distributed and highly available search engine.
– Each index is fully sharded with a configurable number of shards.
– Each shard can have one or more replicas.
– Read/search operations are performed on any one of the replica shards.
• Multi-tenant with multi-types.
– Support for more than one index.
– Support for more than one type per index.
– Index-level configuration (number of shards, index storage, ...).
• Document oriented
– No need for upfront schema definition.
– Schema can be defined per type to customize the indexing process.
• Various set of APIs
– HTTP RESTful API.
– Native Java API.
– All APIs perform automatic node operation rerouting.
• (Near) real-time search.
• Reliable, asynchronous write-behind for long-term persistency.
• Built on top of Lucene
– Each shard is a fully functional Lucene index.
– All the power of Lucene easily exposed through simple configuration / plugins.
• Per-operation consistency
– Single document level operations are atomic, consistent, isolated and durable.
• Open source under the Apache License, version 2 ("ALv2")
Elasticsearch terminology
• Cluster
• Node
• Index
• Shard
Cluster
● A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes.
● A cluster is identified by a unique name, which by default is "elasticsearch".
Node
● A node is an Elasticsearch instance (a Java process).
● A node is created when an Elasticsearch instance is started.
● By default, each node is given a random Marvel character name.
Index
● An index is a collection of documents that have somewhat similar characteristics, e.g. customer data or a product catalog.
● The index name is used when performing indexing, search, update, and delete operations against the documents in it.
● You can define as many indexes as you want in a single cluster.
Document
● A document is the most basic unit of information that can be indexed.
● It is expressed in JSON as key: value pairs, e.g. '{"user": "nullcon"}'
● Every document is associated with a type and a unique ID.
Shard
● Every index can be split into multiple shards so that its data can be distributed.
● A shard is the atomic part of an index; shards can be spread over the cluster as you add more nodes.
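To make the idea of distributing data over shards concrete, here is a rough sketch of hash-based routing: a document ID is hashed and the result is taken modulo the number of primary shards. `zlib.crc32` is only a stand-in hash chosen for this sketch (Elasticsearch uses its own hash function), and `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(doc_id, number_of_shards):
    """Choose a shard for a document ID by hashing it.

    zlib.crc32 is a stand-in hash for illustration only.
    """
    digest = zlib.crc32(doc_id.encode("utf-8"))
    return digest % number_of_shards


# Count how many of 100 made-up document IDs land on each of 5 shards.
counts = {}
for n in range(100):
    shard = pick_shard(str(n), 5)
    counts[shard] = counts.get(shard, 0) + 1
print(counts)
```

Because the shard is derived from the ID alone, the same document is always looked up on the same shard, as long as the shard count does not change.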
A terminology comparison
Relational database    Elasticsearch
Database               Index
Table                  Type
Row                    Document
Column                 Field
Schema                 Mapping
Index                  Everything is indexed
SQL                    Query DSL
SELECT * FROM tb …     GET http://
UPDATE tb SET …        PUT http://
Playing with Elasticsearch
REST API: http://host:port/[index]/[type]/[_action/id]
HTTP methods: GET, POST, PUT, DELETE
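The URL pattern above can be illustrated with a small helper that joins the optional parts. `build_url` is a made-up function for this sketch, not part of any Elasticsearch client.

```python
def build_url(host, port, index=None, doc_type=None, action_or_id=None):
    """Assemble http://host:port/[index]/[type]/[_action or id]."""
    # Skip the parts that were not given; IDs may be passed as int or str.
    parts = [str(p) for p in (index, doc_type, action_or_id) if p is not None]
    return f"http://{host}:{port}/" + "/".join(parts)


print(build_url("localhost", 9200, "my_index", "test", "_search"))
print(build_url("localhost", 9200, "my_index", action_or_id="_search"))
print(build_url("localhost", 9200, "my_index", "test", 1))
```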
Playing with Elasticsearch
• Search
– curl -XGET http://localhost:9200/my_index/test/_search
– curl -XGET http://localhost:9200/my_index/_search
– curl -XGET http://localhost:9200/_search
• Metadata
– curl -XGET http://localhost:9200/my_index/_status
• Documents
– curl -XPUT http://localhost:9200/my_index/test/1
– curl -XGET http://localhost:9200/my_index/test/1
– curl -XDELETE http://localhost:9200/my_index/test/1
Example: Index
curl -XPUT http://localhost:9200/my_index/test/1 -d '{
  "name": "joeywen",
  "value": 100
}'
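The same index request could be prepared from Python with the standard library. The sketch below only builds the request object; the actual send is left as a comment so the code runs without a server. `index_request` is a hypothetical helper, and the explicit `Content-Type` header is an addition that the curl example does not include.

```python
import json
import urllib.request


def index_request(url, document):
    """Build (but do not send) a PUT request that indexes one document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # Assumed header: not part of the curl example.
        headers={"Content-Type": "application/json"},
    )


req = index_request("http://localhost:9200/my_index/test/1",
                    {"name": "joeywen", "value": 100})
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```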
Example: Search
curl -XGET http://localhost:9200/my_index/_search -d '{
  "query": {
    "match_all": {}
  }
}'
The response shows the search time, the total number of docs, the max score, and the relevance of each hit.
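To show where those four values sit in a search response, here is a sketch that parses a sample reply. The JSON is hypothetical, written to resemble the response shape of Elasticsearch versions from around the time of this deck (newer versions report `total` differently), and `summarize` is a made-up helper.

```python
import json

# Hypothetical response, not captured from a real cluster.
SAMPLE_RESPONSE = """
{
  "took": 3,
  "timed_out": false,
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {"_index": "my_index", "_type": "test", "_id": "1", "_score": 1.0,
       "_source": {"name": "joeywen", "value": 100}}
    ]
  }
}
"""


def summarize(response_text):
    """Pull search time, total docs, max score and per-hit relevance."""
    response = json.loads(response_text)
    hits = response["hits"]
    return {
        "search_time_ms": response["took"],
        "total_docs": hits["total"],
        "max_score": hits["max_score"],
        "relevance": [(h["_id"], h["_score"]) for h in hits["hits"]],
    }


summary = summarize(SAMPLE_RESPONSE)
print(summary)
```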
Creating, indexing, or deleting a single document
Plugins-Kopf
Plugins-head
Web
