Understanding Elasticsearch Relevance
Moon Yong Joon
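Terminology 1
Relevance and Analysis must be clearly distinguished.
Relevance: the ability to rank results by how relevant they are to a given query. Relevance is calculated with TF/IDF.
Analysis: the process of converting a block of text into distinct, normalized tokens.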
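As a rough illustration of analysis, the sketch below lowercases a text block and splits it into alphanumeric tokens. The function name `analyze` is hypothetical, and this only approximates what a real Elasticsearch analyzer does (no stemming, stop words, or Unicode-aware segmentation).

```python
import re


def analyze(text):
    """Toy analyzer: lowercase the text, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


# A query string and an indexed title end up in the same normalized form,
# which is what lets them match each other.
query_tokens = analyze("QUICK!")
title_tokens = analyze("The quick brown fox")
print(query_tokens)
print(title_tokens)
```

Because both sides are normalized the same way, the query token can be looked up directly among the title tokens.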
Terminology 2
Queries must also be distinguished by type.
Term-based query: low-level queries such as term or fuzzy queries. They operate on a single term and have no analysis phase.
Full-text query: high-level queries such as match or query_string queries.
Execution steps: match query
A match query is executed in four steps:
1. Check the field type.
2. Analyze the query string.
3. Find matching docs.
4. Score each doc.
GET /my_index/my_type/_search
{
"query": {
"match": {
"title": "QUICK!"
}
}
}
"hits": [
{
"_id": "1",
"_score": 0.5,
"_source": {
"title": "The quick brown fox"
}
},
{
"_id": "3",
"_score": 0.44194174,
"_source": {
"title": "The quick brown fox jumps over the quick dog"
}
},
{
"_id": "2",
"_score": 0.3125,
"_source": {
"title": "The quick brown fox jumps over the lazy dog"
}
}
]
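The ordering of these three hits can be reproduced by hand. For a single-term query the score reduces to tf * idf * fieldNorm, with tf = sqrt(termFreq). The sketch below is only a consistency check under stated assumptions: idf = 1.0 and the fieldNorm values 0.5 and 0.3125 are not shown in the response above; they are inferred from the reported scores.

```python
import math

# Assumed inputs: how often "quick" occurs in each title, plus an assumed
# stored fieldNorm per document (inferred, not part of the response).
docs = {
    "1": {"term_freq": 1, "field_norm": 0.5},     # "The quick brown fox"
    "3": {"term_freq": 2, "field_norm": 0.3125},  # "... over the quick dog"
    "2": {"term_freq": 1, "field_norm": 0.3125},  # "... over the lazy dog"
}
IDF = 1.0  # assumption, consistent with the score of hit 1


def score(doc):
    """Single-term score: sqrt(termFreq) * idf * fieldNorm."""
    return math.sqrt(doc["term_freq"]) * IDF * doc["field_norm"]


scores = {doc_id: score(doc) for doc_id, doc in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Document 3 outranks document 2 only because "quick" occurs twice in it; both titles are the same length.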
SCORE
How to Read Explain
Running a query with explain
To see how the score of a query is calculated, run the search with the explain parameter.
GET /_search?explain
{
"query" : { "match" : { "tweet" : "honeymoon" }}
}
explain must be specified.
Viewing the query result
How the score is calculated for a single query
"_explanation": {
"description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
"value": 0.076713204,
"details": [
{
"description": "fieldWeight in 0, product of:",
"value": 0.076713204,
"details": [
{
"description": "tf(freq=1.0), with freq of:",
"value": 1,
"details": [
{
"description": "termFreq=1.0",
"value": 1
}
]
},
{
"description": "idf(docFreq=1, maxDocs=1)",
"value": 0.30685282
},
{
"description": "fieldNorm(doc=0)",
"value": 0.25
}
]
}
]
}
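The innermost values of this explanation can be checked by hand. The sketch below recomputes idf from docFreq and maxDocs and multiplies the three factors; the fieldNorm is taken directly from the output, and the helper name `idf` is only illustrative.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1 + math.log(max_docs / (doc_freq + 1))


tf_value = math.sqrt(1.0)   # tf(freq=1.0)
idf_value = idf(1, 1)       # idf(docFreq=1, maxDocs=1)
field_norm = 0.25           # fieldNorm(doc=0), copied from the output

# fieldWeight is the product of the three factors.
field_weight = tf_value * idf_value * field_norm
print(idf_value, field_weight)
```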
Formula applied to the query
Total score for the query
Detailed score values for the query
Score Formula
Score formula 1
The scoring formula
score(q,d) = queryNorm(q) * coord(q,d) * ∑(t in q) ( tf(t in d) * idf(t)² * t.getBoost() * norm(t,d) )
Score formula in detail
Details of each factor in the scoring formula
score(q,d): the relevance score of document d for query q.
queryNorm(q): the query normalization factor.
  queryNorm = 1 / sqrt(sumOfSquaredWeights)
coord(q,d): the coordination factor.
∑(t in q): the sum of the weights for each term t in the query q for document d.
tf(t in d): the term frequency for term t in document d.
  tf = sqrt(termFreq)
idf(t): the inverse document frequency for term t.
  idf = 1 + ln(maxDocs/(docFreq + 1))
t.getBoost(): the boost that has been applied to the query.
norm(t,d): the field-length norm, combined with the index-time field-level boost, if any.
  norm = 1/sqrt(numFieldTerms)
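The factors above can be combined into a small function. This is a minimal sketch of the formula, not Lucene's implementation: the function and parameter names are hypothetical, sumOfSquaredWeights is assumed to be the sum of (idf * boost)² over the query terms, and the field norm is passed in as an already stored value rather than computed from the field length.

```python
import math


def tf(term_freq):
    return math.sqrt(term_freq)


def idf(doc_freq, max_docs):
    return 1 + math.log(max_docs / (doc_freq + 1))


def score(query_boosts, doc_term_freqs, doc_freqs, max_docs, field_norm):
    """score(q,d) = queryNorm * coord * sum(tf * idf^2 * boost * norm).

    query_boosts:   term -> boost for every term in the query
    doc_term_freqs: term -> frequency of the term in the document field
    doc_freqs:      term -> number of documents containing the term
    """
    idfs = {t: idf(doc_freqs[t], max_docs) for t in query_boosts}
    # Assumption: each term contributes (idf * boost)^2 to the normalization.
    sum_of_squared_weights = sum((idfs[t] * b) ** 2 for t, b in query_boosts.items())
    query_norm = 1 / math.sqrt(sum_of_squared_weights)
    matched = [t for t in query_boosts if doc_term_freqs.get(t, 0) > 0]
    coord = len(matched) / len(query_boosts)
    total = sum(
        tf(doc_term_freqs[t]) * idfs[t] ** 2 * query_boosts[t] * field_norm
        for t in matched
    )
    return query_norm * coord * total


# Illustrative inputs: two query terms, each in 1 of 2 documents,
# both present once in a field whose stored norm is 0.625.
example = score(
    {"big": 1.0, "data": 1.0},
    {"big": 1, "data": 1},
    {"big": 1, "data": 1},
    max_docs=2,
    field_norm=0.625,
)
print(example)
```

Passing the stored norm instead of the field length keeps the sketch independent of how the index encodes norms.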
Score Calculation Example
Score for a query
How the score is calculated for a single query
curl -XGET 'https://aws-us-east-1-portal10.dblayer.com:10019/top_films/film/172/_explain?pretty=1' -d '
{
"query" : {
"match" : {
"title" : "life"
}
}
}'
queryWeight
queryWeight = idf(docFreq=2, maxDocs=50) * queryNorm
{
"description" : "queryWeight, product of:",
"value" : 0.999999940000001,
"details" : [
{
"description" : "idf(docFreq=2, maxDocs=50)",
"value" : 3.8134108
},
{
"value" : 0.26223242,
"description" : "queryNorm"
}
]
},
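The queryWeight of almost exactly 1 is not a coincidence. For a query with a single term and no boost, sumOfSquaredWeights is just idf², so queryNorm = 1 / idf and the product cancels out. A short check, assuming that single-term case:

```python
import math

# idf(docFreq=2, maxDocs=50) = 1 + ln(50 / (2 + 1))
idf_value = 1 + math.log(50 / (2 + 1))

# Single-term query without boost: sumOfSquaredWeights = idf^2.
query_norm = 1 / math.sqrt(idf_value ** 2)

query_weight = idf_value * query_norm
print(idf_value, query_norm, query_weight)
```

The tiny deviation from 1.0 in the reported value comes from single-precision arithmetic in the engine.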
Coordination factor
The coordination factor of a query
The more query terms that appear in the document, the greater the chances that the document is a good match for the query.
Without the coordination factor:
Document with fox → score: 1.5
Document with quick fox → score: 3.0
Document with quick brown fox → score: 4.5
With the coordination factor (matching terms / total terms in the query):
Document with fox → score: 1.5 * 1 / 3 = 0.5
Document with quick fox → score: 3.0 * 2 / 3 = 2.0
Document with quick brown fox → score: 4.5 * 3 / 3 = 4.5
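The adjustment above is a single multiplication per document. A minimal sketch, where `apply_coord` is a hypothetical helper name and the raw scores are the ones listed for the three-term query:

```python
def apply_coord(raw_score, matched_terms, query_terms):
    """Scale a raw score by the fraction of query terms the document matched."""
    return raw_score * matched_terms / query_terms


# (document, raw score, number of matched query terms) for a 3-term query
examples = [
    ("fox", 1.5, 1),
    ("quick fox", 3.0, 2),
    ("quick brown fox", 4.5, 3),
]

adjusted = {doc: apply_coord(raw, matched, 3) for doc, raw, matched in examples}
for doc, value in adjusted.items():
    print(doc, value)
```

Documents that match every term keep their score, while partial matches are pushed further down the ranking.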
Coordination factor
Example query that uses the coordination factor
GET /_search
{
"query": {
"bool": {
"should": [
{ "term": { "text": "quick" }},
{ "term": { "text": "brown" }},
{ "term": { "text": "fox" }}
]
}
}
}
fieldWeight
fieldWeight = tf(freq=1.0) * idf(docFreq=2, maxDocs=50) * fieldNorm(doc=38)
{
"description" : "fieldWeight in 38, product of:",
"value" : 1.9067054,
"details" : [
{
"description" : "tf(freq=1.0), with freq of:",
"details" : [
{
"value" : 1,
"description" : "termFreq=1.0"
}
],
"value" : 1
},
{
"value" : 3.8134108,
"description" : "idf(docFreq=2, maxDocs=50)"
},
{
"value" : 0.5,
"description" : "fieldNorm(doc=38)"
}
]
}
],
score
queryWeight * fieldWeight
{
"value" : 1.9067053,
"description" : "score(doc=38,freq=1.0), product of:"
}
Example: Scoring a Single Field
Score formula
Details of each factor in the scoring formula
score(q,d): the relevance score of document d for query q.
∑(t in q): the sum of the weights for each term t in the query q for document d.
tf(t in d): the term frequency for term t in document d.
  tf = sqrt(termFreq)
idf(t): the inverse document frequency for term t.
  idf = 1 + ln(maxDocs/(docFreq + 1))
t.getBoost(): the boost that has been applied to the query.
norm(t,d): the field-length norm, combined with the index-time field-level boost, if any.
  norm = 1/sqrt(numFieldTerms)
Similarity algorithm
The score is calculated as tf * idf * fln * boost (a user-specified value), where tf = sqrt(termFreq).
TF (term frequency): how often does the term appear in this document?
  tf = sqrt(termFreq)
IDF (inverse document frequency): how often does the term appear in this field across all documents in the index?
  idf = 1 + ln(maxDocs/(docFreq + 1))
FLN (field-length norm): how long is the field that contains the term? The longer the field, the lower the score.
  norm = 1/sqrt(numFieldTerms)
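Explain output reports field norms such as 0.625 for a two-term field rather than 1/sqrt(2) ≈ 0.707. The sketch below shows one way this can arise, assuming the norm is stored in a single byte with 3 mantissa bits, as in the one-byte encoding used by Lucene's classic TF/IDF similarity. The function names are hypothetical and the code is an approximation for illustration, not the library's source.

```python
import math
import struct


def encode_norm(value):
    """Pack a positive float into one byte: 5 exponent bits, 3 mantissa bits."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> 21              # keep sign, exponent and top 3 mantissa bits
    base = (63 - 15) << 3           # offset of the smallest representable exponent
    if small <= base:
        return 0 if bits <= 0 else 1
    if small >= base + 0x100:
        return 255                  # overflow: clamp to the largest byte value
    return small - base


def decode_norm(byte):
    """Expand the stored byte back into a float."""
    if byte == 0:
        return 0.0
    bits = ((byte & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(num_field_terms):
    """norm = 1/sqrt(numFieldTerms), after the lossy one-byte round trip."""
    return decode_norm(encode_norm(1 / math.sqrt(num_field_terms)))


for terms in (2, 4, 9):
    print(terms, 1 / math.sqrt(terms), stored_norm(terms))
```

Under this assumption, fields of slightly different lengths can share the same stored norm, so small length differences do not always change the score.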
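Searching a specific field and explaining the score
Search for values that match an actual field and check the score calculation.
Search results for a specific field
Retrieve the results that match big.
Score explanation for a specific field
Shows the TF, IDF, and FLN values.
fieldWeight = TF * IDF * FLN
0.8784157 = 1.0 * 1.4054651 * 0.625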
Score for a field containing both big and data
Equivalent queries
The match query is interpreted as term-level queries on big and data.
{
"query": {
"match": {
"title": "big data"
}
}
}
{
"query": {
"bool": {
"should": [
{ "term": { "title": "big" }},
{ "term": { "title": "data" }}
]
}
}
}
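The rewrite from a match query to a bool query of term clauses can be sketched in a few lines. The helper `match_to_bool` is hypothetical and uses a toy lowercase-and-split step in place of the field's real analyzer.

```python
import json
import re


def match_to_bool(field, text):
    """Build the bool/should query that a multi-term match query is equivalent to."""
    terms = re.findall(r"[a-z0-9]+", text.lower())  # stand-in for real analysis
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: term}} for term in terms]
            }
        }
    }


rewritten = match_to_bool("title", "big data")
print(json.dumps(rewritten, indent=2))
```

Each analyzed token becomes its own term clause, which is why the explanation for such a query is a sum of per-term scores.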
Score formula in detail
Details of each factor in the scoring formula
score(q,d): the relevance score of document d for query q.
queryNorm(q): the query normalization factor.
  queryNorm = 1 / sqrt(sumOfSquaredWeights)
coord(q,d): both terms match, so this factor can be ignored.
∑(t in q): the sum of the weights for each term t in the query q for document d.
tf(t in d): the term frequency for term t in document d.
  tf = sqrt(termFreq)
idf(t): the inverse document frequency for term t.
  idf = 1 + ln(maxDocs/(docFreq + 1))
t.getBoost(): the boost that has been applied to the query.
norm(t,d): the field-length norm, combined with the index-time field-level boost, if any.
  norm = 1/sqrt(numFieldTerms)
Searching a specific field (big, data)
When a document contains both big and data, the coordination factor has no effect.
Title: big data score
big data score = big score + data score
0.8838835 = 0.44194174 + 0.44194174
"max_score" : 0.8838835,
"hits" : [ {
"_shard" : 3,
"_node" : "LhufT5nGQPmrhEFEwV8-Cw",
"_index" : "books",
"_type" : "itbook",
"_id" : "1",
"_score" : 0.8838835,
"_source" : {
"title" : "big data",
"author" : [ "hwang", "kang" ],
"price" : 30000,
"pages" : 300
},
"_explanation" : {
"value" : 0.8838835,
"description" : "sum of:"
big : fieldWeight
fieldWeight = tf * idf * fieldnorm
{
"value" : 0.625,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
} ]
}, {
"value" : 1.0,
"description" : "idf(docFreq=1, maxDocs=2)",
"details" : [ ]
}, {
"value" : 0.625,
"description" : "fieldNorm(doc=0)",
"details" : [ ]
} ]
}
big : queryWeight
queryWeight = idf(docFreq=1, maxDocs=2) * queryNorm
{
"value" : 0.70710677,
"description" : "queryWeight, product of:",
"details" : [ {
"value" : 1.0,
"description" : "idf(docFreq=1, maxDocs=2)",
"details" : [ ]
}, {
"value" : 0.70710677,
"description" : "queryNorm",
"details" : [ ]
} ]
}
big : score
big score = queryWeight * fieldWeight
0.44194174 = 0.70710677 * 0.625
"value" : 0.44194174,
"description" : "weight(title:big in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.44194174,
"description" : "score(doc=0,freq=1.0), product of:",
data : fieldWeight
fieldWeight = tf * idf * fieldnorm
{
"value" : 0.625,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
} ]
}, {
"value" : 1.0,
"description" : "idf(docFreq=1, maxDocs=2)",
"details" : [ ]
}, {
"value" : 0.625,
"description" : "fieldNorm(doc=0)",
"details" : [ ]
} ]
}
data : queryWeight
queryWeight = idf(docFreq=1, maxDocs=2) * queryNorm
{
"value" : 0.70710677,
"description" : "queryWeight, product of:",
"details" : [ {
"value" : 1.0,
"description" : "idf(docFreq=1, maxDocs=2)",
"details" : [ ]
}, {
"value" : 0.70710677,
"description" : "queryNorm",
"details" : [ ]
} ]
}
data : score
data score = queryWeight * fieldWeight
0.44194174 = 0.70710677 * 0.625
"value" : 0.44194174,
"description" : "weight(title:data in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.44194174,
"description" : "score(doc=0,freq=1.0), product of:"
Calculation for a field containing only big
Score formula in detail
Details of each factor in the scoring formula
score(q,d): the relevance score of document d for query q.
queryNorm(q): the query normalization factor.
  queryNorm = 1 / sqrt(sumOfSquaredWeights)
coord(q,d): the coordination factor.
∑(t in q): the sum of the weights for each term t in the query q for document d.
tf(t in d): the term frequency for term t in document d.
  tf = sqrt(termFreq)
idf(t): the inverse document frequency for term t.
  idf = 1 + ln(maxDocs/(docFreq + 1))
t.getBoost(): the boost that has been applied to the query.
norm(t,d): the field-length norm, combined with the index-time field-level boost, if any.
  norm = 1/sqrt(numFieldTerms)
Title: big picture score
big data score = big score + data score
0.883883 = 0.44194174+ 0.44194174
"max_score" : 0.8838835,
"hits" : [ {
"_shard" : 3,
"_node" : "LhufT5nGQPmrhEFEwV8-Cw",
"_index" : "books",
"_type" : "itbook",
"_id" : "1",
"_score" : 0.8838835,
"_source" : {
"title" : "big data",
"author" : [ "hwang", "kang" ],
"price" : 30000,
"pages" : 300
},
"_explanation" : {
"value" : 0.8838835,
"description" : "sum of:"
big : fieldWeight
fieldWeight = tf * idf * fieldnorm
{
"value" : 0.8784157,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
} ]
}, {
"value" : 1.4054651,
"description" : "idf(docFreq=1, maxDocs=3)",
"details" : [ ]
}, {
"value" : 0.625,
"description" : "fieldNorm(doc=0)",
"details" : [ ]
} ]
}
big : queryWeight
queryWeight = idf(docFreq=1, maxDocs=3) * queryNorm
{
"value" : 0.5564505,
"description" : "queryWeight, product of:",
"details" : [ {
"value" : 1.4054651,
"description" : "idf(docFreq=1, maxDocs=3)",
"details" : [ ]
}, {
"value" : 0.3959191,
"description" : "queryNorm",
"details" : [ ]
} ]
}
big : score
big score = queryWeight * fieldWeight
0.48879483 = 0.5564505 * 0.8784157
"details" : [ {
"value" : 0.48879483,
"description" : "sum of:",
"details" : [ {
"value" : 0.48879483,
"description" : "weight(title:big in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.48879483,
"description" : "score(doc=0,freq=1.0), product of:",
big : coord
coord(1/2)
{
"value" : 0.5,
"description" : "coord(1/2)",
"details" : [ ]
}
big picture: score
big picture score = big score * coord
0.24439742 = 0.48879483 * 0.5
"value" : 0.24439742,
"description" : "product of:"
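The whole chain for this document can be recomputed from the formulas. The sketch below also derives queryNorm, which requires one inference not stated in the output: the term data is assumed to have docFreq=0 on this shard, which is consistent with the reported queryNorm of 0.3959191. The fieldNorm of 0.625 is copied from the output.

```python
import math


def idf(doc_freq, max_docs):
    return 1 + math.log(max_docs / (doc_freq + 1))


idf_big = idf(1, 3)    # idf(docFreq=1, maxDocs=3), as reported
idf_data = idf(0, 3)   # assumption: "data" does not occur on this shard

# queryNorm = 1 / sqrt(sumOfSquaredWeights) over both query terms
query_norm = 1 / math.sqrt(idf_big ** 2 + idf_data ** 2)

query_weight = idf_big * query_norm    # queryWeight for "big"
field_weight = 1.0 * idf_big * 0.625   # tf * idf * fieldNorm
big_score = query_weight * field_weight

coord = 1 / 2                          # only one of the two query terms matched
final_score = big_score * coord
print(query_norm, big_score, final_score)
```

The unmatched term still lowers the score twice: once through queryNorm and once through the coordination factor.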
Query Weight (BOOST)
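Query time
Query search explanation
When the title field is searched with two conditions
Boost is taken into account when the query has two or more clauses.
Query search results
Retrieve the results that match big.
Search result value = query weight * field weight
0.78567886 = 0.8944272 * 0.8784157
Final value = search result value * (1 / number of query clauses)
0.39283943 = 0.78567886 * 0.5
Query weight explanation
Shows the boost, IDF, and queryNorm values.
queryWeight = boost * IDF * queryNorm
0.8944272 = 2.0 * 1.4054651 * 0.31819615
Field weight explanation
Shows the TF, IDF, and FLN values.
fieldWeight = TF * IDF * FLN
0.8784157 = 1.0 * 1.4054651 * 0.625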
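The boosted figures can be recomputed step by step. The queryNorm derivation below rests on an assumption that is not shown on the slides: the second clause has boost 1.0 and the same idf as the boosted one. That assumption reproduces the reported queryNorm of 0.31819615.

```python
import math

idf_value = 1.4054651   # idf reported for the matching term
boost = 2.0             # boost on the matching clause
other_boost = 1.0       # assumption: boost of the clause that did not match

# Boosts enter the normalization, so raising one boost lowers queryNorm.
query_norm = 1 / math.sqrt((boost * idf_value) ** 2 + (other_boost * idf_value) ** 2)

query_weight = boost * idf_value * query_norm   # boost * IDF * queryNorm
field_weight = 1.0 * idf_value * 0.625          # TF * IDF * FLN
clause_score = query_weight * field_weight

final_score = clause_score * (1 / 2)            # one of two clauses matched
print(query_norm, query_weight, clause_score, final_score)
```

Because queryNorm shrinks as the boost grows, doubling a boost does not simply double the final score.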

Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 

Understanding Elasticsearch Relevance 20160630

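  • 2. Terminology 1: Relevance and Analysis must be clearly distinguished. Relevance: the ability to rank results by how relevant they are to a given query; relevance is calculated using TF/IDF. Analysis: the process of converting a block of text into distinct, normalized tokens.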
  • 3. Terminology 2: Queries must also be distinguished by type. Term-based queries: low-level queries such as term or fuzzy queries; they operate on a single term and have no analysis phase. Full-text queries: high-level queries such as match or query_string queries.
  • 4. Execution steps (match query): A match query is executed in four steps: (1) Check the field type. (2) Analyze the query string. (3) Find matching docs. (4) Score each doc.
    Request: GET /my_index/my_type/_search { "query": { "match": { "title": "QUICK!" } } }
    Response: "hits": [ { "_id": "1", "_score": 0.5, "_source": { "title": "The quick brown fox" } }, { "_id": "3", "_score": 0.44194174, "_source": { "title": "The quick brown fox jumps over the quick dog" } }, { "_id": "2", "_score": 0.3125, "_source": { "title": "The quick brown fox jumps over the lazy dog" } } ]
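Steps 2 and 3 above can be illustrated with a small Python sketch. The `simple_analyze` function is a hypothetical, heavily simplified stand-in for an analyzer (it only splits on non-alphanumeric characters and lowercases); it is not how Elasticsearch implements analysis, but it shows why the query string "QUICK!" can match the term quick in all three titles.

```python
import re

def simple_analyze(text):
    """Hypothetical, simplified analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

# Titles copied from the hits shown on the slide.
titles = {
    "1": "The quick brown fox",
    "2": "The quick brown fox jumps over the lazy dog",
    "3": "The quick brown fox jumps over the quick dog",
}

# Step 2: analyze the query string.
query_terms = simple_analyze("QUICK!")

# Step 3: find documents whose analyzed title shares a term with the query.
matches = [
    doc_id
    for doc_id, title in titles.items()
    if set(query_terms) & set(simple_analyze(title))
]

print(query_terms)
print(sorted(matches))
```

Step 4 (scoring) is what orders the three matches; the remaining slides cover it.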
  • 7. Running a query with explain: To see how a single query is scored, the search must be run with explain specified.
    GET /_search?explain { "query": { "match": { "tweet": "honeymoon" } } }
  • 8. Viewing the query result: how the score is calculated for a single query. In the output, description gives the calculation for the query, value gives the total score, and details holds the component scores.
    "_explanation": { "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:", "value": 0.076713204, "details": [ { "description": "fieldWeight in 0, product of:", "value": 0.076713204, "details": [ { "description": "tf(freq=1.0), with freq of:", "value": 1, "details": [ { "description": "termFreq=1.0", "value": 1 } ] }, { "description": "idf(docFreq=1, maxDocs=1)", "value": 0.30685282 }, { "description": "fieldNorm(doc=0)", "value": 0.25 } ] } ] }
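The nested explain output is easier to read when flattened. The sketch below copies the `_explanation` from the slide into a Python dict and walks it recursively; the `flatten` helper is a hypothetical name introduced here for illustration. It also checks that the three factors under fieldWeight multiply to the reported value.

```python
# The _explanation from the slide, as a Python dict.
explanation = {
    "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
    "value": 0.076713204,
    "details": [{
        "description": "fieldWeight in 0, product of:",
        "value": 0.076713204,
        "details": [
            {"description": "tf(freq=1.0), with freq of:", "value": 1,
             "details": [{"description": "termFreq=1.0", "value": 1}]},
            {"description": "idf(docFreq=1, maxDocs=1)", "value": 0.30685282},
            {"description": "fieldNorm(doc=0)", "value": 0.25},
        ],
    }],
}

def flatten(node, depth=0):
    """Return (depth, value, description) rows for a node and its details."""
    rows = [(depth, node["value"], node["description"])]
    for child in node.get("details", []):
        rows.extend(flatten(child, depth + 1))
    return rows

rows = flatten(explanation)
for depth, value, description in rows:
    print("  " * depth + f"{value}  {description}")

# tf, idf and fieldNorm sit at depth 2; their product is the fieldWeight.
product = 1.0
for depth, value, _ in rows:
    if depth == 2:
        product *= value
print(product)
```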
  • 10. Score formula 1: the score formula
    score(q,d) = queryNorm(q) * coord(q,d) * ∑(t in q) ( tf(t in d) * idf(t)² * t.getBoost() * norm(t,d) )
  • 11. Score formula in detail: each part of the score formula
    score(q,d): the relevance score of document d for query q.
    queryNorm(q): the query normalization factor. queryNorm = 1 / sqrt(sumOfSquaredWeights)
    coord(q,d): the coordination factor.
    ∑(t in q): the sum of the weights for each term t in the query q for document d.
    tf(t in d): the term frequency for term t in document d. tf = sqrt(termFreq)
    idf(t): the inverse document frequency for term t. idf = 1 + ln(maxDocs/(docFreq + 1))
    t.getBoost(): the boost that has been applied to the query.
    norm(t,d): the field-length norm, combined with the index-time field-level boost, if any. norm = 1/sqrt(numFieldTerms)
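The component formulas on this slide translate directly into Python. This is a minimal sketch of the formulas as written, not Lucene's implementation; the function names are chosen here for illustration. One caveat: Lucene stores the field norm in a compressed, lossy form, so the fieldNorm that explain reports (for example 0.625) can differ from the exact 1/sqrt(numFieldTerms).

```python
import math

def tf(term_freq):
    """Term frequency: tf = sqrt(termFreq)."""
    return math.sqrt(term_freq)

def idf(doc_freq, max_docs):
    """Inverse document frequency: idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def norm(num_field_terms):
    """Field-length norm before lossy storage: 1 / sqrt(numFieldTerms)."""
    return 1.0 / math.sqrt(num_field_terms)

def query_norm(term_weights):
    """queryNorm = 1 / sqrt(sumOfSquaredWeights), weights being idf * boost."""
    return 1.0 / math.sqrt(sum(w * w for w in term_weights))

# Values that appear in the explain outputs in this deck.
print(idf(1, 1))    # slide 8: idf(docFreq=1, maxDocs=1)
print(idf(2, 50))   # slide 14: idf(docFreq=2, maxDocs=50)
print(idf(1, 3))    # slide 39: idf(docFreq=1, maxDocs=3)
```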
  • 13. Score for a query: how the score is calculated for a single query
    curl -XGET 'https://aws-us-east-1-portal10.dblayer.com:10019/top_films/film/172/_explain?pretty=1' -d ' { "query" : { "match" : { "title" : "life" } } }'
  • 14. queryWeight: queryWeight = idf(docFreq=2, maxDocs=50) * queryNorm
    { "description" : "queryWeight, product of:", "value" : 0.999999940000001, "details" : [ { "description" : "idf(docFreq=2, maxDocs=50)", "value" : 3.8134108 }, { "value" : 0.26223242, "description" : "queryNorm" } ] }
  • 15. coordination factor: the coordination factor for a query. The more query terms that appear in the document, the greater the chances that the document is a good match for the query.
    Without the coordination factor: Document with fox → score: 1.5. Document with quick fox → score: 3.0. Document with quick brown fox → score: 4.5.
    With the coordination factor (matching terms / query terms): Document with fox → score: 1.5 * 1 / 3 = 0.5. Document with quick fox → score: 3.0 * 2 / 3 = 2.0. Document with quick brown fox → score: 4.5 * 3 / 3 = 4.5.
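The arithmetic on this slide can be reproduced in a few lines. This is a sketch that assumes, as the slide does, a three-term query and a per-term score of 1.5; the `coord` function name is chosen here for illustration.

```python
def coord(matching_terms, query_terms):
    """Coordination factor: fraction of query terms found in the document."""
    return matching_terms / query_terms

# Raw scores from the slide and the number of query terms each document matches.
documents = {
    "fox": (1.5, 1),
    "quick fox": (3.0, 2),
    "quick brown fox": (4.5, 3),
}

adjusted = {
    text: raw_score * coord(matched, 3)
    for text, (raw_score, matched) in documents.items()
}

for text, score in adjusted.items():
    print(f"{text}: {score:.1f}")
```

The factor widens the gap between a document that matches every term and one that matches only a single term.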
  • 16. coordination factor: example query for the coordination factor
    GET /_search { "query": { "bool": { "should": [ { "term": { "text": "quick" }}, { "term": { "text": "brown" }}, { "term": { "text": "fox" }} ] } } }
  • 17. fieldWeight: fieldWeight = tf(freq=1.0) * idf(docFreq=2, maxDocs=50) * fieldNorm(doc=38)
    { "description" : "fieldWeight in 38, product of:", "value" : 1.9067054, "details" : [ { "description" : "tf(freq=1.0), with freq of:", "details" : [ { "value" : 1, "description" : "termFreq=1.0" } ], "value" : 1 }, { "value" : 3.8134108, "description" : "idf(docFreq=2, maxDocs=50)" }, { "value" : 0.5, "description" : "fieldNorm(doc=38)" } ] }
  • 18. score: score = queryWeight * fieldWeight
    { "value" : 1.9067053, "description" : "score(doc=38,freq=1.0), product of:" }
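Slides 14 to 18 can be checked end to end. The sketch below recomputes the score for document 38 from docFreq=2, maxDocs=50, termFreq=1 and the fieldNorm of 0.5 reported by explain. It assumes a single-term query with no boost, in which case queryNorm reduces to 1/idf, which agrees with the 0.26223242 in the explain output.

```python
import math

# Inputs taken from the explain output on the slides.
doc_freq, max_docs = 2, 50
term_freq = 1.0
field_norm = 0.5  # as reported by explain (stored, lossy value)

idf = 1.0 + math.log(max_docs / (doc_freq + 1))
tf = math.sqrt(term_freq)

# Single-term query, no boost: sumOfSquaredWeights is just idf squared.
query_norm = 1.0 / math.sqrt(idf ** 2)

query_weight = idf * query_norm          # slide 14
field_weight = tf * idf * field_norm     # slide 17
score = query_weight * field_weight      # slide 18

print(query_norm, query_weight, field_weight, score)
```

The small differences in the last digits against the slide come from Lucene using 32-bit floats.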
  • 19. Example: scoring a single field
  • 20. Score formula: each part of the score formula
    score(q,d): the relevance score of document d for query q.
    ∑(t in q): the sum of the weights for each term t in the query q for document d.
    tf(t in d): the term frequency for term t in document d. tf = sqrt(termFreq)
    idf(t): the inverse document frequency for term t. idf = 1 + ln(maxDocs/(docFreq + 1))
    t.getBoost(): the boost that has been applied to the query.
    norm(t,d): the field-length norm, combined with the index-time field-level boost, if any. norm = 1/sqrt(numFieldTerms)
  • 21. Similarity algorithm: the score is calculated as sqrt(termFreq) * idf * fln * boost (a user-specified value).
    TF (term frequency): how often does the term appear in this document? tf = sqrt(termFreq)
    IDF (inverse document frequency): how often does the term appear in this field across all documents in the index? idf = 1 + ln(maxDocs/(docFreq + 1))
    FLN (field-length norm): how long is the field that contains the term? The longer the field, the lower the score. norm = 1/sqrt(numFieldTerms)
  • 22. Searching a specific field, with explanation: search for a value that matches an actual field and check how the score was calculated.
  • 23. Search results for a specific field: retrieve the results that match big.
  • 24. Score explanation for a specific field: shows the TF, IDF, and FLN values.
    fieldWeight = TF * IDF * FLN: 0.8784157 = 1.0 * 1.4054651 * 0.625
  • 25. Score for a field that contains both big and data
  • 26. Equivalent queries: a match query for big data is interpreted as term-level queries for big and data.
    { "query": { "match": { "title": "big data" } } }
    { "query": { "bool": { "should": [ { "term": { "title": "big" }}, { "term": { "title": "data" }} ] } } }
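The rewrite shown on this slide can be sketched as a small function. `match_to_bool` is a hypothetical helper written for illustration; it assumes whitespace splitting and lowercasing as a stand-in for the field's analyzer, and the default "or" behaviour of the match query.

```python
import json

def match_to_bool(field, text):
    """Build the bool/should query of term clauses equivalent to a match query.

    Assumption: lowercasing and whitespace splitting approximate the analyzer.
    """
    terms = text.lower().split()
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: term}} for term in terms]
            }
        }
    }

rewritten = match_to_bool("title", "big data")
print(json.dumps(rewritten, indent=2))
```

Because each term becomes its own clause, each term is also scored separately, which is why the following slides show one weight per term.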
  • 27. Score formula in detail: each part of the score formula
    score(q,d): the relevance score of document d for query q.
    queryNorm(q): the query normalization factor. queryNorm = 1 / sqrt(sumOfSquaredWeights)
    coord(q,d): both terms match, so the coordination factor has no effect here.
    ∑(t in q): the sum of the weights for each term t in the query q for document d.
    tf(t in d): the term frequency for term t in document d. tf = sqrt(termFreq)
    idf(t): the inverse document frequency for term t. idf = 1 + ln(maxDocs/(docFreq + 1))
    t.getBoost(): the boost that has been applied to the query.
    norm(t,d): the field-length norm, combined with the index-time field-level boost, if any. norm = 1/sqrt(numFieldTerms)
  • 28. Searching a specific field (big, data): when the field contains both big and data, no coordination factor appears in the result.
  • 29. title: big data score: big data score = big score + data score; 0.883883 = 0.44194174 + 0.44194174
    "max_score" : 0.8838835, "hits" : [ { "_shard" : 3, "_node" : "LhufT5nGQPmrhEFEwV8-Cw", "_index" : "books", "_type" : "itbook", "_id" : "1", "_score" : 0.8838835, "_source" : { "title" : "big data", "author" : [ "hwang", "kang" ], "price" : 30000, "pages" : 300 }, "_explanation" : { "value" : 0.8838835, "description" : "sum of:"
  • 30. big: fieldWeight: fieldWeight = tf * idf * fieldNorm
    { "value" : 0.625, "description" : "fieldWeight in 0, product of:", "details" : [ { "value" : 1.0, "description" : "tf(freq=1.0), with freq of:", "details" : [ { "value" : 1.0, "description" : "termFreq=1.0", "details" : [ ] } ] }, { "value" : 1.0, "description" : "idf(docFreq=1, maxDocs=2)", "details" : [ ] }, { "value" : 0.625, "description" : "fieldNorm(doc=0)", "details" : [ ] } ] }
  • 31. big: queryWeight: queryWeight = idf(docFreq=1, maxDocs=2) * queryNorm
    { "value" : 0.70710677, "description" : "queryWeight, product of:", "details" : [ { "value" : 1.0, "description" : "idf(docFreq=1, maxDocs=2)", "details" : [ ] }, { "value" : 0.70710677, "description" : "queryNorm", "details" : [ ] } ] }
  • 32. big: score: big score = queryWeight * fieldWeight; 0.44194174 = 0.70710677 * 0.625
    "value" : 0.44194174, "description" : "weight(title:big in 0) [PerFieldSimilarity], result of:", "details" : [ { "value" : 0.44194174, "description" : "score(doc=0,freq=1.0), product of:",
  • 33. data: fieldWeight: fieldWeight = tf * idf * fieldNorm
    { "value" : 0.625, "description" : "fieldWeight in 0, product of:", "details" : [ { "value" : 1.0, "description" : "tf(freq=1.0), with freq of:", "details" : [ { "value" : 1.0, "description" : "termFreq=1.0", "details" : [ ] } ] }, { "value" : 1.0, "description" : "idf(docFreq=1, maxDocs=2)", "details" : [ ] }, { "value" : 0.625, "description" : "fieldNorm(doc=0)", "details" : [ ] } ] }
  • 34. data: queryWeight: queryWeight = idf(docFreq=1, maxDocs=2) * queryNorm
    { "value" : 0.70710677, "description" : "queryWeight, product of:", "details" : [ { "value" : 1.0, "description" : "idf(docFreq=1, maxDocs=2)", "details" : [ ] }, { "value" : 0.70710677, "description" : "queryNorm", "details" : [ ] } ] }
  • 35. data: score: data score = queryWeight * fieldWeight; 0.44194174 = 0.70710677 * 0.625
    "value" : 0.44194174, "description" : "weight(title:data in 0) [PerFieldSimilarity], result of:", "details" : [ { "value" : 0.44194174, "description" : "score(doc=0,freq=1.0), product of:"
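Slides 29 to 35 can be reproduced numerically. The sketch below uses docFreq=1 and maxDocs=2 for both terms and the fieldNorm of 0.625 reported by explain; it assumes no boost. With two terms of idf 1.0, queryNorm comes out as 1/sqrt(2), matching the 0.70710677 on the slides.

```python
import math

def idf(doc_freq, max_docs):
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

field_norm = 0.625  # as reported by explain for the title "big data"
tf = math.sqrt(1.0)

idf_big = idf(1, 2)
idf_data = idf(1, 2)

# queryNorm = 1 / sqrt(sumOfSquaredWeights) over both query terms.
query_norm = 1.0 / math.sqrt(idf_big ** 2 + idf_data ** 2)

def term_score(term_idf):
    """queryWeight * fieldWeight for one term."""
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

big_score = term_score(idf_big)
data_score = term_score(idf_data)
total = big_score + data_score  # both terms match, so coord is 2/2 = 1

print(big_score, data_score, total)
```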
  • 36. Calculation for a field that contains only big
  • 37. Score formula in detail: each part of the score formula
    score(q,d): the relevance score of document d for query q.
    queryNorm(q): the query normalization factor. queryNorm = 1 / sqrt(sumOfSquaredWeights)
    coord(q,d): the coordination factor.
    ∑(t in q): the sum of the weights for each term t in the query q for document d.
    tf(t in d): the term frequency for term t in document d. tf = sqrt(termFreq)
    idf(t): the inverse document frequency for term t. idf = 1 + ln(maxDocs/(docFreq + 1))
    t.getBoost(): the boost that has been applied to the query.
    norm(t,d): the field-length norm, combined with the index-time field-level boost, if any. norm = 1/sqrt(numFieldTerms)
  • 38. title: big picture score: big data score = big score + data score; 0.883883 = 0.44194174 + 0.44194174
    "max_score" : 0.8838835, "hits" : [ { "_shard" : 3, "_node" : "LhufT5nGQPmrhEFEwV8-Cw", "_index" : "books", "_type" : "itbook", "_id" : "1", "_score" : 0.8838835, "_source" : { "title" : "big data", "author" : [ "hwang", "kang" ], "price" : 30000, "pages" : 300 }, "_explanation" : { "value" : 0.8838835, "description" : "sum of:"
  • 39. big: fieldWeight: fieldWeight = tf * idf * fieldNorm
    { "value" : 0.8784157, "description" : "fieldWeight in 0, product of:", "details" : [ { "value" : 1.0, "description" : "tf(freq=1.0), with freq of:", "details" : [ { "value" : 1.0, "description" : "termFreq=1.0", "details" : [ ] } ] }, { "value" : 1.4054651, "description" : "idf(docFreq=1, maxDocs=3)", "details" : [ ] }, { "value" : 0.625, "description" : "fieldNorm(doc=0)", "details" : [ ] } ] }
  • 40. big: queryWeight: queryWeight = idf(docFreq=1, maxDocs=3) * queryNorm
    { "value" : 0.5564505, "description" : "queryWeight, product of:", "details" : [ { "value" : 1.4054651, "description" : "idf(docFreq=1, maxDocs=3)", "details" : [ ] }, { "value" : 0.3959191, "description" : "queryNorm", "details" : [ ] } ] }
  • 41. big: score: big score = queryWeight * fieldWeight; 0.48879483 = 0.5564505 * 0.8784157
    "details" : [ { "value" : 0.48879483, "description" : "sum of:", "details" : [ { "value" : 0.48879483, "description" : "weight(title:big in 0) [PerFieldSimilarity], result of:", "details" : [ { "value" : 0.48879483, "description" : "score(doc=0,freq=1.0), product of:",
  • 42. big: coord: coord(1/2), because one of the two query terms matches
    { "value" : 0.5, "description" : "coord(1/2)", "details" : [ ] }
  • 43. big picture: score: final score = big score * coord; 0.24439742 = 0.48879483 * 0.5
    "value" : 0.24439742, "description" : "product of:"
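The values on slides 39 to 43 can be reproduced with the sketch below. One input is an assumption not stated on the slides: the second query term is taken to have docFreq=0 among the 3 documents, because that choice reproduces the reported queryNorm of 0.3959191. The fieldNorm of 0.625 is the value reported by explain.

```python
import math

def idf(doc_freq, max_docs):
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

max_docs = 3
idf_big = idf(1, max_docs)     # 1.4054651 on the slides
idf_other = idf(0, max_docs)   # ASSUMPTION: second term occurs in no document

# queryNorm covers every query term, whether or not it matches this document.
query_norm = 1.0 / math.sqrt(idf_big ** 2 + idf_other ** 2)

query_weight = idf_big * query_norm                 # slide 40
field_weight = math.sqrt(1.0) * idf_big * 0.625     # slide 39
big_score = query_weight * field_weight             # slide 41

coord = 1 / 2                                       # slide 42: 1 of 2 terms matches
final_score = big_score * coord                     # slide 43

print(query_norm, query_weight, big_score, final_score)
```

Compared with the document that contains both terms (0.8838835), the single-term match is penalised twice: only one term contributes to the sum, and the sum is then halved by coord.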
  • 46. Query search explanation: when the title field is searched with two conditions. Boost enters the calculation when the query has two or more clauses.
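  • 47. Query search results: retrieve the results that match big.
    search result value = query weight * field weight: 0.78567886 = 0.8944272 * 0.8784157
    final value = search result value * (1 / number of query clauses): 0.39283943 = 0.78567886 * 0.5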
  • 48. Query weight explanation: shows the boost, IDF, and queryNorm values.
    queryWeight = boost * IDF * queryNorm: 0.8944272 = 2.0 * 1.4054651 * 0.31819615
  • 49. Field weight explanation: shows the TF, IDF, and FLN values.
    fieldWeight = TF * IDF * FLN: 0.8784157 = 1.0 * 1.4054651 * 0.625
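The boosted example on slides 47 to 49 can be recomputed as follows. The slides give boost=2.0, idf=1.4054651 and queryNorm=0.31819615 for the big clause; the sketch assumes a second, unboosted clause with the same idf (docFreq=1, maxDocs=3) that does not match this document, since that assumption reproduces the reported queryNorm.

```python
import math

def idf(doc_freq, max_docs):
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

boost_big = 2.0
idf_big = idf(1, 3)
idf_other = idf(1, 3)   # ASSUMPTION: second clause, boost 1.0, same statistics

# Each clause contributes (idf * boost) squared to sumOfSquaredWeights.
query_norm = 1.0 / math.sqrt((idf_big * boost_big) ** 2 + idf_other ** 2)

query_weight = boost_big * idf_big * query_norm     # slide 48
field_weight = math.sqrt(1.0) * idf_big * 0.625     # slide 49
result = query_weight * field_weight                # slide 47, first line
final = result * (1 / 2)                            # one of two clauses matches

print(query_norm, query_weight, result, final)
```

Because queryNorm shrinks as the boost grows, doubling the boost does not double the final score; it raises the weight of the boosted clause relative to the other clause.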