QER : Query Entity Recognition

Dhwaj Raj
Member
Web Intelligence and Semantics (WISE) group
InfoEdge India Ltd.
Named Entities and Recognition






Named entity recognition is a task that seeks to locate and classify
atomic elements in text into predefined categories.

Sample predefined categories: names of persons, organizations, real
estate projects, institutes, colleges, locations, durations and
quantities etc.

The word “Named” restricts the task to those entities for which
one or more proper names can be designated.
Entity recognition in query


Understanding a query well enough that we can extract information from it
intelligently, so that our search systems can answer the questions it
poses.
Challenges






Entities may be difficult to find and, once found, difficult to classify. For
instance, locations and builder names can be similar.
Deciding which learning technique applies best.
Deciding how much free text is needed to build a suitable training
corpus.



The recogniser must be efficient and have high recall.



Entity resolution:
− Is "Vishal Sinha Associates" a person name, a company name, or both?
− dehradoon institute of technology delhi
− Uttarakhand residences noida
− delhi 99 residency bhopura
Building a system that can easily be reused in other projects as well.





Regular synchronization with domain data.
Advantages
* Identifying named entities in queries would help us to understand search intents better,
and therefore provide better search.

* Structured query enables the system to perform better search with structured
documents.

* In relevance search, a structured query can help in improving the ranking by treating
entity and context separately.

* Entity Recognition in query provides segmentation of longer queries.

* Entity Recognition in query provides entity roles taxonomy.
Applications


To implement filtered search for text query input.



In phrase-based auto-suggestor resolution.













In QnA, to detect entities under discussion that are not explicitly defined, so that each
QnA discussion can be associated with projects etc.

Tagging content and listings all over the website.

Semantic analysis, using entity co-occurrence relations to create a
topic/tag tree.

To improve the user's property posting experience. By extracting entities from the
property description in real time, the posting overlay can recommend or preselect
the fields that users are reluctant or too lazy to choose from a drop-down.

To structure the property description as well as to detect spamminess. In the real estate
domain we define spamminess not as profanity (obscenity) but as keyword
stuffing: many brokers list every project they deal in so that they come up in search
results, which hampers search relevance.

And many more .....
Approaches for QER
* String Alignments Matching
In this approach we perform simple dictionary matching. We have
dictionary files, which are simple lists of all known keywords of a category;
for example, a file containing the list of all course names and their variants.
* Probabilistic Shallow Parsing using CRF
We apply machine learning, using a probabilistic graphical model
with a Markov dependency. We predict the labels of a word sequence
from the observation sequence and the prior probabilities obtained by
training. Useful for predicting labels even for new, unknown entities.
* Hybrid
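A minimal sketch of the dictionary matching idea, for illustration only. The slides say only that each category has a plain list of known keywords; the greedy longest-match strategy, the in-memory dictionaries and the category names COURSE and CITY are assumptions made here, not the QER implementation.

```python
def dictionary_match(query, dictionaries):
    """Greedy longest-match of dictionary phrases over the query tokens.

    dictionaries: {category: [phrase, ...]} (hypothetical layout).
    Returns a list of (matched phrase, category) in query order.
    """
    phrase_to_category = {}
    max_len = 1
    for category, phrases in dictionaries.items():
        for phrase in phrases:
            key = tuple(phrase.lower().split())
            phrase_to_category[key] = category
            max_len = max(max_len, len(key))

    tokens = query.lower().split()
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + length])
            if key in phrase_to_category:
                matches.append((" ".join(key), phrase_to_category[key]))
                i += length
                break
        else:
            i += 1  # no phrase starts at this token
    return matches


dictionaries = {
    "COURSE": ["mba", "part time mba", "btech"],
    "CITY": ["delhi", "banglore"],
}
print(dictionary_match("part time mba in delhi", dictionaries))
```

Exact lookup like this cannot label a phrase that is missing from the lists, which is the gap the CRF approach is meant to cover.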
Approaches for QER : protein alignment matching
1. Remove low-complexity region or sequence repeats in the query sequence.
2. Make a k-word sequence list of the query sequence.
3. List the possible matching sequences and organize the remaining high-scoring sequences into an efficient search tree.

4. Repeat step 3 for each k-word sequence in the query, and scan the database
sequences for exact matches with the remaining high-scoring words.
5. Extend the exact matches to high-scoring segment pairs (HSPs).
6. List all of the HSPs in the database whose score is high enough to be
considered and evaluate the significance of each HSP score. Combine two or more
HSP regions into a longer alignment.
7. Assign classes to matched segments based on the master data set matched.
Use priority scores to resolve the classification of overlapping matched segments.
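The seed-and-extend idea in steps 2 to 5 can be sketched at word level as below. This is a heavily simplified illustration, not the QER code: it uses exact word equality where BLAST uses a scoring matrix, and it skips low-complexity filtering, significance testing and priority-based class resolution. The function names and the choice of k = 2 are assumptions.

```python
from collections import defaultdict


def build_kword_index(entries, k=2):
    """Map every run of k consecutive words to the (entry, offset) pairs containing it."""
    index = defaultdict(list)
    for entry in entries:
        entry = entry.lower()
        words = entry.split()
        for i in range(len(words) - k + 1):
            index[tuple(words[i:i + k])].append((entry, i))
    return index


def extend_seed(query_words, entry_words, q_pos, e_pos, k):
    """Grow an exact k-word seed left and right while the words keep matching."""
    q_start, e_start = q_pos, e_pos
    while (q_start > 0 and e_start > 0
           and query_words[q_start - 1] == entry_words[e_start - 1]):
        q_start -= 1
        e_start -= 1
    q_end, e_end = q_pos + k, e_pos + k
    while (q_end < len(query_words) and e_end < len(entry_words)
           and query_words[q_end] == entry_words[e_end]):
        q_end += 1
        e_end += 1
    return q_start, q_end


def find_segments(query, index, k=2):
    """Return sorted (start, end, entry) token spans of the query matched against the index."""
    words = query.lower().split()
    segments = set()
    for i in range(len(words) - k + 1):
        for entry, offset in index.get(tuple(words[i:i + k]), []):
            start, end = extend_seed(words, entry.split(), i, offset, k)
            segments.add((start, end, entry))
    return sorted(segments)


index = build_kword_index(["delhi 99 residency", "uttarakhand residences"])
print(find_segments("flats in delhi 99 residency bhopura", index))
```

Both seeds ("delhi 99" and "99 residency") extend to the same span, so the set keeps a single segment covering tokens 2 to 5.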
Approaches for QER : Shallow parsing with
Conditional Random Fields
The NER engine was trained and tested on our own tags

Sample entities recognized using CRF in queries:
[btech] in [delhi]
[institutes pgdma] in [operations]
[mba] in [finance] full [time courses] in [delhi]
[part time mba] in [marketing]
[mba correspondence] courses in [banglore]
[mba] in [delhi]
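For illustration, bracketed output like the samples above can be produced from a per-token label sequence. The sketch assumes the tagger emits BIO labels (B- begins an entity, I- continues it, O is outside); the slides do not state the label scheme, and the label names COURSE and SPEC are hypothetical.

```python
def bracket_entities(tokens, labels):
    """Render BIO-labelled tokens as text with each entity in square brackets."""
    out, current = [], []
    for token, label in zip(tokens, labels):
        starts_entity = label.startswith("B-") or (label.startswith("I-") and not current)
        if starts_entity:
            if current:  # close the previous entity first
                out.append("[" + " ".join(current) + "]")
            current = [token]
        elif label.startswith("I-"):
            current.append(token)
        else:  # "O": outside any entity
            if current:
                out.append("[" + " ".join(current) + "]")
                current = []
            out.append(token)
    if current:
        out.append("[" + " ".join(current) + "]")
    return " ".join(out)


tokens = ["part", "time", "mba", "in", "marketing"]
labels = ["B-COURSE", "I-COURSE", "I-COURSE", "O", "B-SPEC"]
print(bracket_entities(tokens, labels))
```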
Approaches for QER : Hybrid of matching
and machine learning
In the current QER system we use this hybrid approach, combining
sequence alignment matching with conditional random fields.

Entities found by matching are used as boosted-weight features for
learning state probabilities.

Transition probabilities are learned from the observations.
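One way to picture the hybrid: the spans found by the matching stage become an extra feature on each token, next to ordinary lexical features, and the CRF then learns how much to trust it. The sketch below only builds such per-token feature dicts. The feature names, the dict-per-token layout and the span format are assumptions, and the boosting of weights mentioned above is not shown.

```python
def token_features(tokens, matched_spans):
    """Build one feature dict per token.

    matched_spans: list of (start, end, category) token spans
    produced by the matching stage (hypothetical format).
    """
    match_label = ["NONE"] * len(tokens)
    for start, end, category in matched_spans:
        for i in range(start, end):
            match_label[i] = category

    features = []
    for i, token in enumerate(tokens):
        features.append({
            "word": token.lower(),
            "is_digit": token.isdigit(),
            "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
            "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
            "dict_match": match_label[i],  # signal from the matching stage
        })
    return features


for feats in token_features(["mba", "in", "delhi"], [(0, 1, "COURSE"), (2, 3, "CITY")]):
    print(feats)
```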
Features of QER System
QER uses memory-map-based indexing of sequences, so the average server processing time
for a query is 7 ms.



QER runs on Apache Tomcat, so with a mod_cache configuration repeat queries can be parsed
in <1 ms.



QER uses a state-of-the-art protein sequence alignment algorithm (BLAST-A) to resolve
entity boundaries, which is much better than prefix/suffix token mapping.



On known entities QER has an F1 score of 99% for matching (tested on new autosuggestor
phrases; 99acres_QER#QERModificationsandanalysis:LOG).



No need to manually update training data. QER has synchronization modules that can
sync all updates of projects, localities etc. from 99acres data.



No need to worry about pipeline management. Each module is configurable from the
config.properties file.



QER provides XML, HTML and SOLRQUERY formats for quick integration with SOLR.


Got messed up data? QER tries to clean entity titles etc. (but only to some extent).

Any matching system reports which entities matched. QER also outputs the
text segments of the query, with a map of which candidate matched which entity. This candidate
selection can be used by other utilities as well.





QER lets you configure which entities are used as filters, and hence removed
from the keyword query, and which entities should not be removed.



Logically weighted synonyms
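A rough sketch of the configurable filter behaviour described above: entities in the configured filter categories are turned into Solr filter queries (`fq`) and removed from the keyword query (`q`), while the other entities stay in the keywords. The function name, field names and span format are hypothetical, and this is not the actual SOLRQUERY output of QER. Values are not escaped, which real code would need to do.

```python
def build_solr_params(tokens, entities, filter_categories, field_for):
    """Split a recognized query into Solr `q` and `fq` parameters.

    entities: non-overlapping (start, end, category) token spans.
    filter_categories: categories configured to act as filters.
    field_for: {category: Solr field name} (hypothetical mapping).
    """
    removed = set()
    filters = []
    for start, end, category in entities:
        if category in filter_categories:
            value = " ".join(tokens[start:end])
            filters.append('{}:"{}"'.format(field_for[category], value))
            removed.update(range(start, end))
    keywords = [t for i, t in enumerate(tokens) if i not in removed]
    # Fall back to Solr's match-all query when every token became a filter.
    return {"q": " ".join(keywords) or "*:*", "fq": filters}


tokens = "3 bhk flats in noida".split()
entities = [(0, 2, "BEDROOM"), (4, 5, "CITY")]
print(build_solr_params(tokens, entities, {"CITY"}, {"CITY": "city"}))
```

Here only CITY is configured as a filter, so "noida" moves to `fq` and "3 bhk" stays in the keyword query.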

Results

Tested against manual annotations

* Trained for real estate domain :
Average F1 score for entity recognition in input phrases : 0.918221

* Trained for education listings domain :
Average F1 score for entity recognition in input phrases : 0.88649

Detailed results are provided in the published paper

* F1 score = H.M.(recall, precision)   (harmonic mean)
= (2 x recall x precision) / (recall + precision)
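As a worked example of the formula, the sketch below computes an entity-level F1 for one query from gold and predicted entity spans. Counting a prediction as correct only when both span and category match exactly is an assumption; the slides do not say how matches were counted. The span tuples are made-up illustration data, not results from the paper.

```python
def entity_f1(gold, predicted):
    """F1 = 2 * precision * recall / (precision + recall) over exact span matches."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [(0, 1, "COURSE"), (2, 3, "CITY")]
predicted = [(0, 1, "COURSE"), (2, 3, "INSTITUTE"), (3, 4, "CITY")]
# One of three predictions is correct (precision 1/3),
# one of two gold entities is found (recall 1/2).
print(entity_f1(gold, predicted))
```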
Future Directions and Applications
Extending QER to form a complete query
dynamics system, which may include, but is not
limited to:

• Query hierarchical classification
• Query objectivity detection
• Query intent direction
• Result category prediction for a given query
• Query expansion using semantic topics
• And more..
Thank you


Questions?

GraphRAG for Life Science to increase LLM accuracy
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 

QER : query entity recognition

  • 1. QER : Query Entity Recognition Dhwaj Raj Member Web Intelligence and Semantics (WISE) group InfoEdge India Ltd.
  • 2. Named Entities and Recognition
      Named entity recognition is a task that seeks to locate and classify atomic elements in text into predefined categories.
      Sample predefined categories: names of persons, organizations, real estate projects, institutes, colleges, locations, durations, quantities, etc.
      The word “Named” restricts the task to those entities for which one or more proper names can be designated.
  • 3. Entity recognition in query
      Understanding a query well enough that we can extract information from it intelligently and our search systems can answer questions about it.
  • 4. Challenges
      * Entities may be difficult to find and, once found, difficult to classify. For instance, locations and builder names can be similar.
      * Which learning technique applies best.
      * How to balance the amount of free text in order to build a suitable training corpus.
      * The recogniser must be efficient and have high recall.
      * Entity resolution: is "Vishal Sinha Associates" a person name, a company name, or both? Further examples:
          − dehradoon institute of technology delhi
          − Uttarakhand residences noida
          − delhi 99 residency bhopura
      * To build a system that can easily be used in another project as well.
      * Regular synchronization with domain data.
  • 5. Advantages
      * Identifying named entities in queries helps us understand search intent better, and therefore provide better search.
      * A structured query enables the system to search structured documents more effectively.
      * In relevance search, a structured query can help improve ranking by treating entity and context separately.
      * Entity recognition in queries provides segmentation of longer queries.
      * Entity recognition in queries provides a taxonomy of entity roles.
  • 6. Applications
      * To implement filtered search for text query input.
      * In phrase-based auto-suggestor resolution.
      * In QnA, to detect entities under discussion that are not explicitly defined, so that each QnA discussion can be associated with projects, etc.
      * To tag content and listings all over the website.
      * Semantic analysis can be performed by using entity co-occurrence relations to create a topic/tag tree.
      * To improve the user's property posting experience: by extracting entities from the property description in real time, the posting overlay can recommend or preselect the fields that the user is reluctant to choose from a drop-down.
      * To structure the property description and to detect spamminess. In the real estate domain we define spamminess not as profanity (obscenity) but as keyword stuffing: many brokers list all the projects they deal in so as to appear in search results, which hampers search relevance.
      * And many more.
  • 7. Approaches for QER
      * String Alignment Matching: In this approach we perform simple dictionary matching. We have dictionary files, which are simple lists of all known keywords of a category; for example, a file containing the list of all course names and their variants.
      * Probabilistic Shallow Parsing using CRF: We apply machine learning using a probabilistic graphical model with Markov dependencies. We predict the labels of a word sequence from the observation sequence and the prior probabilities obtained in training. This is useful for predicting labels even for new, unknown entities.
      * Hybrid
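The dictionary-matching approach on slide 7 can be illustrated with a minimal sketch. The category names, the sample phrases, and the greedy longest-match strategy are assumptions made for illustration; the slides do not say how QER loads its dictionary files or resolves competing matches.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionaries: category -> known keywords and variants.
# In practice each list would be read from a per-category file.
DICTIONARIES: Dict[str, List[str]] = {
    "course": ["mba", "part time mba", "btech"],
    "location": ["delhi", "noida"],
}


def build_index(dictionaries: Dict[str, List[str]]) -> Dict[Tuple[str, ...], str]:
    """Map each dictionary phrase (as a token tuple) to its category."""
    index = {}
    for category, phrases in dictionaries.items():
        for phrase in phrases:
            index[tuple(phrase.lower().split())] = category
    return index


def match_entities(query: str, index: Dict[Tuple[str, ...], str]) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of dictionary phrases in the query."""
    tokens = query.lower().split()
    max_len = max((len(key) for key in index), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in index:
                matches.append((" ".join(candidate), index[candidate]))
                i += n
                break
        else:
            i += 1  # no phrase starts at this token
    return matches


print(match_entities("part time mba in delhi", build_index(DICTIONARIES)))
```

Trying the longest phrase first makes "part time mba" win over the shorter "mba" entry, which is one simple way to handle overlapping dictionary entries.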
  • 8. Approaches for QER : protein alignment matching
      1. Remove low-complexity regions or sequence repeats in the query sequence.
      2. Make a k-word sequence list of the query sequence.
      3. List the possible matching sequences and organize the remaining high-scoring sequences into an efficient search tree.
      4. Repeat step 3 for each k-word sequence in the query, and scan the database sequences for exact matches with the remaining high-scoring words.
      5. Extend the exact matches to high-scoring segment pairs (HSPs).
      6. List all of the HSPs in the database whose score is high enough to be considered, and evaluate the significance of each HSP score. Combine two or more HSP regions into a longer alignment.
      7. Assign classes to matched segments based on the master data set matched. Use priority scores to resolve the classification of overlapping matched segments.
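The seed-and-extend idea in steps 2, 4 and 5 can be sketched in a heavily simplified form. This is not the BLAST scoring scheme and not QER's implementation: it assumes character-level k-words, exact-match seeds, ungapped extension, and a score equal to the matched length. All function names and the sample entries are hypothetical.

```python
from collections import defaultdict


def kmers(text, k):
    """Return (position, k-word) pairs for every k-character window."""
    return [(i, text[i:i + k]) for i in range(len(text) - k + 1)]


def build_seed_index(entries, k):
    """Index every k-word of every database entry by (entry id, position)."""
    index = defaultdict(list)
    for entry_id, entry in enumerate(entries):
        for pos, word in kmers(entry, k):
            index[word].append((entry_id, pos))
    return index


def extend(query, entry, q_pos, e_pos, k):
    """Extend an exact seed left and right while the characters agree."""
    start_q, start_e = q_pos, e_pos
    while start_q > 0 and start_e > 0 and query[start_q - 1] == entry[start_e - 1]:
        start_q -= 1
        start_e -= 1
    end_q, end_e = q_pos + k, e_pos + k
    while end_q < len(query) and end_e < len(entry) and query[end_q] == entry[end_e]:
        end_q += 1
        end_e += 1
    return start_q, end_q


def matched_segments(query, entries, k=3, min_score=4):
    """Return (query start, query end, entry id) for each extended seed."""
    index = build_seed_index(entries, k)
    hits = set()
    for q_pos, word in kmers(query, k):
        for entry_id, e_pos in index.get(word, []):
            start_q, end_q = extend(query, entries[entry_id], q_pos, e_pos, k)
            if end_q - start_q >= min_score:  # score = matched length
                hits.add((start_q, end_q, entry_id))
    return sorted(hits)


entries = ["noida", "delhi 99 residency"]  # hypothetical master data
query = "flats in delhi 99 residency bhopura"
for start, end, entry_id in matched_segments(query, entries):
    print(query[start:end], "->", entries[entry_id])
```

Because extension works on characters rather than whole tokens, the matched segment boundaries follow the data instead of fixed prefix or suffix rules.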
  • 9. Approaches for QER : Shallow parsing with Conditional Random Fields
      The NER engine was trained and tested on our own tags.
      Sample entities recognized using CRF in queries: [btech] in [delhi] [institutes pgdma] in [operations] [mba] in [finance] full [time courses] in [delhi] [part time mba] in [marketing] [mba correspondence] courses in [banglore] [mba] in [delhi]
  • 10. Approaches for QER : Hybrid of matching and machine learning
      In the current QER system we use a hybrid approach that combines sequence alignment matching with conditional random fields. Entities found by matching are used as boosted-weight features for learning the state probabilities. Transition probabilities are learned from the observations.
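One way to picture the hybrid approach is a per-token feature function in which the matcher's output becomes an input feature for the sequence model. The sketch below is an assumption about the general shape of such features (a list of dictionaries, one per token, as many CRF toolkits accept); the feature names, the B-/I-/O encoding, and the span format are hypothetical and are not taken from the QER system.

```python
def token_features(tokens, matches):
    """Build one feature dict per token.

    `matches` is a list of (start, end, category) token spans produced by
    the dictionary/alignment matcher; end is exclusive.
    """
    # Encode matcher output as B-/I-/O tags so it can be used as a feature.
    dict_tags = ["O"] * len(tokens)
    for start, end, category in matches:
        for i in range(start, end):
            dict_tags[i] = ("B-" if i == start else "I-") + category

    features = []
    for i, token in enumerate(tokens):
        features.append({
            "word": token,
            "is_digit": token.isdigit(),
            "dict_match": dict_tags[i],  # signal from the matching stage
            "prev_word": tokens[i - 1] if i > 0 else "<s>",
            "next_word": tokens[i + 1] if i < len(tokens) - 1 else "</s>",
        })
    return features


tokens = "mba in delhi".split()
matches = [(0, 1, "course"), (2, 3, "location")]  # hypothetical matcher output
for feats in token_features(tokens, matches):
    print(feats)
```

With features like these, a trained CRF can still label a token the dictionaries do not contain, because the word and context features remain available when `dict_match` is "O".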
  • 11. Features of QER System
      * QER uses memory-map-based indexing of sequences, so the average server processing time for a query is 7 ms.
      * QER runs on Apache Tomcat, so with a mod_cache configuration repeat queries can be parsed in <1 ms.
      * QER uses a state-of-the-art protein sequence alignment algorithm (BLAST-A) to resolve entity boundaries, which is much better than prefix/suffix token mapping.
      * On known entities, QER has an F1 score of 99% for matching. (tested on new autosuggestor phrases 99acres_QER#QERModificationsandanalysis:LOG)
      * No need to update training data manually. QER has synchronization modules that can sync all updates of projects, localities, etc. from 99acres data.
      * No need to worry about pipeline management. Each module is configurable from the config.properties file.
      * QER provides XML, HTML and SOLRQUERY output formats for quick integration with SOLR.
      * Got messed-up data? QER tries to clean entity titles etc. (but only to some extent).
      * Any matching system reports which entities matched. QER also outputs the text segments of the query with a map of which candidate matched which entity. This candidate selection can be used by other utilities as well.
      * QER lets you configure which entities are used as filters, and hence removed from the keyword query, and which entities should not be removed.
      * Logically weighted synonyms
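The configurable filter behaviour described on slide 11 (some entity types become filters and are removed from the keyword query, others stay) might look roughly like the sketch below. The function name, the span format, and the category names are hypothetical; the slides do not show how config.properties expresses this setting.

```python
def split_query(tokens, entities, filter_categories):
    """Separate filter entities from the remaining keyword query.

    `entities` is a list of (start, end, category) token spans, end exclusive.
    Categories listed in `filter_categories` become filters and are removed
    from the keyword query; all other tokens are kept.
    """
    filters = {}
    removed = set()
    for start, end, category in entities:
        if category in filter_categories:
            filters.setdefault(category, []).append(" ".join(tokens[start:end]))
            removed.update(range(start, end))
    keyword_query = " ".join(t for i, t in enumerate(tokens) if i not in removed)
    return filters, keyword_query


tokens = "mba in delhi".split()
entities = [(0, 1, "course"), (2, 3, "location")]  # hypothetical recognizer output
print(split_query(tokens, entities, filter_categories={"location"}))
```

Here only the location is turned into a filter, while the course name stays in the keyword query for relevance ranking.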
  • 12. Results
      Tested against manual annotations:
      * Trained for the real estate domain: average F1 score for entity recognition in input phrases: 0.918221
      * Trained for the education listings domain: average F1 score for entity recognition in input phrases: 0.88649
      Detailed results are provided in the published paper.
      * F1 score = harmonic mean of recall and precision = (2 x recall x precision)/(recall + precision)
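The F1 formula on slide 12, (2 x recall x precision)/(recall + precision), can be computed from predicted and manually annotated entity spans as in the sketch below. The span representation and the sample values are illustrative assumptions and are not the data behind the reported scores.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 over sets of entity spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1


# Hypothetical spans: (start token, end token, category).
gold = [(0, 1, "course"), (2, 3, "location")]
predicted = [(0, 1, "course")]
print(precision_recall_f1(predicted, gold))
```

In this example the single prediction is correct (precision 1.0) but one of two gold entities is missed (recall 0.5), so F1 is 2/3.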
  • 13. Future Directions and Applications
      Extending QER to form a complete query dynamics system, which may include, but is not limited to:
      * Query hierarchical classification
      * Query objectivity detection
      * Query intent direction
      * Result category prediction for a given query
      * Query expansion using semantic topics
      * And more.