QER : query entity recognition

Dhwaj Raj
Member
Web Intelligence and Semantics (WISE) group
InfoEdge India Ltd.

Named Entities and Recognition

Named entity recognition is a task that seeks to locate and classify
atomic elements in text into predefined categories.

Sample predefined categories: names of persons, organizations, real
estate projects, institutes, colleges, locations, durations, and
quantities.

The word “Named” restricts the task to those entities for which
one or more proper names can be designated.

Entity recognition in query


Understanding a query well enough that we can extract information from it
intelligently, so that our search systems can answer the questions it
poses.

Challenges






* Entities may be difficult to find and, once found, difficult to classify. For
instance, location names and builder names can be similar.
* Deciding which learning technique applies best.
* Balancing the amount of free text needed to build a suitable training
corpus.
* The recogniser must be efficient and have high recall.
* Entity resolution: is “Vishal Sinha Associates” a person name, a company name, or both?
− dehradoon institute of technology delhi
− Uttarakhand residences noida
− delhi 99 residency bhopura
* Building a system that can easily be reused in other projects.
* Regular synchronization with domain data.

Advantages
* Identifying named entities in queries would help us to understand search intents better,
and therefore provide better search.

* A structured query enables the system to search structured documents more
effectively.

* In relevance search, a structured query can help in improving the ranking by treating
entity and context separately.

* Entity Recognition in query provides segmentation of longer queries.

* Entity Recognition in query provides entity roles taxonomy.

Applications


* Implementing filtered search for free-text query input.
* Resolution in the phrase-based auto-suggestor.
* Detecting, in QnA, entities under discussion that are not explicitly defined, so that each
QnA discussion can be associated with projects etc.
* Tagging content and listings all over the website.
* Semantic analysis: entity co-occurrence relations can be used to create a
topic/tag tree.
* Improving the user's property posting experience. By extracting entities from the property
description in real time, we can recommend or preselect, in the property posting overlay,
the fields that the user is reluctant to choose from a drop-down.
* Structuring the property description and detecting spamminess. In the real estate domain
we define spamminess not as profanity (obscenity) but as keyword stuffing: many brokers
list every project they deal in so that they come up in search results, which hampers
search relevance.
* And many more.

Approaches for QER
* String Alignment Matching
In this approach we perform simple dictionary matching. We have
dictionary files, which are simple lists of all known keywords of a category;
for example, a file containing the list of all course names and their variants.
* Probabilistic Shallow Parsing using CRF
We apply machine learning using a probabilistic graphical model
with Markov dependencies. We predict the labels of a word sequence
based on the observation sequence and the a priori probabilities obtained by
training. This is useful for predicting labels even for unknown new entities.
* Hybrid
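
The dictionary-matching approach can be illustrated with a minimal sketch. The category names, dictionary contents and function names here are hypothetical stand-ins (the actual QER dictionary files are not shown in these slides), and the greedy longest-match strategy is an assumption rather than the documented QER behaviour.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionaries; real QER dictionary files are not shown in the slides.
DICTIONARIES: Dict[str, List[str]] = {
    "COURSE": ["mba", "part time mba", "btech"],
    "LOCATION": ["delhi", "noida"],
    "SPECIALIZATION": ["finance", "marketing"],
}


def build_lookup(dictionaries: Dict[str, List[str]]) -> Dict[Tuple[str, ...], str]:
    """Map each dictionary phrase (as a tuple of tokens) to its category."""
    lookup = {}
    for category, phrases in dictionaries.items():
        for phrase in phrases:
            lookup[tuple(phrase.lower().split())] = category
    return lookup


def match_entities(query: str, lookup: Dict[Tuple[str, ...], str]) -> List[Tuple[str, str]]:
    """Greedy longest-match scan of the query tokens against the lookup."""
    tokens = query.lower().split()
    max_len = max((len(key) for key in lookup), default=0)
    results = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in lookup:
                results.append((" ".join(candidate), lookup[candidate]))
                i += n
                break
        else:
            i += 1  # no phrase starts at this token
    return results


lookup = build_lookup(DICTIONARIES)
print(match_entities("part time mba in marketing", lookup))
```

Trying the longest phrase first keeps "part time mba" from being split into a bare "mba" match. Such a matcher cannot label entities that are missing from the lists, which is the gap the CRF approach addresses.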

Approaches for QER : protein alignment matching
1. Remove low-complexity regions or sequence repeats in the query sequence.
2. Make a k-word sequence list of the query sequence.
3. List the possible matching sequences and organize the remaining high-scoring sequences into an efficient search tree.
4. Repeat step 3 for each k-word sequence in the query, and scan the database
sequences for exact matches with the remaining high-scoring words.
5. Extend the exact matches to high-scoring segment pairs (HSPs).
6. List all of the HSPs in the database whose score is high enough to be
considered, and evaluate the significance of each HSP score. Combine two or more
HSP regions into a longer alignment.
7. Assign classes to matched segments based on the master data set matched.
Use priority scores to resolve the classification of overlapping matched segments.
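
The seed-and-extend idea behind these steps can be sketched in a heavily simplified form. This is not the QER implementation: it works on characters, uses ungapped exact extension only, and skips the low-complexity filtering, significance scoring and HSP merging of steps 1, 3 and 6. The master data, class names, priorities and the `min_cover` threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical master data: (entity title, class, priority). Higher priority wins overlaps.
MASTER = [
    ("delhi", "LOCATION", 1),
    ("delhi 99 residency", "PROJECT", 2),
    ("bhopura", "LOCALITY", 1),
]


def build_seed_index(master, k: int) -> Dict[str, List[Tuple[int, int]]]:
    """Index every k-character word of each title as (title id, offset)."""
    index = defaultdict(list)
    for tid, (title, _, _) in enumerate(master):
        for off in range(len(title) - k + 1):
            index[title[off:off + k]].append((tid, off))
    return index


def extend(query: str, title: str, q: int, t: int, k: int) -> Tuple[int, int, int, int]:
    """Grow an exact k-word hit left and right while the characters agree."""
    qs, ts = q, t
    while qs > 0 and ts > 0 and query[qs - 1] == title[ts - 1]:
        qs -= 1
        ts -= 1
    qe, te = q + k, t + k
    while qe < len(query) and te < len(title) and query[qe] == title[te]:
        qe += 1
        te += 1
    return qs, qe, ts, te


def find_segments(query: str, master, k: int = 3, min_cover: float = 1.0):
    """Return non-overlapping (segment text, class) pairs found in the query."""
    index = build_seed_index(master, k)
    hsps = set()
    for q in range(len(query) - k + 1):
        for tid, t in index.get(query[q:q + k], []):
            title, cls, prio = master[tid]
            qs, qe, ts, te = extend(query, title, q, t, k)
            # Keep the segment only if it covers enough of the title.
            if (te - ts) / len(title) >= min_cover:
                hsps.add((qs, qe, cls, prio))
    chosen = []
    # Resolve overlaps: higher priority first, then longer segments.
    for qs, qe, cls, _ in sorted(hsps, key=lambda h: (-h[3], -(h[1] - h[0]), h[0])):
        if all(qe <= s or qs >= e for s, e, _ in chosen):
            chosen.append((qs, qe, cls))
    return [(query[s:e], cls) for s, e, cls in sorted(chosen)]


print(find_segments("delhi 99 residency bhopura", MASTER))
```

In this toy run the location "delhi" overlaps the higher-priority project title, so the priority rule of step 7 keeps the project and drops the location. A production matcher would also need to respect token boundaries, which this sketch ignores.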

Approaches for QER : Shallow parsing with Conditional Random Fields
The NER engine was trained and tested on our own tags.

Sample entities recognized using CRF in queries:
[btech] in [delhi]
[institutes pgdma] in [operations]
[mba] in [finance] full [time courses] in [delhi]
[part time mba] in [marketing]
[mba correspondence] courses in [banglore]
[mba] in [delhi]
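
Bracketed samples like these are usually converted to per-token labels before a CRF can be trained on them. The sketch below shows one common way to do that with B/I/O tags. The generic `ENT` tag is an assumption, since the slides show entity boundaries but not the tag set the engine was trained on.

```python
import re
from typing import List, Tuple


def bracketed_to_bio(annotated: str) -> List[Tuple[str, str]]:
    """Convert '[mba] in [delhi]' into (token, tag) pairs with B/I/O tags."""
    pairs = []
    # Either a bracketed entity span or a single plain token.
    for match in re.finditer(r"\[([^\]]+)\]|(\S+)", annotated):
        if match.group(1) is not None:
            tokens = match.group(1).split()
            if not tokens:
                continue  # ignore empty brackets
            pairs.append((tokens[0], "B-ENT"))  # first token begins the entity
            pairs.extend((tok, "I-ENT") for tok in tokens[1:])  # the rest are inside it
        else:
            pairs.append((match.group(2), "O"))  # outside any entity
    return pairs


for token, tag in bracketed_to_bio("[mba] in [finance] full [time courses] in [delhi]"):
    print(token, tag)
```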

Approaches for QER : Hybrid of matching and machine learning
In the current QER system we use this hybrid approach, combining
sequence alignment matching with conditional random fields.

Entities found by matching are used as boosted-weight features for
learning state probabilities.

Transition probabilities are learned from the observations.
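
One way to picture the hybrid is as a per-token feature function in which the label proposed by the matching stage becomes an extra, more heavily weighted feature. The sketch below is an assumption about how that could look. The feature names, the `DICT_BOOST` value and the `dict_labels` input are hypothetical, and the output is a plain feature dictionary of the kind many CRF toolkits accept.

```python
from typing import Dict, List, Optional, Union

# Hypothetical boost given to features that come from the matching stage.
DICT_BOOST = 2.0

FeatureValue = Union[str, float, bool]


def token_features(tokens: List[str], i: int,
                   dict_labels: List[Optional[str]]) -> Dict[str, FeatureValue]:
    """Build CRF features for token i, adding the matcher's label if there is one."""
    feats: Dict[str, FeatureValue] = {
        "bias": 1.0,
        "word": tokens[i].lower(),
        "is_digit": tokens[i].isdigit(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }
    label = dict_labels[i]
    if label is not None:
        # The entity class found by alignment matching enters as a boosted feature.
        feats["dict_match=" + label] = DICT_BOOST
    return feats


tokens = ["mba", "in", "delhi"]
dict_labels = ["COURSE", None, "LOCATION"]  # hypothetical output of the matching stage
print([token_features(tokens, i, dict_labels) for i in range(len(tokens))])
```

The CRF still learns the transition weights from annotated sequences, so a token the matcher missed can be labelled from its context.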

Features of QER System
* QER uses memory-map-based indexing of sequences, so the average server processing time
for a query is 7 ms.
* QER runs on Apache Tomcat, so with a mod_cache configuration repeat queries can be parsed
in <1 ms.
* QER uses a state-of-the-art protein sequence alignment algorithm (BLAST-A) to resolve
entity boundaries, which is much better than prefix/suffix token mapping.
* On known entities QER has an F1 score of 99% for matching (tested on new autosuggestor
phrases; 99acres_QER#QERModificationsandanalysis:LOG).
* No need to manually update training data. QER has synchronization modules that can
sync all updates of projects, localities etc. from 99acres data.
* No need to worry about pipeline management. Each module is configurable from the
config.properties file.
* QER provides XML, HTML and SOLRQUERY formats for quick integration with SOLR.
* Got messed-up data? QER tries to clean entity titles etc. (but only to some extent).
* Any matching system reports which entities matched. QER also outputs the
text segments of the query with a map of which candidate matched which entity. This candidate
selection can be fed to other utilities as well.
* QER allows configuring which entities are used as filters, and hence removed
from the keyword query, and which entities should not be removed.
* Logically weighted synonyms.
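
The filter configuration can be illustrated with a small sketch: given recognized segments, entity types marked as filters are moved out of the keyword query, and the rest stay in it. The segment format, the entity type names and the `FILTER_ENTITIES` setting are hypothetical. The actual config.properties keys and QER output formats are not shown in these slides.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical setting: entity types that act as filters and leave the keyword query.
FILTER_ENTITIES: Set[str] = {"LOCATION", "COURSE"}


def split_query(segments: List[Tuple[str, Optional[str]]],
                filter_entities: Set[str]) -> Tuple[str, Dict[str, List[str]]]:
    """Separate recognized segments into filters and the remaining keyword query."""
    filters: Dict[str, List[str]] = {}
    keywords: List[str] = []
    for text, entity in segments:
        if entity in filter_entities:
            filters.setdefault(entity, []).append(text)  # becomes a structured filter
        else:
            keywords.append(text)  # stays in the free-text query
    return " ".join(keywords), filters


# Hypothetical recognizer output for the query "mba in finance delhi".
segments = [("mba", "COURSE"), ("in", None), ("finance", "SPECIALIZATION"), ("delhi", "LOCATION")]
print(split_query(segments, FILTER_ENTITIES))
```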


Results

Tested against manual annotations

* Trained for real estate domain :
Average F1 score for entity recognition in input phrases : 0.918221

* Trained for education listings domain :
Average F1 score for entity recognition in input phrases : 0.88649

Detailed results are provided in the published paper.

* F1 score = harmonic mean (recall, precision)
= (2 x recall x precision)/(recall + precision)
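
As a small worked example of the formula, F1 can be computed as the harmonic mean of precision and recall. The input values below are purely illustrative and are not taken from the QER evaluation.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 = (2 x recall x precision) / (recall + precision)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Illustrative values only, not measured results.
print(round(f1_score(precision=0.5, recall=1.0), 4))
```

Because the harmonic mean is pulled towards the smaller value, a system cannot score well on F1 by maximizing recall alone.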

Future Directions and Applications
Extending QER to form a complete query
dynamics system, which may include, but is not
limited to:
• Query hierarchical classification
• Query objectivity detection
• Query intent direction
• Result category prediction for a given query
• Query expansion using semantic topics
• And more
