SlideShare a Scribd company logo
1 of 26
Download to read offline
Machine Learning for Search
Criteria Prediction
Dhwaj Raj
Convention

Query
=
Search Critera
=
Set of search filters chosen by user
User searches : at our real estate
portal
69121
70000
60000

Average Searches in active
user session = 3.6

Users
Over
50000
1 week
40000

30963

30000
17332

20000

Long tail samples ...
9561

10000

5911

3819

2481

1595

1206

0
2

3

4

5

6

7

8

Number of searches in an active user session

9

10
Aim of Query/ Search Criteria
Recommendation
●

One of the major ways to improve user experience.

●

Minimize search actions while fulfulling the user intent.

●

●

Facilitate and Guide users through their informationseeking tasks.
Help them explore alternate ways to express their intent.
Introduction
●

●

●

We propose search log mining to capture collective
intelligence for providing query suggestions by predicting
user's future search queries.
When enough history is observed for a given query chain,
future queries can be accurately modeled.
Understanding what users are querying in the current
search sessions, and predict what they will search next
have many potential applications such as query
recommendation, web page re-ordering, advertisement
arrangement, and so on.
Suggested Applications
●

query recommendation
Suggested Applications
* Page Re-ordering
Suggested Applications
* Ad Placement
Suggested Applications
●

●

Use prediction from history chains for query based
personalization

Using feedback to search relevance.
Suggested Applications
●

●

AND MANY OTHER APPLICATIONS …..
Also, This technology of action chain based
predictions can be used for domains other than
search query as well. Such as listing
recommendation, page action prediction etc.
Search Query Chains
●

User1: A → B → C

●

User2: B → C → E

●

User3: A → B → C → E

●

User4: B → C → F

●

User5: A → E

●

Where A, B, C etc are search queries/criterias

●

UserNew1: B → C → ?

●

UserNew2: A → ?
Basis of Model
●

Following experiments were done:
1. Distribution of 2 action query chain does not
correlate with high order query chains. For e.g
distribution of E → F does not correlate well
with distribution of B → E → F, or C → E → F or
B→C→E→F
–

Observation: probability of step1 → step2 also depends
conditionally on the action history i.e. on past actions
used to reach step1.
Basis of Model
2. Search criteria is a set of filter entities:
A,B,C can be represented as (e1,e2,e3,e4,e5...)
Where e1,e2 etc are entity types for e.g. proptype, city, locality,
budget etc.
●

Experiments show that probability of a criteria state A,B or C
depends conditionally on set of entities present in the criteria.
For eg. Probability of
A(resi,sector105noida,underconstr) → B(resi,sector110noida,underconstr)

has a different probability distribution than
A(resi,sector105noida,readyPosess) → B(resi,sector110noida,readyPosess)
and
A(resi,sector105noida) → B(resi,sector110noida)
Basis of Model
3. Experiments were to done to observe correlation
among distribution of history chains.
●

Observation:
A→B→C→E
correlate with B → A → C → E but same history order chains
have higher correlation.

●

Conclusion: Order matters in history but unordered history
also correlate well,
so order should be given improtance but should not be sole
criteria for model.

●

Use bayesian independence but give reward to same order.
P( next | A → B → C → D) = k. P(next | B → A → C → D)
Markov Chains & decision process
At each time step, the process is in some state s,
and the decision maker may choose any action a
that is available in state s. The process responds
at the next time step by randomly moving into a
new state s', and giving the decision maker a
corresponding reward R_a(s,s').
Conditianal Probability computations to make
these chain graphs memoryless:
Constructing query-flow graphs
query flow graphs are
generally stored as a transition
matrix.
● Each cell is filled with
computed conditional
probability.
●

A

B

C

A

B

C

A

0

0

0.02

AA

0.1 0

0.3

B

0

0

0

AB

0.9 0

0

C

0.1 0.5 0

AC 0

0.2 0

D

0

0

BA

0.1 0

E

0

0.1 0

0

0

BC 0.6 0.3 0
Sub-Chains & Sub-Criteria-Chains
Help reduce sparseness.
●

training from log: resi_flat_sale_sector12noida → resi_flat_sale_sector22noida

●

Testing for : resi_flat_sale_sector12noida_10-25laks → ?

Sub Chains
●

Window shifting algorithm
A → B → C → D → E → F → G → H transform to : (A → B → C, B → C → D, A → B, B → C,
etc...)

●

Skip sequence
A → B → C → D → E → F → G → H transform to : ( A → B → D, A → B → E etc..)

Sub Criteria Chains
●

resi_flat_sale_sector12noida_10-25laks_unfurnished → resi_flat_sale_sector10noida_2530laks_unfurnished
Transforms to (resi_flat_sale_sector12noida → resi_flat_sale_sector10noida,
resi_flat_sale_sector12noida_10-25laks → resi_flat_sale_sector10noida_25-30laks,
resi_flat_sale_sector12noida_unfurnished → resi_flat_sale_sector10noida_unfurnished)

●

Only minor criteria items get replaced because major criteria is representative of an unique
search focus.
Query Log Mining
●

Visitor session info

●

Visitor search actions

●

●

Active Session : current session tracking keeps
session alive for days. Written optimized
procedure queries to change vam id for every
change in major criteria and for every idle time
of 1hr.
Clustering similar entity values to reduce
sparseness.
Assigning Weights
Storing query flow graphs
●

●

Transition matrices are very sparse and eat up unnecessary space for 0 prob pairs.
To overcome these we use state transition map:
State seq

prob

A->B->C

D

0.02

A->B->C

E

0.05

A->B

●

next

C

0.1

These query flow graph will be queried very
frequently.
Retrieval -1
●

●

●
●

Query flow graph can be seen as a key value
map where key is (stateseq) and val is
(next,probability) pair.
To utilize the system for various applications
including real time interaction, the retrieval part
should be really fast (>100ms).
Key set size > 30 million
Inverted index storages like lucene etc are not
used because insertion is very slow for given
key set size.
Retrieval - 2
●

●

●

●

●

Relational databases like mysql etc yielded slow insertion and
slower retrieval.
In memory structures like redis or hashmaps were not able to
occupy all keys.
Finally we used Cassandra NoSQL which is giving a very
good insertion and retrieval performance.
Advantage of cassandra : can be expanded and extended to
our need and it's cool to use.
Ref: using_cassandra
Prediction
Already prepared in
training phase

Learned
weights

Query
Flow
Transition
Map

Entity type prediction
Entity value Prediction

Predictor

State Fetcher

Input
{curr state,
past state seq}

Sub Chains Generator

Sub Criteria Chains generator

1. Predicting next entity type to be clicked. For e.g. next filter will
be on locality, or furnishing or budget etc.
2. Predicting next entity value to be clicked. For e.g. for each entity
type which value will be chosen next such as which locality or
which budget range.
What is not covered in this project
●

Some of the work on query recommendation focuses on measures of
query attribute similarity based on domain knowledgebase but
we have not included this in project scope.

●

For 99acres it can be seen as locality affinity etc.

●

Why not : maintainance issues.

●

●

Why not : we do not want too much dependency on domain specific
data rather it should stay as plug and play model working only on
query logs.
Why not: domain similarity unable to capture crowd's wisdom
–
–

–

–

Conditional nature of attribute dependency. For e.g user's choice for next locality also
depend on choice of budget, bedroom etc.
For eg. A user searching for sector xx is presented sector yy due to affinity but users
are aware of the current affairs and they do not want to search in yy because HC
has banned work there.
A user is searching for jaypee kosmos and is presented supertech upcountry(near)
but users had some knowledge of hidden attributes like FRR and green space so he
makes next search to jaypee aman(far).
Our Assumption: user crowd have more knowledge of current trends and ground
realities so entity relations can be better captured from their behaviors.
Results
●

●

●

●

In user behavior modelling, it is very difficult to say that model is x%
accurate for new user.
Even if we model accurately and predict accurately majority of users
may have different search focuses. So accuracy can be measured
truly by presenting the prediction to the user and then evaluating
that how many users we influenced.
So in user behavior modeling we can conclude with data that our
model fits well for already seen users/history.
In our case, a full fledged demo was created with 10 days of
search_criteria training.

●

On same validation set it provides accuracy of 79%.

●

On separate training and fresh test sets accuracy of 67%.

●

Demo has been hosted on local server for testing (link to be
distributed).
●

Thank you.

More Related Content

Similar to Query recommendation by search filters and criteria prediction

[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...Emanuel Lacić
 
Machine learning in Dynamic Adaptive Streaming over HTTP (DASH)
Machine learning in Dynamic Adaptive Streaming over HTTP (DASH)Machine learning in Dynamic Adaptive Streaming over HTTP (DASH)
Machine learning in Dynamic Adaptive Streaming over HTTP (DASH)Eswar Publications
 
Internship_presentation
Internship_presentationInternship_presentation
Internship_presentationAditya Gautam
 
Dwdm ppt for the btech student contain basis
Dwdm ppt for the btech student contain basisDwdm ppt for the btech student contain basis
Dwdm ppt for the btech student contain basisnivatripathy93
 
Human Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataHuman Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataIRJET Journal
 
Self-Tuning and Managing Services
Self-Tuning and Managing ServicesSelf-Tuning and Managing Services
Self-Tuning and Managing ServicesReza Rahimi
 
Performance analysis of KNN & K-Means using internet advertisements data
Performance analysis of KNN & K-Means using internet advertisements dataPerformance analysis of KNN & K-Means using internet advertisements data
Performance analysis of KNN & K-Means using internet advertisements dataMuhammad GulRaj
 
Loan Approval Prediction Using Machine Learning
Loan Approval Prediction Using Machine LearningLoan Approval Prediction Using Machine Learning
Loan Approval Prediction Using Machine LearningSouma Maiti
 
Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Jill Lyons
 
IRJET- Criminal Recognization in CCTV Surveillance Video
IRJET-  	  Criminal Recognization in CCTV Surveillance VideoIRJET-  	  Criminal Recognization in CCTV Surveillance Video
IRJET- Criminal Recognization in CCTV Surveillance VideoIRJET Journal
 
Learning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksLearning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksFranco Maria Nardini
 
A Framework for Performance Analysis of Computing Clouds
A Framework for Performance Analysis of Computing CloudsA Framework for Performance Analysis of Computing Clouds
A Framework for Performance Analysis of Computing Cloudsijsrd.com
 
Predictive model based on Supervised ML
Predictive model based on Supervised MLPredictive model based on Supervised ML
Predictive model based on Supervised MLUmeshchandraYadav5
 
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)Harry Zhang
 
Skyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed EnvironmentSkyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed EnvironmentIJMER
 
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNNIRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNNIRJET Journal
 
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTERPERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTERijdms
 

Similar to Query recommendation by search filters and criteria prediction (20)

[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
 
Machine learning in Dynamic Adaptive Streaming over HTTP (DASH)
Machine learning in Dynamic Adaptive Streaming over HTTP (DASH)Machine learning in Dynamic Adaptive Streaming over HTTP (DASH)
Machine learning in Dynamic Adaptive Streaming over HTTP (DASH)
 
Internship_presentation
Internship_presentationInternship_presentation
Internship_presentation
 
Dwdm ppt for the btech student contain basis
Dwdm ppt for the btech student contain basisDwdm ppt for the btech student contain basis
Dwdm ppt for the btech student contain basis
 
Dwd mdatamining intro-iep
Dwd mdatamining intro-iepDwd mdatamining intro-iep
Dwd mdatamining intro-iep
 
Human Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerDataHuman Activity Recognition Using AccelerometerData
Human Activity Recognition Using AccelerometerData
 
Self-Tuning and Managing Services
Self-Tuning and Managing ServicesSelf-Tuning and Managing Services
Self-Tuning and Managing Services
 
Performance analysis of KNN & K-Means using internet advertisements data
Performance analysis of KNN & K-Means using internet advertisements dataPerformance analysis of KNN & K-Means using internet advertisements data
Performance analysis of KNN & K-Means using internet advertisements data
 
Loan Approval Prediction Using Machine Learning
Loan Approval Prediction Using Machine LearningLoan Approval Prediction Using Machine Learning
Loan Approval Prediction Using Machine Learning
 
Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Dad (Data Analysis And Design)
Dad (Data Analysis And Design)
 
IRJET- Criminal Recognization in CCTV Surveillance Video
IRJET-  	  Criminal Recognization in CCTV Surveillance VideoIRJET-  	  Criminal Recognization in CCTV Surveillance Video
IRJET- Criminal Recognization in CCTV Surveillance Video
 
Learning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksLearning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search Tasks
 
A Framework for Performance Analysis of Computing Clouds
A Framework for Performance Analysis of Computing CloudsA Framework for Performance Analysis of Computing Clouds
A Framework for Performance Analysis of Computing Clouds
 
Predictive model based on Supervised ML
Predictive model based on Supervised MLPredictive model based on Supervised ML
Predictive model based on Supervised ML
 
IEEE Projects 2014-2015
IEEE Projects 2014-2015IEEE Projects 2014-2015
IEEE Projects 2014-2015
 
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
ZhangTorkkolaLiSchreinerZhangGardnerZhao(04279048)
 
Skyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed EnvironmentSkyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed Environment
 
rscript_paper-1
rscript_paper-1rscript_paper-1
rscript_paper-1
 
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNNIRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
 
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTERPERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTER
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Query recommendation by search filters and criteria prediction

  • 1. Machine Learning for Search Criteria Prediction Dhwaj Raj
  • 2. Convention Query = Search Critera = Set of search filters chosen by user
  • 3. User searches : at our real estate portal 69121 70000 60000 Average Searches in active user session = 3.6 Users Over 50000 1 week 40000 30963 30000 17332 20000 Long tail samples ... 9561 10000 5911 3819 2481 1595 1206 0 2 3 4 5 6 7 8 Number of searches in an active user session 9 10
  • 4. Aim of Query/ Search Criteria Recommendation ● One of the major ways to improve user experience. ● Minimize search actions while fulfulling the user intent. ● ● Facilitate and Guide users through their informationseeking tasks. Help them explore alternate ways to express their intent.
  • 5. Introduction ● ● ● We propose search log mining to capture collective intelligence for providing query suggestions by predicting user's future search queries. When enough history is observed for a given query chain, future queries can be accurately modeled. Understanding what users are querying in the current search sessions, and predict what they will search next have many potential applications such as query recommendation, web page re-ordering, advertisement arrangement, and so on.
  • 9. Suggested Applications ● ● Use prediction from history chains for query based personalization Using feedback to search relevance.
  • 10. Suggested Applications ● ● AND MANY OTHER APPLICATIONS ….. Also, This technology of action chain based predictions can be used for domains other than search query as well. Such as listing recommendation, page action prediction etc.
  • 11. Search Query Chains ● User1: A → B → C ● User2: B → C → E ● User3: A → B → C → E ● User4: B → C → F ● User5: A → E ● Where A, B, C etc are search queries/criterias ● UserNew1: B → C → ? ● UserNew2: A → ?
  • 12. Basis of Model ● Following experiments were done: 1. Distribution of 2 action query chain does not correlate with high order query chains. For e.g distribution of E → F does not correlate well with distribution of B → E → F, or C → E → F or B→C→E→F – Observation: probability of step1 → step2 also depends conditionally on the action history i.e. on past actions used to reach step1.
  • 13. Basis of Model 2. Search criteria is a set of filter entities: A,B,C can be represented as (e1,e2,e3,e4,e5...) Where e1,e2 etc are entity types for e.g. proptype, city, locality, budget etc. ● Experiments show that probability of a criteria state A,B or C depends conditionally on set of entities present in the criteria. For eg. Probability of A(resi,sector105noida,underconstr) → B(resi,sector110noida,underconstr) has a different probability distribution than A(resi,sector105noida,readyPosess) → B(resi,sector110noida,readyPosess) and A(resi,sector105noida) → B(resi,sector110noida)
  • 14. Basis of Model 3. Experiments were to done to observe correlation among distribution of history chains. ● Observation: A→B→C→E correlate with B → A → C → E but same history order chains have higher correlation. ● Conclusion: Order matters in history but unordered history also correlate well, so order should be given improtance but should not be sole criteria for model. ● Use bayesian independence but give reward to same order. P( next | A → B → C → D) = k. P(next | B → A → C → D)
  • 15. Markov Chains & decision process At each time step, the process is in some state s, and the decision maker may choose any action a that is available in state s. The process responds at the next time step by randomly moving into a new state s', and giving the decision maker a corresponding reward R_a(s,s'). Conditianal Probability computations to make these chain graphs memoryless:
  • 16. Constructing query-flow graphs query flow graphs are generally stored as a transition matrix. ● Each cell is filled with computed conditional probability. ● A B C A B C A 0 0 0.02 AA 0.1 0 0.3 B 0 0 0 AB 0.9 0 0 C 0.1 0.5 0 AC 0 0.2 0 D 0 0 BA 0.1 0 E 0 0.1 0 0 0 BC 0.6 0.3 0
  • 17. Sub-Chains & Sub-Criteria-Chains Help reduce sparseness. ● training from log: resi_flat_sale_sector12noida → resi_flat_sale_sector22noida ● Testing for : resi_flat_sale_sector12noida_10-25laks → ? Sub Chains ● Window shifting algorithm A → B → C → D → E → F → G → H transform to : (A → B → C, B → C → D, A → B, B → C, etc...) ● Skip sequence A → B → C → D → E → F → G → H transform to : ( A → B → D, A → B → E etc..) Sub Criteria Chains ● resi_flat_sale_sector12noida_10-25laks_unfurnished → resi_flat_sale_sector10noida_2530laks_unfurnished Transforms to (resi_flat_sale_sector12noida → resi_flat_sale_sector10noida, resi_flat_sale_sector12noida_10-25laks → resi_flat_sale_sector10noida_25-30laks, resi_flat_sale_sector12noida_unfurnished → resi_flat_sale_sector10noida_unfurnished) ● Only minor criteria items get replaced because major criteria is representative of an unique search focus.
  • 18. Query Log Mining ● Visitor session info ● Visitor search actions ● ● Active Session : current session tracking keeps session alive for days. Written optimized procedure queries to change vam id for every change in major criteria and for every idle time of 1hr. Clustering similar entity values to reduce sparseness.
  • 20. Storing query flow graphs ● ● Transition matrices are very sparse and eat up unnecessary space for 0 prob pairs. To overcome these we use state transition map: State seq prob A->B->C D 0.02 A->B->C E 0.05 A->B ● next C 0.1 These query flow graph will be queried very frequently.
  • 21. Retrieval -1 ● ● ● ● Query flow graph can be seen as a key value map where key is (stateseq) and val is (next,probability) pair. To utilize the system for various applications including real time interaction, the retrieval part should be really fast (>100ms). Key set size > 30 million Inverted index storages like lucene etc are not used because insertion is very slow for given key set size.
  • 22. Retrieval - 2 ● ● ● ● ● Relational databases like mysql etc yielded slow insertion and slower retrieval. In memory structures like redis or hashmaps were not able to occupy all keys. Finally we used Cassandra NoSQL which is giving a very good insertion and retrieval performance. Advantage of cassandra : can be expanded and extended to our need and it's cool to use. Ref: using_cassandra
  • 23. Prediction Already prepared in training phase Learned weights Query Flow Transition Map Entity type prediction Entity value Prediction Predictor State Fetcher Input {curr state, past state seq} Sub Chains Generator Sub Criteria Chains generator 1. Predicting next entity type to be clicked. For e.g. next filter will be on locality, or furnishing or budget etc. 2. Predicting next entity value to be clicked. For e.g. for each entity type which value will be chosen next such as which locality or which budget range.
  • 24. What is not covered in this project ● Some of the work on query recommendation focuses on measures of query attribute similarity based on domain knowledgebase but we have not included this in project scope. ● For 99acres it can be seen as locality affinity etc. ● Why not : maintainance issues. ● ● Why not : we do not want too much dependency on domain specific data rather it should stay as plug and play model working only on query logs. Why not: domain similarity unable to capture crowd's wisdom – – – – Conditional nature of attribute dependency. For e.g user's choice for next locality also depend on choice of budget, bedroom etc. For eg. A user searching for sector xx is presented sector yy due to affinity but users are aware of the current affairs and they do not want to search in yy because HC has banned work there. A user is searching for jaypee kosmos and is presented supertech upcountry(near) but users had some knowledge of hidden attributes like FRR and green space so he makes next search to jaypee aman(far). Our Assumption: user crowd have more knowledge of current trends and ground realities so entity relations can be better captured from their behaviors.
  • 25. Results ● ● ● ● In user behavior modelling, it is very difficult to say that model is x% accurate for new user. Even if we model accurately and predict accurately majority of users may have different search focuses. So accuracy can be measured truly by presenting the prediction to the user and then evaluating that how many users we influenced. So in user behavior modeling we can conclude with data that our model fits well for already seen users/history. In our case, a full fledged demo was created with 10 days of search_criteria training. ● On same validation set it provides accuracy of 79%. ● On separate training and fresh test sets accuracy of 67%. ● Demo has been hosted on local server for testing (link to be distributed).