User behaviour modeling for data
prefetching in web applications
Kacper Łukawski
The problem
Users tend to use the
application in a similar manner
over time
The majority of modern web
applications use multiple HTTP
calls to fully generate their
content
Each request increases total
latency perceived by a
particular user
Mobile applications
Network connection is not always good
Applications may become useless without access to some
external resources
It would be great to have the data locally and synchronize whenever
possible
User behaviour modeling
Get only the resources that the user requested
Fetch everything that is available
Try to predict what might be useful in the near
future
Content based methods
Statistical modeling
Predicting user's interests
N-gram statistical model
An NLP-related technique
Performs statistical analysis of the language
Uses fixed-length subsequences as the base unit of
modeling
Builds statistics of the collocations
N-gram – an example
"The man who does not read has no advantage
over the man who cannot read"
Bigram            Count
The man           2
Man who           2
Who does          1
Does not          1
Not read          1
Read has          1
Has no            1
No advantage      1
Advantage over    1
Over the          1
Who cannot        1
Cannot read       1
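The bigram statistics for the example sentence can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `ngram_counts` and the lower-casing of the sentence are assumptions made for illustration, not something taken from the slides.

```python
from collections import Counter

def ngram_counts(tokens, n=2):
    """Count every contiguous subsequence of exactly n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

sentence = "the man who does not read has no advantage over the man who cannot read"
counts = ngram_counts(sentence.split())

print(counts[("the", "man")])   # 2
print(counts[("man", "who")])   # 2
print(counts[("who", "does")])  # 1
```

Only "the man" and "man who" occur twice; every other bigram in the sentence occurs once.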
N-gram+
Statistics are built in a similar manner to the
standard n-gram architecture
But this time, n is the minimal length of a sequence,
not a fixed length
Such a strategy usually leads to more accurate
predictions
N-gram+ – an example
"The man who does not read has no advantage
over the man who cannot read"
Sequences of the minimal length (n = 2), as in the bigram model:
The man, Man who, Who does, Does not, Not read, Read has,
Has no, No advantage, Advantage over, Over the, Who cannot, Cannot read
Longer sequences are counted as well, for example:
The man who does not read has no advantage over the man who cannot read
Man who does not read has no advantage over the man who cannot read
...
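One possible reading of the N-gram+ idea is to count every contiguous subsequence whose length is at least n. The sketch below follows that reading. The slides do not say whether the sequence length is capped or how the counts of different lengths are combined during prediction, so the name `ngram_plus_counts` and the absence of an upper bound are assumptions.

```python
from collections import Counter

def ngram_plus_counts(tokens, min_n=2):
    """Count every contiguous subsequence of at least min_n tokens."""
    counts = Counter()
    for start in range(len(tokens)):
        # end is exclusive, so the shortest slice has exactly min_n tokens
        for end in range(start + min_n, len(tokens) + 1):
            counts[tuple(tokens[start:end])] += 1
    return counts

tokens = "the man who does not read has no advantage over the man who cannot read".split()
counts = ngram_plus_counts(tokens)

print(counts[("the", "man")])         # 2
print(counts[("the", "man", "who")])  # 2
print(counts[tuple(tokens)])          # 1
```

Without an upper bound the number of stored sequences grows quadratically with the session length, so a real system would probably limit the maximum length.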
NASA dataset
A collection of Apache HTTP server logs
Delivers the following details of the requests:
➔
Host
➔
Timestamp
➔
Request (URL and method)
➔
HTTP reply code
➔
Number of bytes in the response
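The listed fields correspond to the Common Log Format, so a single log line could be parsed roughly as follows. This is a sketch under assumptions: the regular expression, the `parse_line` helper and the sample line (with a made-up host and URL) are illustrative and were not taken from the dataset itself.

```python
import re
from datetime import datetime

# host, two ignored fields, [timestamp], "request", status code, response size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return the request details as a dict, or None if the line is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    parts = match.group("request").split()
    method, url = (parts[0], parts[1]) if len(parts) >= 2 else (None, match.group("request"))
    size = match.group("size")
    return {
        "host": match.group("host"),
        "timestamp": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": method,
        "url": url,
        "status": int(match.group("status")),
        "bytes": 0 if size == "-" else int(size),  # "-" means no body was sent
    }

# Hypothetical sample line in Common Log Format
sample = 'host.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245'
print(parse_line(sample))
```

Grouping the parsed records by host and ordering them by timestamp would give the per-user request sequences that the n-gram models need.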
NASA dataset - results
N-gram ~50% accuracy
N-gram+ ~60% accuracy
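The slides do not state how the accuracy was measured. A common choice, assumed here, is top-1 accuracy of next-request prediction: for every request in a held-out session, the model predicts the most frequent follower of the preceding context and the prediction is compared with what the user actually requested. The sketch below uses a plain n-gram model with n >= 2; the function names and the toy sessions are hypothetical.

```python
from collections import Counter, defaultdict

def train(sessions, n=2):
    """Map each context of n-1 requests to a Counter of the requests that followed it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - n + 1):
            context = tuple(session[i:i + n - 1])
            model[context][session[i + n - 1]] += 1
    return model

def predict(model, history, n=2):
    """Most frequent follower of the last n-1 requests, or None if the context is unseen."""
    followers = model.get(tuple(history[-(n - 1):]))
    return followers.most_common(1)[0][0] if followers else None

def accuracy(model, sessions, n=2):
    """Fraction of requests that were predicted correctly from their history."""
    hits = total = 0
    for session in sessions:
        for i in range(n - 1, len(session)):
            total += 1
            hits += predict(model, session[:i], n) == session[i]
    return hits / total if total else 0.0

training = [["/", "/a", "/b"], ["/", "/a", "/c"], ["/", "/a", "/b"]]
model = train(training)
print(predict(model, ["/", "/a"]))            # /b
print(accuracy(model, [["/", "/a", "/c"]]))   # 0.5
```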
A proposal of extension
Typical web applications are built on top of the HTTP
protocol
HTTP defines 9 methods, each with a different
meaning (GET, HEAD, PUT, POST, DELETE,
OPTIONS, TRACE, CONNECT, PATCH)
A proposal of extension – an example
Endpoints:
GET /order
GET /product/{product_id}

GET /order
[
  {"id": 1, "products": [{"id": 100}, {"id": 200}]},
  {"id": 2, "products": [{"id": 300}]}
]
GET /product/100
{
  "id": 100,
  "name": "lorem"
}
GET /product/200
{
  "id": 200,
  "name": "ipsum"
}
GET /product/300
{
  "id": 300,
  "name": "dolor"
}
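In this example every product identifier needed by the follow-up requests is already present in the response to GET /order. A minimal sketch of turning that response into a list of URLs worth prefetching could look like the following. The helper name `product_urls` is an assumption, and the response is assumed to have been parsed into Python lists and dictionaries already.

```python
def product_urls(orders):
    """Collect the /product/{product_id} URLs referenced by an order list, without repeats."""
    urls = []
    for order in orders:
        for product in order.get("products", []):
            url = f"/product/{product['id']}"
            if url not in urls:
                urls.append(url)
    return urls

orders = [
    {"id": 1, "products": [{"id": 100}, {"id": 200}]},
    {"id": 2, "products": [{"id": 300}]},
]
print(product_urls(orders))  # ['/product/100', '/product/200', '/product/300']
```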
A proposal of extension - relations
Typical API endpoint:
GET /resource/{id}/?foo={foo}&bar={bar}
Between two HTTP actions from one session:
➔ No relation at all
➔ Request or response tokens of the second action are taken from
the first one
➔ Tokens are filled using some external knowledge
➔ The value of a token in the second action might be calculated
from the values of the first one
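The second kind of relation (a token of the second action is taken from the first one) is the easiest to detect automatically. The sketch below checks which token values of the second request already appeared anywhere in the first action. It is only an illustration under assumptions: the action layout (`request` / `response` keys), the helper names and the string comparison of values are not described in the slides, and the other relation types are not handled.

```python
def flatten(value):
    """Yield every scalar found in a nested JSON-like structure."""
    if isinstance(value, dict):
        for item in value.values():
            yield from flatten(item)
    elif isinstance(value, list):
        for item in value:
            yield from flatten(item)
    else:
        yield value

def copied_tokens(first_action, second_tokens):
    """Names of tokens in the second request whose value occurs in the first action."""
    seen = {str(v) for v in flatten(first_action)}
    return sorted(name for name, value in second_tokens.items() if str(value) in seen)

# Hypothetical first action: GET /order together with its response
first = {"request": {}, "response": [{"id": 1, "products": [{"id": 100}, {"id": 200}]}]}
# Hypothetical tokens of the second action: GET /product/100?lang=en
second = {"product_id": "100", "lang": "en"}

print(copied_tokens(first, second))  # ['product_id']
```

A value match in a single pair of actions may be a coincidence, so such candidates would have to be confirmed over many sessions before being treated as a relation.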
A proposal of extension - contd
Let's treat each HTTP request and response as a
single action performed by the user
Try to find the relations between the actions that
often take place in a similar order
Use only the actions that do not change anything
(GET, HEAD)
A proposal of extension – an algorithm
Assign the HTTP actions to the HTTP endpoints
that were used to process them
Tokenize each request and response
Using some n-gram-like architecture, try to find
the relations within the subsequences
(request/response → request)
Collect the statistics of token values in the requests
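The learning steps above can be sketched as follows. This is a simplified illustration rather than the author's implementation: endpoints are given as URL templates, the n-gram-like model is reduced to counts of endpoint pairs, only request URLs are tokenized, and relation discovery is left out. All function names are assumptions.

```python
import re
from collections import Counter, defaultdict

def compile_endpoint(template):
    """Build a regex with one named group per {token} in the template."""
    # re.split with one capture group alternates literal text and token names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        f"(?P<{part}>[^/?&]+)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    return re.compile(pattern + r"\Z")

def assign(endpoints, url):
    """Return (template, tokens) for the first endpoint that matches the URL."""
    for template, regex in endpoints.items():
        match = regex.match(url)
        if match:
            return template, match.groupdict()
    return None, {}

def train(sessions, templates):
    endpoints = {t: compile_endpoint(t) for t in templates}
    transitions = defaultdict(Counter)   # endpoint -> Counter of next endpoints
    token_values = defaultdict(Counter)  # (endpoint, token) -> Counter of values
    for session in sessions:
        previous = None
        for url in session:
            template, tokens = assign(endpoints, url)
            if template is None:
                previous = None  # unknown request breaks the sequence
                continue
            for name, value in tokens.items():
                token_values[(template, name)][value] += 1
            if previous is not None:
                transitions[previous][template] += 1
            previous = template
    return transitions, token_values

sessions = [["/order", "/product/100", "/product/200"], ["/order", "/product/100"]]
transitions, token_values = train(sessions, ["/order", "/product/{product_id}"])
print(transitions["/order"].most_common(1))  # [('/product/{product_id}', 2)]
```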
A proposal of extension – an algorithm
In the prediction phase try to find the most
probable endpoint that will be used in the next step
Having the endpoint, fill the tokens using relations
to previous actions
If not all the tokens can be filled, use the statistics
of the values
Perform the action(s) using predicted values of
tokens
Send aggregated responses at once
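A minimal sketch of the prediction phase is shown below. It assumes that learning produced two structures: endpoint transition counts and per-token value counts. These names, the `known_tokens` argument (values obtained through relations to previous actions) and the function `predict_next` are hypothetical. Performing the predicted requests and sending the aggregated responses is not shown.

```python
import re
from collections import Counter

def predict_next(transitions, token_values, last_endpoint, known_tokens):
    """Return the URL predicted to follow last_endpoint, or None if no prediction is possible."""
    followers = transitions.get(last_endpoint)
    if not followers:
        return None
    template = followers.most_common(1)[0][0]  # most probable next endpoint
    url = template
    for name in re.findall(r"\{(\w+)\}", template):
        if name in known_tokens:
            value = known_tokens[name]  # filled through a relation to a previous action
        else:
            stats = token_values.get((template, name))
            if not stats:
                return None  # token cannot be filled at all
            value = stats.most_common(1)[0][0]  # fall back to the most frequent value
        url = url.replace("{" + name + "}", str(value))
    return url

transitions = {"/order": Counter({"/product/{product_id}": 2})}
token_values = {("/product/{product_id}", "product_id"): Counter({"100": 2, "200": 1})}

print(predict_next(transitions, token_values, "/order", {"product_id": 300}))  # /product/300
print(predict_next(transitions, token_values, "/order", {}))                   # /product/100
```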
References
A. Georgakis, H. Li, "User behavior modeling and content
based speculative web page prefetching", Data & Knowledge
Engineering 59 (2006) 770–788
M. Narvekar, S. S. Banu, "Predicting User's Web Navigation
Behavior Using Hybrid Approach", Procedia Computer
Science 45 (2015) 3–12
http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
https://en.wikipedia.org/wiki/N-gram
