Industrial and Information
Engineering
Generation of Realistic Navigation Paths for Web Site Testing
using Recurrent Neural Networks and Generative Adversarial
Neural Networks
Silvio Pavanetto and Marco Brambilla
Semantic Web and Linked Open Data Helsinki,
Finland, Online on 9 – 12 June 2020
Introduction and Motivations
Why weblog generation?
1. Improve products even before their release
2. Generate open, high-quality data for research
3. Related work does not focus on high-quality weblog
generation
3.1 Only a few open-source libraries exist
Problem Definition
Challenges to be Faced
1. Understand whether deep learning algorithms can
generate better weblog data than statistical
methods
2. Understand what a "better" weblog means
3. Among the various deep learning
techniques, apply the GAN (Generative
Adversarial Network) to a new task
Problem Definition
Roadmap for solving the problem
1. Pre-process a publicly available weblog
2. Develop statistical algorithm
3. Develop recurrent neural network
4. Develop GAN
5. Evaluate the quality of the generated data
Proposed Approach
Pre-processing algorithm
Cleaning
• Remove entries with a response code other than 200
• Remove activity coming from bots
• Remove non-HTML pages
Knowledge extraction
• List of possible entry points
• Navigation patterns mined with Apriori
• Generation of the datasets used by the other algorithms
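The cleaning rules and the entry-point list can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the log is assumed to be in Apache combined format, and `LOG_PATTERN`, `BOT_MARKERS`, `NON_HTML_SUFFIXES`, `clean` and `entry_points` are hypothetical names. The client IP stands in for a session, and the Apriori step is not covered.

```python
import re
from collections import Counter

# Simplified pattern for the Apache combined log format (an assumption).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)

BOT_MARKERS = ("bot", "crawler", "spider")  # illustrative heuristic only
NON_HTML_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")  # illustrative


def clean(lines):
    """Keep only successful, non-bot, HTML page requests as (ip, url) pairs."""
    kept = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # unparseable line
        if match["status"] != "200":
            continue  # response code other than 200
        agent = (match["agent"] or "").lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue  # activity coming from bots
        url = match["url"].split("?", 1)[0]
        if url.lower().endswith(NON_HTML_SUFFIXES):
            continue  # non-HTML resource
        kept.append((match["ip"], url))
    return kept


def entry_points(requests):
    """Count the first URL requested by each client (a crude session proxy)."""
    first_seen = {}
    for ip, url in requests:
        first_seen.setdefault(ip, url)
    return Counter(first_seen.values())
```

A real pipeline would split sessions by time gaps as well as by client, and would use a fuller bot list.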
Proposed Approach
Deep Learning - RNN
Why Recurrent Neural Network?
• Well suited for processing sequential data
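One common way to present navigation sessions to a recurrent model is next-item prediction: each session is cut into (prefix, next URL) pairs over an integer vocabulary. The slides do not state how the training data was shaped, so this is an assumed encoding. `build_vocabulary` and `next_url_pairs` are hypothetical helper names.

```python
def build_vocabulary(sessions):
    """Map every distinct URL to an integer id; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for session in sessions:
        for url in session:
            vocab.setdefault(url, len(vocab))
    return vocab


def next_url_pairs(sessions, vocab):
    """Turn each session into (prefix ids, next id) training pairs."""
    pairs = []
    for session in sessions:
        ids = [vocab[url] for url in session]
        for i in range(1, len(ids)):
            # The model reads ids[:i] step by step and is asked to predict ids[i].
            pairs.append((ids[:i], ids[i]))
    return pairs
```

Because the prefix is consumed one element at a time, the hidden state of the recurrent network can carry the history of the visit, which is what makes it a natural fit for sequential data.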
Proposed Approach
Generative Adversarial Network
• A type of neural
network introduced in 2014
with strong
generative capabilities
• Used almost exclusively in
computer vision
Key concept: pit two neural networks against each other
in a two-player game
Proposed Approach
GAN Implementation – Possible Solution
GANs are designed to generate continuous data, whereas a sequence of URLs is discrete
Possible solution:
• The generative model is treated as a reinforcement learning
(RL) agent
• The state is composed of the URLs generated so far, and the
action is the next URL to be generated
Reward: the probability, produced by the discriminator, that the
sequence is real
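The state/action/reward framing can be illustrated with a toy REINFORCE loop. Everything here is an assumption made for illustration: the site (`URLS`, `REAL_LINKS`) is invented, the discriminator is a fixed stand-in rather than a trained network, the generator is a table of logits rather than a neural network, and the state is reduced to the last URL instead of the full prefix.

```python
import math
import random

URLS = ["/home", "/products", "/cart", "/checkout"]  # hypothetical site
REAL_LINKS = {("/home", "/products"), ("/products", "/cart"), ("/cart", "/checkout")}


def discriminator(sequence):
    """Stand-in for a trained discriminator: a score in [0, 1].

    Here it is the fraction of transitions that follow a known link.
    """
    steps = list(zip(sequence, sequence[1:]))
    return sum(step in REAL_LINKS for step in steps) / len(steps)


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(policy, length, rng):
    """Roll out one sequence; return it with the (state, action) pairs taken."""
    sequence, trajectory = ["/home"], []
    for _ in range(length - 1):
        state = sequence[-1]  # simplification: state = last URL only
        probs = softmax(policy[state])
        action = rng.choices(range(len(URLS)), weights=probs)[0]
        trajectory.append((state, action))
        sequence.append(URLS[action])
    return sequence, trajectory


def reinforce_step(policy, length, rng, learning_rate=0.5):
    """One REINFORCE update using the discriminator score as the reward."""
    sequence, trajectory = generate(policy, length, rng)
    reward = discriminator(sequence)
    updates = []
    for state, action in trajectory:
        probs = softmax(policy[state])
        for i in range(len(URLS)):
            # Gradient of log softmax w.r.t. logit i: 1[i == action] - probs[i]
            grad = (1.0 if i == action else 0.0) - probs[i]
            updates.append((state, i, learning_rate * reward * grad))
    for state, i, delta in updates:  # apply only after all gradients are computed
        policy[state][i] += delta
    return reward


rng = random.Random(0)
policy = {url: [0.0] * len(URLS) for url in URLS}
for _ in range(2000):
    reinforce_step(policy, length=4, rng=rng)
```

The key point is that sampling a discrete URL is not differentiable, so the discriminator's score reaches the generator as a scalar reward on the sampled actions rather than as a gradient through the output. In an adversarial setup the discriminator would also be retrained between generator updates; that part is omitted here.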
Experiments
Understand if a weblog is good
Evaluation Metric: BLEU
BLEU (Bilingual Evaluation Understudy) is a score for
comparing a candidate translation of a text against one or more
reference translations. It was designed to evaluate
the quality of text that has been machine-translated from
one natural language to another.
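Applied to weblogs, a generated navigation path can play the role of the candidate and real paths the role of the references, with each URL treated as a token. The sketch below is a plain sentence-level BLEU (clipped n-gram precision, uniform weights, brevity penalty, no smoothing). The slides do not give the exact configuration used, so `max_n=4` and the lack of smoothing are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        if not cand_counts:
            return 0.0  # candidate shorter than n
        # For each n-gram, the highest count seen in any single reference.
        max_ref_counts = Counter()
        for reference in references:
            for gram, count in Counter(ngrams(reference, n)).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # no overlap at this order
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

For example, with `max_n=2`, the candidate `/home, /products, /cart, /help` scored against the reference `/home, /products, /cart, /checkout` has unigram precision 3/4 and bigram precision 2/3, giving a BLEU of sqrt(0.5), about 0.707.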
Experiments
Understand if a weblog is good
BLEU is not enough.
Human Evaluation!
Evaluation game:
• 50 real sequences and 50 sequences generated by the algorithms are mixed
• 6 judges are invited to check the 100 sequences
• +1 point for the algorithm if the judge is fooled
• +0 points if the judge discovers that the sequence is not real
• Scores are averaged across all the judges
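The scoring rule of the game can be written down directly. One assumption is added here: each judge's points are divided by the number of generated sequences so that the score falls in [0, 1]. The slides only say that scores are averaged across judges. `human_score` is a hypothetical name.

```python
def human_score(is_generated, judge_votes):
    """Average, over judges, of the fraction of generated sequences judged real.

    is_generated[i] is True when sequence i came from the algorithm.
    judge_votes[j][i] is True when judge j believed sequence i was real.
    """
    generated = [i for i, flag in enumerate(is_generated) if flag]
    per_judge = []
    for votes in judge_votes:
        fooled = sum(1 for i in generated if votes[i])  # +1 per fooled verdict
        per_judge.append(fooled / len(generated))
    return sum(per_judge) / len(per_judge)
```

With two generated sequences and two judges, where the first judge is fooled once and the second twice, the score is (0.5 + 1.0) / 2 = 0.75. Verdicts on the real sequences do not enter the score; they only keep the judges from knowing which items are generated.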
Experiments
Evaluation – Final Comparison
Weblog generation performance comparison
Conclusions
We proposed a step towards the automatic production of high-quality
weblogs using deep learning techniques, such as recurrent neural
networks and generative adversarial networks.
Deep learning methods are suitable for weblog generation:
• The GAN is the best algorithm: it outperforms the baseline by:
• 0.2116 with the Human metric
• 0.1432 with the BLEU metric
Future Work
Integration with Model-Driven approaches useful for visualizing
statistics about weblogs in a graphical way
Addition of more variables in the training of the network that could
improve the quality of the generated weblog
Evaluation with other weblogs, belonging to different websites

More Related Content

Similar to Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs

Speed_Perception_Phase1
Speed_Perception_Phase1Speed_Perception_Phase1
Speed_Perception_Phase1
pahammad
 
Decision Making based on Machine Learning at Outfittery (W-JAX 2017)
Decision Making based on Machine Learning at Outfittery (W-JAX 2017)Decision Making based on Machine Learning at Outfittery (W-JAX 2017)
Decision Making based on Machine Learning at Outfittery (W-JAX 2017)
OUTFITTERY
 
GOTO Night: Decision Making Based on Machine Learning
GOTO Night: Decision Making Based on Machine LearningGOTO Night: Decision Making Based on Machine Learning
GOTO Night: Decision Making Based on Machine Learning
OUTFITTERY
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
Tao Xie
 
How we integrate Machine Learning Algorithms into our IT Platform at Outfitte...
How we integrate Machine Learning Algorithms into our IT Platform at Outfitte...How we integrate Machine Learning Algorithms into our IT Platform at Outfitte...
How we integrate Machine Learning Algorithms into our IT Platform at Outfitte...
OUTFITTERY
 
Innovate Better Through Machine data Analytics
Innovate Better Through Machine data AnalyticsInnovate Better Through Machine data Analytics
Innovate Better Through Machine data Analytics
Hal Rottenberg
 
Amazon Search Summit - the need for split testing in SEO
Amazon Search Summit - the need for split testing in SEOAmazon Search Summit - the need for split testing in SEO
Amazon Search Summit - the need for split testing in SEO
Will Critchlow
 
JDO 2019: Data Science for Developers - Matthew Renze
JDO 2019: Data Science for Developers -  Matthew RenzeJDO 2019: Data Science for Developers -  Matthew Renze
JDO 2019: Data Science for Developers - Matthew Renze
PROIDEA
 
Real User Monitoring: Getting Real Data from Real Users in the Real World - S...
Real User Monitoring: Getting Real Data from Real Users in the Real World - S...Real User Monitoring: Getting Real Data from Real Users in the Real World - S...
Real User Monitoring: Getting Real Data from Real Users in the Real World - S...
Akamai Technologies
 
How Data Science can boost your SEO ?
How Data Science can boost your SEO ?How Data Science can boost your SEO ?
How Data Science can boost your SEO ?
Vincent Terrasi
 
How to Add Test Automation to your Quality Assurance Toolbelt
How to Add Test Automation to your Quality Assurance ToolbeltHow to Add Test Automation to your Quality Assurance Toolbelt
How to Add Test Automation to your Quality Assurance Toolbelt
Brett Tramposh
 
Entity matching of web offers, from html to similarity score.
Entity matching of web offers, from html to similarity score. Entity matching of web offers, from html to similarity score.
Entity matching of web offers, from html to similarity score.
Paul Puget
 
Tests for Every Branch Using CircleCI and Sauce Labs to Continuously Test CS ...
Tests for Every Branch Using CircleCI and Sauce Labs to Continuously Test CS ...Tests for Every Branch Using CircleCI and Sauce Labs to Continuously Test CS ...
Tests for Every Branch Using CircleCI and Sauce Labs to Continuously Test CS ...
Sauce Labs
 
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Sease
 
Quoc Le at AI Frontiers : Automated Machine Learning
Quoc Le at AI Frontiers : Automated Machine LearningQuoc Le at AI Frontiers : Automated Machine Learning
Quoc Le at AI Frontiers : Automated Machine Learning
AI Frontiers
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
btNOG 10: Preparing for IPv6 implementation using AI
btNOG 10: Preparing for IPv6 implementation using AIbtNOG 10: Preparing for IPv6 implementation using AI
btNOG 10: Preparing for IPv6 implementation using AI
APNIC
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdf
caa28steve
 
How to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 DayHow to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 Day
Phillip Law
 
How to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 DayHow to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 Day
Phillip Law
 

Similar to Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs (20)

Speed_Perception_Phase1
Speed_Perception_Phase1Speed_Perception_Phase1
Speed_Perception_Phase1
 
Decision Making based on Machine Learning at Outfittery (W-JAX 2017)
Decision Making based on Machine Learning at Outfittery (W-JAX 2017)Decision Making based on Machine Learning at Outfittery (W-JAX 2017)
Decision Making based on Machine Learning at Outfittery (W-JAX 2017)
 
GOTO Night: Decision Making Based on Machine Learning
GOTO Night: Decision Making Based on Machine LearningGOTO Night: Decision Making Based on Machine Learning
GOTO Night: Decision Making Based on Machine Learning
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
 
How we integrate Machine Learning Algorithms into our IT Platform at Outfitte...
How we integrate Machine Learning Algorithms into our IT Platform at Outfitte...How we integrate Machine Learning Algorithms into our IT Platform at Outfitte...
How we integrate Machine Learning Algorithms into our IT Platform at Outfitte...
 
Innovate Better Through Machine data Analytics
Innovate Better Through Machine data AnalyticsInnovate Better Through Machine data Analytics
Innovate Better Through Machine data Analytics
 
Amazon Search Summit - the need for split testing in SEO
Amazon Search Summit - the need for split testing in SEOAmazon Search Summit - the need for split testing in SEO
Amazon Search Summit - the need for split testing in SEO
 
JDO 2019: Data Science for Developers - Matthew Renze
JDO 2019: Data Science for Developers -  Matthew RenzeJDO 2019: Data Science for Developers -  Matthew Renze
JDO 2019: Data Science for Developers - Matthew Renze
 
Real User Monitoring: Getting Real Data from Real Users in the Real World - S...
Real User Monitoring: Getting Real Data from Real Users in the Real World - S...Real User Monitoring: Getting Real Data from Real Users in the Real World - S...
Real User Monitoring: Getting Real Data from Real Users in the Real World - S...
 
How Data Science can boost your SEO ?
How Data Science can boost your SEO ?How Data Science can boost your SEO ?
How Data Science can boost your SEO ?
 
How to Add Test Automation to your Quality Assurance Toolbelt
How to Add Test Automation to your Quality Assurance ToolbeltHow to Add Test Automation to your Quality Assurance Toolbelt
How to Add Test Automation to your Quality Assurance Toolbelt
 
Entity matching of web offers, from html to similarity score.
Entity matching of web offers, from html to similarity score. Entity matching of web offers, from html to similarity score.
Entity matching of web offers, from html to similarity score.
 
Tests for Every Branch Using CircleCI and Sauce Labs to Continuously Test CS ...
Tests for Every Branch Using CircleCI and Sauce Labs to Continuously Test CS ...Tests for Every Branch Using CircleCI and Sauce Labs to Continuously Test CS ...
Tests for Every Branch Using CircleCI and Sauce Labs to Continuously Test CS ...
 
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
 
Quoc Le at AI Frontiers : Automated Machine Learning
Quoc Le at AI Frontiers : Automated Machine LearningQuoc Le at AI Frontiers : Automated Machine Learning
Quoc Le at AI Frontiers : Automated Machine Learning
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
btNOG 10: Preparing for IPv6 implementation using AI
btNOG 10: Preparing for IPv6 implementation using AIbtNOG 10: Preparing for IPv6 implementation using AI
btNOG 10: Preparing for IPv6 implementation using AI
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdf
 
How to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 DayHow to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 Day
 
How to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 DayHow to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 Day
 

More from Marco Brambilla

M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...
M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...
M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...
Marco Brambilla
 
Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...
Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...
Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...
Marco Brambilla
 
Hierarchical Transformers for User Semantic Similarity - ICWE 2023
Hierarchical Transformers for User Semantic Similarity - ICWE 2023Hierarchical Transformers for User Semantic Similarity - ICWE 2023
Hierarchical Transformers for User Semantic Similarity - ICWE 2023
Marco Brambilla
 
Exploring the Bi-verse. A trip across the digital and physical ecospheres
Exploring the Bi-verse.A trip across the digital and physical ecospheresExploring the Bi-verse.A trip across the digital and physical ecospheres
Exploring the Bi-verse. A trip across the digital and physical ecospheres
Marco Brambilla
 
Conversation graphs in Online Social Media
Conversation graphs in Online Social MediaConversation graphs in Online Social Media
Conversation graphs in Online Social Media
Marco Brambilla
 
Analysis of On-line Debate on Long-Running Political Phenomena. The Brexit C...
Analysis of On-line Debate on Long-Running Political Phenomena.The Brexit C...Analysis of On-line Debate on Long-Running Political Phenomena.The Brexit C...
Analysis of On-line Debate on Long-Running Political Phenomena. The Brexit C...
Marco Brambilla
 
Available Data Science M.Sc. Thesis Proposals
Available Data Science M.Sc. Thesis Proposals Available Data Science M.Sc. Thesis Proposals
Available Data Science M.Sc. Thesis Proposals
Marco Brambilla
 
Data Cleaning for social media knowledge extraction
Data Cleaning for social media knowledge extractionData Cleaning for social media knowledge extraction
Data Cleaning for social media knowledge extraction
Marco Brambilla
 
Iterative knowledge extraction from social networks. The Web Conference 2018
Iterative knowledge extraction from social networks. The Web Conference 2018Iterative knowledge extraction from social networks. The Web Conference 2018
Iterative knowledge extraction from social networks. The Web Conference 2018
Marco Brambilla
 
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Info...
Driving Style and Behavior Analysis based on Trip Segmentation over GPS  Info...Driving Style and Behavior Analysis based on Trip Segmentation over GPS  Info...
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Info...
Marco Brambilla
 
Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...
Marco Brambilla
 
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...
Marco Brambilla
 
Model-driven Development of User Interfaces for IoT via Domain-specific Comp...
Model-driven Development of  User Interfaces for IoT via Domain-specific Comp...Model-driven Development of  User Interfaces for IoT via Domain-specific Comp...
Model-driven Development of User Interfaces for IoT via Domain-specific Comp...
Marco Brambilla
 
A Model-Based Method for Seamless Web and Mobile Experience. Splash 2016 conf.
A Model-Based Method for  Seamless Web and Mobile Experience. Splash 2016 conf.A Model-Based Method for  Seamless Web and Mobile Experience. Splash 2016 conf.
A Model-Based Method for Seamless Web and Mobile Experience. Splash 2016 conf.
Marco Brambilla
 
Big Data and Stream Data Analysis at Politecnico di Milano
Big Data and Stream Data Analysis at Politecnico di MilanoBig Data and Stream Data Analysis at Politecnico di Milano
Big Data and Stream Data Analysis at Politecnico di Milano
Marco Brambilla
 
Web Science. An introduction
Web Science. An introductionWeb Science. An introduction
Web Science. An introduction
Marco Brambilla
 
On the Quest for Changing Knowledge. Capturing emerging entities from social ...
On the Quest for Changing Knowledge. Capturing emerging entities from social ...On the Quest for Changing Knowledge. Capturing emerging entities from social ...
On the Quest for Changing Knowledge. Capturing emerging entities from social ...
Marco Brambilla
 
Studying Multicultural Diversity of Cities and Neighborhoods through Social M...
Studying Multicultural Diversity of Cities and Neighborhoods through Social M...Studying Multicultural Diversity of Cities and Neighborhoods through Social M...
Studying Multicultural Diversity of Cities and Neighborhoods through Social M...
Marco Brambilla
 
Model driven software engineering in practice book - Chapter 9 - Model to tex...
Model driven software engineering in practice book - Chapter 9 - Model to tex...Model driven software engineering in practice book - Chapter 9 - Model to tex...
Model driven software engineering in practice book - Chapter 9 - Model to tex...
Marco Brambilla
 
Model driven software engineering in practice book - chapter 7 - Developing y...
Model driven software engineering in practice book - chapter 7 - Developing y...Model driven software engineering in practice book - chapter 7 - Developing y...
Model driven software engineering in practice book - chapter 7 - Developing y...
Marco Brambilla
 

More from Marco Brambilla (20)

M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...
M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...
M.Sc. Thesis Topics and Proposals @ Polimi Data Science Lab - 2024 - prof. Br...
 
Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...
Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...
Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambill...
 
Hierarchical Transformers for User Semantic Similarity - ICWE 2023
Hierarchical Transformers for User Semantic Similarity - ICWE 2023Hierarchical Transformers for User Semantic Similarity - ICWE 2023
Hierarchical Transformers for User Semantic Similarity - ICWE 2023
 
Exploring the Bi-verse. A trip across the digital and physical ecospheres
Exploring the Bi-verse.A trip across the digital and physical ecospheresExploring the Bi-verse.A trip across the digital and physical ecospheres
Exploring the Bi-verse. A trip across the digital and physical ecospheres
 
Conversation graphs in Online Social Media
Conversation graphs in Online Social MediaConversation graphs in Online Social Media
Conversation graphs in Online Social Media
 
Analysis of On-line Debate on Long-Running Political Phenomena. The Brexit C...
Analysis of On-line Debate on Long-Running Political Phenomena.The Brexit C...Analysis of On-line Debate on Long-Running Political Phenomena.The Brexit C...
Analysis of On-line Debate on Long-Running Political Phenomena. The Brexit C...
 
Available Data Science M.Sc. Thesis Proposals
Available Data Science M.Sc. Thesis Proposals Available Data Science M.Sc. Thesis Proposals
Available Data Science M.Sc. Thesis Proposals
 
Data Cleaning for social media knowledge extraction
Data Cleaning for social media knowledge extractionData Cleaning for social media knowledge extraction
Data Cleaning for social media knowledge extraction
 
Iterative knowledge extraction from social networks. The Web Conference 2018
Iterative knowledge extraction from social networks. The Web Conference 2018Iterative knowledge extraction from social networks. The Web Conference 2018
Iterative knowledge extraction from social networks. The Web Conference 2018
 
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Info...
Driving Style and Behavior Analysis based on Trip Segmentation over GPS  Info...Driving Style and Behavior Analysis based on Trip Segmentation over GPS  Info...
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Info...
 
Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...Myths and challenges in knowledge extraction and analysis from human-generate...
Myths and challenges in knowledge extraction and analysis from human-generate...
 
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...
Harvesting Knowledge from Social Networks: Extracting Typed Relationships amo...
 
Model-driven Development of User Interfaces for IoT via Domain-specific Comp...
Model-driven Development of  User Interfaces for IoT via Domain-specific Comp...Model-driven Development of  User Interfaces for IoT via Domain-specific Comp...
Model-driven Development of User Interfaces for IoT via Domain-specific Comp...
 
A Model-Based Method for Seamless Web and Mobile Experience. Splash 2016 conf.
A Model-Based Method for  Seamless Web and Mobile Experience. Splash 2016 conf.A Model-Based Method for  Seamless Web and Mobile Experience. Splash 2016 conf.
A Model-Based Method for Seamless Web and Mobile Experience. Splash 2016 conf.
 
Big Data and Stream Data Analysis at Politecnico di Milano
Big Data and Stream Data Analysis at Politecnico di MilanoBig Data and Stream Data Analysis at Politecnico di Milano
Big Data and Stream Data Analysis at Politecnico di Milano
 
Web Science. An introduction
Web Science. An introductionWeb Science. An introduction
Web Science. An introduction
 
On the Quest for Changing Knowledge. Capturing emerging entities from social ...
On the Quest for Changing Knowledge. Capturing emerging entities from social ...On the Quest for Changing Knowledge. Capturing emerging entities from social ...
On the Quest for Changing Knowledge. Capturing emerging entities from social ...
 
Studying Multicultural Diversity of Cities and Neighborhoods through Social M...
Studying Multicultural Diversity of Cities and Neighborhoods through Social M...Studying Multicultural Diversity of Cities and Neighborhoods through Social M...
Studying Multicultural Diversity of Cities and Neighborhoods through Social M...
 
Model driven software engineering in practice book - Chapter 9 - Model to tex...
Model driven software engineering in practice book - Chapter 9 - Model to tex...Model driven software engineering in practice book - Chapter 9 - Model to tex...
Model driven software engineering in practice book - Chapter 9 - Model to tex...
 
Model driven software engineering in practice book - chapter 7 - Developing y...
Model driven software engineering in practice book - chapter 7 - Developing y...Model driven software engineering in practice book - chapter 7 - Developing y...
Model driven software engineering in practice book - chapter 7 - Developing y...
 

Recently uploaded

Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 

Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs

  • 1. Industrial and Information Engineering. Generation of Realistic Navigation Paths for Web Site Testing using Recurrent Neural Networks and Generative Adversarial Neural Networks. Silvio Pavanetto and Marco Brambilla. Semantic Web and Linked Open Data, Helsinki, Finland, Online on 9 – 12 June 2020.
  • 2. Introduction and Motivations. Why weblog generation? (1) Improve products even before the release; (2) generate open, high-quality data for research; (3) related work has no focus on high-quality weblog generation, and only a few open-source libraries exist.
  • 3. Introduction and Motivations. Why weblog generation?
  • 4. Problem Definition: Challenges to be Faced. (1) Understand whether deep learning algorithms can generate better weblog data than statistical methods; (2) understand what "better weblog" means; (3) among the various deep learning techniques, apply a GAN (Generative Adversarial Network) to a new task.
  • 5. Problem Definition: Roadmap for solving the problem. Pre-process a publicly available weblog; develop a statistical algorithm, a recurrent neural network, and a GAN; evaluate the quality of the generated data.
  • 6. Proposed Approach: Pre-processing algorithm. Cleaning: remove entries with a response code other than 200; remove activity coming from bots; remove non-HTML pages. Knowledge extraction: list of possible entry points; navigation patterns found using data mining (Apriori); generation of the datasets that will be used by the other algorithms.
  • 7. Proposed Approach: Deep Learning - RNN. Why a Recurrent Neural Network? It is well suited for processing sequential data.
  • 8. Proposed Approach: Generative Adversarial Network. A new type of neural network (first introduced in 2014) with incredible generation capabilities, used almost exclusively in computer vision. Key concept: pit two neural networks against each other in a two-player game.
  • 9. Proposed Approach: GAN Implementation – Possible Solution. A GAN is designed for generating continuous data, whereas a sequence of URLs is discrete. Possible solution: the generative model is treated as a reinforcement learning (RL) agent. The state is composed of the URLs generated so far, and the action is the next URL to be generated. Reward: the discriminator outputs the probability that the sequence is real.
  • 10. Experiments: Understand if a weblog is good. Evaluation metric: BLEU. BLEU (Bilingual Evaluation Understudy) is a score for comparing a candidate translation of a text to one or more reference translations; equivalently, it is an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another.
  • 11. Experiments: Understand if a weblog is good. BLEU is not enough. Human evaluation! Evaluation game: 50 real sequences and 50 sequences generated by the algorithms are mixed; 6 judges are invited to check the 100 sequences; the algorithm gets +1 point if the judge is fooled and +0 points if the judge discovers that the sequence is not real; scores are averaged over all the judges.
  • 12. Experiments: Evaluation – Final Comparison. Weblog generation performance comparison.
  • 13. Conclusions. We proposed a step forward towards the automatic production of high-quality weblogs using deep learning techniques, such as recurrent neural networks and generative adversarial neural networks. Deep learning methods are suitable for weblog generation. The GAN is the best algorithm: it outperforms the baseline by 0.2116 on the Human metric and by 0.1432 on the BLEU metric.
  • 14. Future Work. Integration with Model-Driven approaches, useful for visualizing statistics about weblogs in a graphical way; addition of more variables in the training of the network that could improve the quality of the generated weblog; evaluation with other weblogs, belonging to different websites.
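
The cleaning step of slide 6 can be illustrated with a minimal sketch. It assumes the weblog is in Common Log Format, and the names `parse_line`, `is_html_page` and `clean` are hypothetical helpers introduced only for this example. The slides do not say how bots are detected; treating any host that requested `/robots.txt` as a bot is purely an assumption made here.

```python
import re

# Common Log Format: host ident user [timestamp] "METHOD path PROTO" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+$'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it is malformed."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def is_html_page(path):
    """Keep .html/.htm files and extension-less paths such as directories."""
    path = path.split("?", 1)[0].lower()
    last_segment = path.rsplit("/", 1)[-1]
    return path.endswith((".html", ".htm")) or "." not in last_segment


def clean(lines):
    """Apply the three cleaning rules: status 200, no bots, HTML pages only."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    # Assumption (not from the slides): a host that fetched /robots.txt is a bot.
    bot_hosts = {e["host"] for e in entries if e["path"] == "/robots.txt"}
    return [
        e for e in entries
        if e["status"] == "200"
        and e["host"] not in bot_hosts
        and is_html_page(e["path"])
    ]
```

Calling `clean(open_file_lines)` on the raw log lines would return only the entries that survive all three filters; grouping them into per-visitor navigation sessions is a separate step not shown here.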

Editor's Notes

  1. (like .png, .gif or other file types loaded inside a web page) (this task and its related issues will be discussed later)
  2. RNN: Artificial neural network (ANN) where connections between nodes form a directed graph along a sequence. This allows it to exhibit temporal dynamic behavior for a time sequence. In the diagram, a chunk of neural network, A, looks at some input x_t and outputs a value h_t. A loop allows information to be passed from one step of the network to the next. These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren’t all that different from a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.
  3. Consider the sequence generation procedure as a sequential decision-making process.
  4. Quality is considered to be the correspondence between a machine’s output and that of a human. Although BLEU is usually used for evaluating text, we already mentioned that the task faced in this work can be associated with text translation, because of the conceptual similarity between the sequence of pages in a single navigation session and the sequence of words in a phrase. In fact, every URL is treated as a unique "word" in the vocabulary, which is composed of all the pages of a particular website. Using this metric, scores are calculated for individual translated segments (generally sentences) by comparing them with a set of good-quality reference translations. Those scores are then averaged over the whole corpus to reach an estimate of the translation’s overall quality. Transferring this to our case, the translated segments are the generated navigation sequences, while the good-quality reference translations correspond to our original dataset: the NASA weblog.
  5. Humans are good at evaluating this type of data, since a weblog is a composition of navigation sequences and every sequence is something that is decided and created by a human.
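
The idea in note 4, scoring a generated navigation sequence with BLEU by treating each URL as a "word", can be sketched in a few lines. This is a simplified, unsmoothed sentence-level BLEU written from the standard definition (clipped n-gram precision plus brevity penalty), not the authors' evaluation code. The function names and the choice `max_n=2` are assumptions made for illustration, and the example URLs are invented.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against the references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=2):
    """Unsmoothed BLEU with uniform weights over 1..max_n grams."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Each navigation session is a list of URLs; real sessions act as references.
real_sessions = [["/", "/shuttle/", "/history/"]]
generated = ["/", "/shuttle/", "/faq.html"]
score = bleu(generated, real_sessions)
```

Here `score` is a value between 0 and 1; a generated session identical to a real one scores 1, and one sharing no URL with any reference scores 0. Averaging such scores over all generated sessions gives the corpus-level estimate described in the note.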