Screen2Vec: Semantic Embedding of GUI Screens and GUI Components
Toby Li, Lindsay Popowski, Tom M Mitchell, Brad A. Myers
2021 CHI Conference on Human Factors in Computing Systems
Background: Semantic representations of GUI screens and components
• Existing approaches to representing GUI screens are limited:
  • Some capture only the text on the screen
    • They miss the information encoded in the layout and design patterns
  • Others focus on visual design patterns and GUI layouts
    • They do not capture the content of the GUI
• Prior approaches use supervised learning on large datasets with task-specific objectives
  • They require labeling effort
  • They do not transfer to other downstream tasks

Contribution
• A self-supervised technique that requires no human-labeled data
• More comprehensive semantic embeddings of GUI screens and components, built from:
  • Textual content
  • Visual design
  • Layout patterns
  • App metadata
• An open-source GUI embedding model trained with Screen2Vec on the RICO dataset
• Sample downstream tasks:
  • Nearest-neighbor retrieval
  • Composability-based retrieval
  • Representing mobile tasks

Architecture of Screen2Vec
[Figure: overview of the GUI component level and the GUI screen level]
• Two-level architecture: GUI component level and GUI screen level

Architecture of Screen2Vec: GUI Component Level
• Input
  • 768-dimensional embedding vector of the GUI component's text label
    • Encoded with a pre-trained Sentence-BERT model
  • 6-dimensional class embedding vector
    • Represents the class type of the GUI component
• Trained parameters: the class embedding weights and the weights of the linear layer that combines text and class
• Output
  • 768-dimensional component embedding vector

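The component-level model can be sketched as a lookup table plus one linear layer. This is a minimal, forward-pass-only sketch in plain Python, assuming random stand-in weights; the class name `ComponentEmbedder` and the helpers `init_matrix` and `linear` are hypothetical, and only the sizes (26 classes, 6-d class vectors, 768-d text vectors, 774 → 768 linear layer) come from the slides.

```python
import random

# Sizes taken from the slides.
NUM_CLASSES = 26   # GUI class categories
CLASS_DIM = 6      # class-type embedding size
TEXT_DIM = 768     # Sentence-BERT text embedding size
OUT_DIM = 768      # component embedding size


def init_matrix(rows, cols, rng, scale=0.01):
    """Random rows x cols matrix; a stand-in for trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def linear(weights, bias, x):
    """Compute W x + b, with W stored as a list of output rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class ComponentEmbedder:
    """Hypothetical sketch of the component-level model (forward pass only)."""

    def __init__(self, seed=0, text_dim=TEXT_DIM, class_dim=CLASS_DIM,
                 out_dim=OUT_DIM, num_classes=NUM_CLASSES):
        rng = random.Random(seed)
        self.text_dim = text_dim
        # Trainable lookup table: one class_dim vector per GUI class category.
        self.class_table = init_matrix(num_classes, class_dim, rng, scale=1.0)
        # Trainable linear layer mapping (text_dim + class_dim) -> out_dim.
        self.weights = init_matrix(out_dim, text_dim + class_dim, rng)
        self.bias = [0.0] * out_dim

    def embed(self, text_vec, class_id):
        if len(text_vec) != self.text_dim:
            raise ValueError(f"expected a {self.text_dim}-d text vector")
        # Concatenate text and class vectors (768 + 6 = 774 by default).
        joined = list(text_vec) + self.class_table[class_id]
        return linear(self.weights, self.bias, joined)
```

For example, `ComponentEmbedder().embed(text_vec, class_id=5)` returns a 768-element list for a 768-d `text_vec`. In a real implementation, both the class table and the linear weights would be learned during training rather than drawn at random.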
Architecture of Screen2Vec: GUI Screen Level
1) Collection of GUI component embedding vectors
  • Combined into one 768-dimensional vector by an RNN
2) 64-dimensional layout embedding vector
  • Encodes the screen's visual layout
3) 768-dimensional embedding vector of the app's textual App Store description
  • Encoded with a pre-trained Sentence-BERT model
• The component (1) and layout (2) vectors are combined by a linear layer into a 768-dimensional vector
• After training, the description vector (3) is concatenated to it, giving a 1536-dimensional screen embedding
• The RNN weights and the linear-layer weights are trained on a Continuous Bag-of-Words (CBOW) style prediction task (predicting an item from its context)

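The screen-level data flow (768 + 64 → 768, then + 768 → 1536) can be checked with a short sketch. It assumes the RNN output is already available as a 768-d list; the helper names `make_linear` and `screen_embedding` are hypothetical, and the weights are random placeholders.

```python
import random


def linear(weights, bias, x):
    """Compute W x + b, with W stored as a list of output rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def make_linear(in_dim, out_dim, seed=0, scale=0.01):
    """Random placeholder weights for an in_dim -> out_dim linear layer."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def screen_embedding(content_vec, layout_vec, description_vec, weights, bias):
    """Hypothetical sketch of the screen-level combination.

    content_vec:     768-d RNN summary of the screen's components
    layout_vec:      64-d layout autoencoder embedding
    description_vec: 768-d Sentence-BERT embedding of the app description
    """
    # Trained part: (768 + 64) -> 768 linear layer.
    combined = linear(weights, bias, list(content_vec) + list(layout_vec))
    # Not trained: the description vector is appended after training (768 + 768 = 1536).
    return combined + list(description_vec)
```

The design keeps the app description outside the trained network, so the learned weights only see screen content and layout.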
Dataset
• RICO dataset
  • Interaction traces covering 66,261 unique GUI screens
  • From 9,384 free Android apps
• Specifics
  • Each screen comes with a screenshot image
  • The screen's "view hierarchy" (analogous to the DOM tree in HTML) is stored in a JSON file
    • Each node includes:
      • Class type
      • Textual content
      • Location, as a bounding box on the screen
      • Properties such as whether it is clickable, focused, or scrollable
  • Each interaction trace is represented as a sequence of GUI screens
    • Together with the location that was clicked or swiped

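Reading the view hierarchy is the first step before any embedding. The sketch below walks a view-hierarchy JSON in pre-order and keeps the per-node fields listed above. It assumes a RICO-style layout with the tree under `activity` → `root` and node keys `class`, `text`, `bounds`, `clickable`, `focused`, `scrollable`, and `children`; treat those key names as assumptions to verify against the actual files. The function names are hypothetical.

```python
import json


def iter_nodes(node):
    """Yield nodes in pre-order: the node itself, then each child subtree left to right."""
    if node is None:  # assumption: child lists may contain null entries
        return
    yield node
    for child in node.get("children") or []:
        yield from iter_nodes(child)


def extract_components(view_hierarchy_json):
    """Hypothetical parser returning one dict per view-hierarchy node, in pre-order."""
    data = json.loads(view_hierarchy_json)
    root = data["activity"]["root"]  # assumed location of the tree root
    components = []
    for node in iter_nodes(root):
        components.append({
            "class": node.get("class"),
            "text": node.get("text"),
            "bounds": node.get("bounds"),  # [left, top, right, bottom] in pixels
            "clickable": bool(node.get("clickable", False)),
            "focused": bool(node.get("focused", False)),
            "scrollable": bool(node.get("scrollable", False)),
        })
    return components
```

Pre-order is used here because it is the order in which the combining layer consumes component embeddings.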
Implementation Details: GUI Class Type Embeddings
• The 26 GUI class categories are encoded into a vector space
• Each category is mapped to a continuous 6-dimensional vector
• The embedding values are optimized by training on the GUI component prediction task
  • As a result, semantically similar categories end up close together in the vector space

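Once the class table is trained, "close in the vector space" can be inspected by ranking classes by cosine similarity. This is a small sketch; the function `nearest_classes` is hypothetical, and the toy table in the test is made up for illustration, not trained values.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_classes(class_table, query, k=3):
    """Rank the other GUI classes by cosine similarity to `query` (best first)."""
    scores = [(cosine(class_table[query], vec), name)
              for name, vec in class_table.items() if name != query]
    scores.sort(reverse=True)
    return [name for _, name in scores[:k]]
```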
Implementation Details: GUI Component Context
• The context of a component is defined as its 16 nearest components
• Two measures of screen distance are used to determine the context:
  • Euclidean: the minimal straight-line distance between components on the screen, in pixels
  • Hierarchical: the distance between two GUI components in the view hierarchy tree
    • A parent and its children are at distance 1

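Both distance measures, and the resulting 16-nearest context, can be sketched in a few lines. The Euclidean version computes the shortest gap between two bounding boxes (0 if they overlap). For the hierarchical version, we assume distance is the number of tree edges between two nodes, which matches "parent and children: 1" and gives siblings a distance of 2; the slides do not confirm the sibling case. All function names are hypothetical.

```python
import math


def rect_distance(a, b):
    """Shortest straight-line distance in pixels between boxes (left, top, right, bottom); 0 if they overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dx, dy)


def tree_distance(parent, u, v):
    """Number of edges between nodes u and v, given a child -> parent map (root maps to None)."""
    steps_from_u = {}
    node, d = u, 0
    while node is not None:  # record every ancestor of u with its distance from u
        steps_from_u[node] = d
        node, d = parent[node], d + 1
    node, d = v, 0
    while node not in steps_from_u:  # climb from v until reaching a common ancestor
        node, d = parent[node], d + 1
    return d + steps_from_u[node]


def context(target, components, distance, k=16):
    """The k components closest to `target` under `distance`, excluding target itself."""
    others = [c for c in components if c != target]
    return sorted(others, key=lambda c: distance(target, c))[:k]
```

Passing the distance as a function lets the same `context` helper serve both the EUCLIDEAN and HIERARCHICAL variants.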
Implementation Details: Linear Layer
• Linear layers combine multiple vectors into a lower-dimensional vector
• GUI component level
  • The 768-dimensional text vector is concatenated with the 6-dimensional class vector (774 dimensions)
  • A linear layer with 774 × 768 weights shrinks it back to 768 dimensions
• GUI screen level
  • The 768-dimensional component vector and the 64-dimensional layout vector are combined
  • The result is a 768-dimensional vector for screen content and layout

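Written out, the two linear layers are as follows, with $[\cdot\,;\cdot]$ denoting concatenation. The bias terms $\mathbf{b}_c, \mathbf{b}_s$ are an assumption (the slides only mention weights), and $\mathbf{h}$ is the RNN summary of the screen's components.

```latex
\begin{aligned}
\mathbf{e}_{\text{comp}} &= W_c\,[\mathbf{t};\,\mathbf{c}] + \mathbf{b}_c,
  &\quad \mathbf{t}\in\mathbb{R}^{768},\ \mathbf{c}\in\mathbb{R}^{6},\ W_c\in\mathbb{R}^{768\times 774},\\
\mathbf{e}_{\text{screen}} &= W_s\,[\mathbf{h};\,\mathbf{l}] + \mathbf{b}_s,
  &\quad \mathbf{h}\in\mathbb{R}^{768},\ \mathbf{l}\in\mathbb{R}^{64},\ W_s\in\mathbb{R}^{768\times 832},\\
|W_c| &= 774 \times 768 = 594{,}432,
  &\quad |W_s| = 832 \times 768 = 638{,}976.
\end{aligned}
```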
Implementation Details: Text Embeddings
• A pre-trained Sentence-BERT language model is used
  • Pre-trained on the SNLI and Multi-Genre NLI datasets, with mean pooling
• Component text labels and app descriptions are encoded as 768-dimensional vectors
• Semantically similar sentences and phrases map to nearby vectors

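"Mean pooling" means the sentence vector is the average of BERT's per-token output vectors, skipping padding tokens. The sketch below shows only that pooling step in plain Python; producing the token vectors themselves requires the actual Sentence-BERT model (for example a sentence-transformers checkpoint such as `bert-base-nli-mean-tokens`, which is our assumption, since the slides do not name one). The function name is hypothetical.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1; padding tokens (mask 0) are skipped."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```

With 768-d BERT token outputs, the pooled result is the 768-d text embedding used for component labels and app descriptions.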
Implementation Details: Layout Embeddings
• The layout is extracted from the screenshot
  • Distinguishing between text and non-text GUI components
• An autoencoder encodes each layout image into a 64-dimensional embedding vector
  • Encoder input dimension: 11,200
  • Two hidden layers of sizes 2,048 and 256
  • ReLU activations, which set negative values to zero
  • Loss: mean squared error (MSE)

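The autoencoder's shape (11,200 → 2,048 → 256 → 64 and back) can be sketched as a plain-Python forward pass with an MSE reconstruction loss. This is a sketch under stated assumptions, not the trained model: the weights are random placeholders, the decoder is assumed to mirror the encoder, ReLU placement is assumed (after every layer except the final reconstruction), and backpropagation is omitted. Class and helper names are hypothetical.

```python
import random


def relu(xs):
    """Set negative values to zero."""
    return [max(0.0, x) for x in xs]


def dense(layer, x):
    """Apply one fully connected layer: W x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def make_layer(in_dim, out_dim, rng, scale=0.01):
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class LayoutAutoencoder:
    """Forward pass of a hypothetical layout autoencoder: 11,200 -> 2,048 -> 256 -> 64 and back."""

    def __init__(self, dims=(11200, 2048, 256, 64), seed=0):
        rng = random.Random(seed)
        rev = dims[::-1]
        self.encoder = [make_layer(i, o, rng) for i, o in zip(dims[:-1], dims[1:])]
        # Assumption: the decoder mirrors the encoder.
        self.decoder = [make_layer(i, o, rng) for i, o in zip(rev[:-1], rev[1:])]

    def encode(self, x):
        # ReLU after every encoder layer, including the 64-d code (an assumption).
        for layer in self.encoder:
            x = relu(dense(layer, x))
        return x

    def reconstruct(self, x):
        h = self.encode(x)
        for idx, layer in enumerate(self.decoder):
            h = dense(layer, h)
            if idx < len(self.decoder) - 1:  # keep the final output linear
                h = relu(h)
        return h

    def loss(self, x):
        """MSE between the input layout vector and its reconstruction."""
        return mse(x, self.reconstruct(x))
```

The default sizes follow the slides, but building them in pure Python is slow (about 23 million weights in the first layer alone); for experimenting, pass smaller `dims`, e.g. `LayoutAutoencoder(dims=(20, 8, 4, 2))`.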
Implementation Details: GUI Embedding Combining Layer
• Combines the embedding vectors of multiple GUI components into one vector
• The component embeddings are fed into an RNN
  • In pre-order traversal order of the view hierarchy tree
• The RNN starts from a zero hidden state; its (n − 1)th output, produced after the last of the n components, is fed into a linear layer

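The combining step can be sketched with a simple tanh (Elman) RNN: start from a zero hidden state, feed component embeddings in pre-order, and pass the final output through a linear layer. The exact RNN cell Screen2Vec uses is not stated on the slides, so the Elman cell here is an assumption; `ElmanRNN` and `combine_components` are hypothetical names.

```python
import math
import random


class ElmanRNN:
    """Minimal tanh RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b). Weights are random placeholders."""

    def __init__(self, in_dim, hidden_dim, seed=0, scale=0.1):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_x = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.w_h = [[rng.gauss(0.0, scale) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
        self.b = [0.0] * hidden_dim

    def run(self, sequence):
        h = [0.0] * self.hidden_dim  # start from a zero hidden state
        outputs = []
        for x in sequence:
            # The comprehension reads the previous h before it is reassigned.
            h = [math.tanh(sum(w * xi for w, xi in zip(wx_row, x))
                           + sum(w * hi for w, hi in zip(wh_row, h)) + b)
                 for wx_row, wh_row, b in zip(self.w_x, self.w_h, self.b)]
            outputs.append(h)
        return outputs


def combine_components(component_vecs_preorder, rnn, out_weights, out_bias):
    """Feed pre-ordered component embeddings through the RNN; project the last output linearly."""
    if not component_vecs_preorder:
        raise ValueError("a screen needs at least one component")
    last = rnn.run(component_vecs_preorder)[-1]  # the (n - 1)th output for n components
    return [sum(w * x for w, x in zip(row, last)) + b for row, b in zip(out_weights, out_bias)]
```

In the full model the component vectors would be the 768-d outputs of the component-level model, and the RNN's summary then feeds the screen-level linear layer together with the layout vector.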
Training Configuration
• Data split: 90% training, 10% validation
• Cross-entropy loss with the Adam optimizer
• Learning rate: 0.001; batch size: 256
• Epochs: 120 for the GUI component model; 80–120 for the GUI screen model
• Total loss
  • Component model
    • Total Loss = Loss(text prediction) + Loss(class type prediction)
  • Screen model
    • Negative sampling
    • The prediction is compared against the correct screen and a sample of negative screens
    • Negatives: a random sample of 128 other screens from the same app
    • This teaches the model to differentiate between screens of the same app

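The screen model's negative-sampling objective can be sketched as a softmax cross-entropy over the correct screen plus up to 128 negatives drawn from the same app. Scoring candidates by dot product is our assumption (the slides do not give the scoring function); `sample_negatives` and `negative_sampling_loss` are hypothetical names.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_negatives(screens_by_app, app, target, k=128, rng=random):
    """Draw up to k other screens from the same app to serve as negatives."""
    pool = [s for s in screens_by_app[app] if s != target]
    return rng.sample(pool, min(k, len(pool)))


def negative_sampling_loss(predicted, positive, negatives):
    """Softmax cross-entropy of picking `positive` out of [positive] + negatives.

    Scores are dot products (an assumption); log-sum-exp is shifted by the max for stability.
    """
    logits = [dot(predicted, positive)] + [dot(predicted, n) for n in negatives]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]
```

A useful sanity check: if the prediction scores all 128 candidates equally, the loss is exactly log(128), and it drops as the correct screen's score rises above the negatives.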
Baselines
• Text Embedding Only, TextOnly (similar textual content)
  • The screen embedding method used in SOVITE
  • Computed by averaging the text embedding vectors of all the text on the screen
• Layout Embedding Only, LayoutOnly (similar layout)
  • The screen embedding method used in the original RICO paper
  • The layout autoencoder's output represents the screen
• Visual Embedding Only (similar visual appearance)
  • Uses the screenshot image directly instead of the layout
  • Inspired by VASTA, Sikuli, and HILC

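The TextOnly baseline reduces to an element-wise average of a screen's text embedding vectors. A minimal sketch follows; the function name is hypothetical, and returning a zero vector for a screen without any text is our assumption, since the slides do not say how such screens are handled.

```python
def text_only_embedding(text_vectors, dim=768):
    """TextOnly baseline: element-wise mean of all text embedding vectors on a screen.

    Returning a zero vector for a screen with no text is an assumption.
    """
    if not text_vectors:
        return [0.0] * dim
    n = len(text_vectors)
    return [sum(col) / n for col in zip(*text_vectors)]
```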
Results
• Task: predict each GUI screen in all the GUI interaction traces of the RICO dataset from its context
  • Three versions of Screen2Vec are compared:
    • EUCLIDEAN, with the locations of GUI components and the screen layouts
    • HIERARCHICAL, with the same spatial information
    • EUCLIDEAN, without spatial information

Sample Downstream Tasks: Nearest Neighbors
• The main purpose is to produce distributed vector representations that encode useful semantic, layout, and design properties
• Compare the similarity of the nearest-neighbor results produced by different models
Methods
• Select 50 screens across apps and app domains
• Retrieve the top-5 most similar screens with each of the 3 models
• 79 Mechanical Turk workers participated
• Each worker saw the top-5 most similar screens for 5 source screens, produced by the 3 models
• The questionnaire covered:
  (1) App similarity (2) Screen type similarity (3) Content similarity

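The retrieval step ("top-5 most similar screens") can be sketched as ranking all other screens by similarity to the source screen. We assume cosine similarity here; the slides do not state which metric was used. `top_k_similar` is a hypothetical name, and the test uses toy 2-d vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query_id, embeddings, k=5):
    """Return the ids of the k screens most similar to `query_id`, best first."""
    query = embeddings[query_id]
    scored = [(cosine(query, vec), sid) for sid, vec in embeddings.items() if sid != query_id]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sid for _, sid in scored[:k]]
```

Running the same function over Screen2Vec, TextOnly, and LayoutOnly embeddings yields the three result lists that workers compared.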
Sample Downstream Tasks: Nearest Neighbors
Results
• The differences between the mean ratings of the Screen2Vec model and those of both the TextOnly and LayoutOnly models are significant (non-parametric Mann-Whitney U test)
• Ratings were given to the top-5 most similar screens retrieved by each of the 3 models

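The Mann-Whitney U statistic compares two groups of ratings without assuming a normal distribution: it counts, over all cross-group pairs, how often a rating from one group exceeds a rating from the other (ties count one half). The sketch below computes U only; for p-values, SciPy's `scipy.stats.mannwhitneyu` is the usual tool. The ratings in the test are made up.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```

Note that `mann_whitney_u(x, y) + mann_whitney_u(y, x)` always equals `len(x) * len(y)`.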
Sample Downstream Tasks: Nearest Neighbors
Observation
• Screen2Vec generates more comprehensive representations
  • Example: for the "Request ride" screen in Lyft, the nearest neighbors are
    • "Get direction" in Uber Driver
    • "Select navigation type" in Waze
    • "Request ride" in Free Now
  • In all of them, a MapView takes up most of the screen
  • All feature a menu/information card in the bottom third to quarter of the screen
• TextOnly results are semantically similar to "payment"
• LayoutOnly results score lower on content similarity and app-context similarity

Sample Downstream Tasks: Embedding Composability
Word2Vec
• "Man is to woman as brother is to sister"
• (brother − man + woman) yields an embedding vector closest to that of sister
Screen2Vec
• Marriott app's "hotel booking" screen + (Cheapoair app's "search result" screen − Cheapoair app's "hotel booking" screen)
• The top result is the "search result" screen in the Marriott app

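Composability is vector arithmetic followed by a nearest-neighbor lookup. The sketch below uses made-up 4-d toy vectors whose dimensions stand for "Marriott", "Cheapoair", "hotel booking", and "search result", chosen only to illustrate the mechanics; real Screen2Vec vectors are 1536-d and learned. The helper names and the cosine metric are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compose(base, plus, minus):
    """Element-wise base + (plus - minus)."""
    return [b + (p - m) for b, p, m in zip(base, plus, minus)]


def nearest(query, embeddings):
    """Name of the embedding with the highest cosine similarity to `query`."""
    return max(embeddings, key=lambda name: cosine(query, embeddings[name]))
```

With the toy vectors in the test, Marriott "hotel booking" + (Cheapoair "search result" − Cheapoair "hotel booking") lands exactly on Marriott "search result", mirroring the slide's example.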
Sample Downstream Tasks: Screen Embedding Sequences for Representing Mobile Tasks
• A preliminary evaluation of how effectively mobile tasks can be embedded as sequences of Screen2Vec screen embedding vectors
• Scripts were recorded for completing 10 common smartphone tasks
• Each task is represented as the average of its Screen2Vec screen vectors
• Querying for the nearest neighbor among the 20 task variations gave 18/20 accuracy
  • TextOnly: 14/20 accuracy

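Representing a task as the average of its screen vectors, then matching by nearest neighbor, can be sketched as follows. The task names and toy 3-d vectors in the test are hypothetical, and cosine similarity is an assumed metric.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_task(query_screens, tasks):
    """tasks maps a task name to the screen vectors recorded for it; returns the closest task name."""
    q = mean_vector(query_screens)
    return max(tasks, key=lambda name: cosine(q, mean_vector(tasks[name])))
```

Accuracy, as reported on the slide, is then the fraction of query variations whose nearest task is the matching one.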
Potential Application
• Designers can query for example designs that display similar content, or for screens in apps of a similar domain
• Composability helps find a specific page for an app
  • Suppose a designer is looking for a checkout page for app A:
  • A's order page + (App B's checkout page − App B's order page)
• LayoutGAN can generate realistic GUI layouts from user-specified constraints
  • Screen2Vec could be applied to incorporate the semantics of GUIs and the context of user interaction

Limitation
• Trained and tested only on Android app GUIs
• RICO dataset
  • Contains interaction traces within single apps only; the approach needs to generalize to tasks spanning multiple apps
  • Does not contain paid apps
• Screen2Vec does not encode the semantics of graphic icons that have no textual information
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Screen2Vec: Semantic Embedding of GUI Screens and GUI Components

  • 1. Screen2Vec: Semantic Embedding of GUI Screens and GUI Components
      Toby Li, Lindsay Popowski, Tom M Mitchell, Brad A. Myers
      2021 CHI Conference on Human Factors in Computing Systems
  • 2. Background: Semantic representations of GUI screens and components
      • Existing approaches to representing GUI screens are limited
          ◦ Some capture only the text on the screen, missing the information encoded in the layout and design patterns
          ◦ Others focus on visual design patterns and GUI layouts, without capturing the content of the GUI
      • Prior approaches use supervised learning on large datasets for specific task objectives
          ◦ They require labeling effort
          ◦ They do not carry over to different downstream tasks
  • 3. Contribution
      • A self-supervised technique that requires no human-labeled data
      • More comprehensive semantic embeddings of GUI screens and components, built from:
          ◦ Textual content
          ◦ Visual design
          ◦ Layout patterns
          ◦ App metadata
      • An open-source GUI embedding model trained with Screen2Vec on the RICO dataset
      • Sample downstream tasks:
          ◦ Nearest-neighbor retrieval
          ◦ Composability-based retrieval
          ◦ Representing mobile tasks
  • 4. Architecture of Screen2Vec
      • Two-level architecture: GUI component level and GUI screen level
  • 5. Architecture of Screen2Vec: GUI Component Level
      • Input
          ◦ 768-dimensional embedding vector of the GUI component's text label, encoded with a pre-trained Sentence-BERT model
          ◦ 6-dimensional class embedding vector representing the GUI component's class type
      • Training optimizes the weights of the class embeddings and of the linear layer that combines text and class
      • Output: 768-dimensional embedding vector
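A minimal pure-Python sketch of the component-level step, under stated assumptions: look up a learnable class-type embedding, concatenate it with the text embedding, and project the result back down with a bias-free linear layer. The toy size `TEXT_DIM = 8` (instead of 768), the random initialization, and the names `class_table` and `component_embedding` are hypothetical; in the real model the text vector comes from Sentence-BERT and both the table and the weights are learned.

```python
import random

# Toy sizes so the sketch runs instantly; the slides use TEXT_DIM = 768.
TEXT_DIM = 8
CLASS_DIM = 6      # class embedding size from the slides
NUM_CLASSES = 26   # number of GUI class categories from the slides

rng = random.Random(0)

# Hypothetical learnable class-type embedding table (random here; trained in the real model).
class_table = [[rng.uniform(-1.0, 1.0) for _ in range(CLASS_DIM)] for _ in range(NUM_CLASSES)]

# Hypothetical linear layer: TEXT_DIM rows of (TEXT_DIM + CLASS_DIM) weights, no bias.
weights = [[rng.uniform(-0.1, 0.1) for _ in range(TEXT_DIM + CLASS_DIM)] for _ in range(TEXT_DIM)]


def component_embedding(text_vec, class_id):
    """Concatenate the text and class vectors, then project back down to TEXT_DIM."""
    if len(text_vec) != TEXT_DIM:
        raise ValueError(f"expected a {TEXT_DIM}-dimensional text vector")
    x = list(text_vec) + class_table[class_id]
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Stand-in for the Sentence-BERT vector of a component's text label.
vec = component_embedding([0.5] * TEXT_DIM, class_id=3)
print(len(vec))  # 8
```

Because the class table and the projection are trained jointly, the class vectors end up wherever they best help the prediction task rather than being fixed one-hot codes.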
  • 6. Architecture of Screen2Vec: GUI Screen Level
      • Inputs
          ◦ (1) The collection of GUI component embedding vectors, combined into one 768-dimensional vector by an RNN
          ◦ (2) A 64-dimensional layout embedding vector encoding the screen's visual layout
          ◦ (3) A 768-dimensional embedding vector of the app's textual App Store description, encoded with a pre-trained Sentence-BERT model
      • The GUI (1) and layout (2) vectors are combined by a linear layer into a 768-dimensional embedding vector
      • After training, the description (3) vector is concatenated, giving a 1536-dimensional embedding vector
      • The RNN weights and the linear-layer weights are trained on a Continuous Bag-of-Words (CBOW) prediction task
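The screen-level assembly can be sketched as follows, assuming toy sizes (4, 2 and 4 in place of 768, 64 and 768) and hypothetical helper names `linear` and `screen_embedding`. The identity-like weight matrix is chosen only so the output is easy to check by eye; the real weights are learned.

```python
# Toy sizes; the slides use GUI_DIM = 768, LAYOUT_DIM = 64, DESC_DIM = 768.
GUI_DIM, LAYOUT_DIM, DESC_DIM = 4, 2, 4


def linear(weights, x):
    """Matrix-vector product: one output per weight row (no bias)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def screen_embedding(gui_vec, layout_vec, desc_vec, weights):
    """(1) GUI content + (2) layout -> linear layer -> GUI_DIM; then append (3) the description."""
    combined = linear(weights, gui_vec + layout_vec)  # (GUI_DIM + LAYOUT_DIM) -> GUI_DIM
    return combined + desc_vec                        # GUI_DIM + DESC_DIM (1536 at full size)


# Identity-like weights keep only the GUI part, which makes the result easy to verify.
W = [[1.0 if j == i else 0.0 for j in range(GUI_DIM + LAYOUT_DIM)] for i in range(GUI_DIM)]
emb = screen_embedding([1.0, 2.0, 3.0, 4.0], [9.0, 9.0], [0.1, 0.2, 0.3, 0.4], W)
print(emb)  # [1.0, 2.0, 3.0, 4.0, 0.1, 0.2, 0.3, 0.4]
```

The description vector is simply appended rather than passed through the linear layer, matching the slide's note that it is concatenated after training.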
  • 7. Dataset
      • RICO dataset
          ◦ Interaction traces on 66,261 unique GUI screens from 9,384 free Android apps
      • Contents
          ◦ Each screen has a screenshot image
          ◦ Each screen has a “view hierarchy” (analogous to the DOM tree in HTML) stored in a JSON file; each node includes:
              - Class type
              - Textual content
              - Location, as a bounding box on the screen
              - Properties such as whether it is clickable, focused, or scrollable
          ◦ Each interaction trace is a sequence of GUI screens, together with which location was clicked or swiped
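To make the view-hierarchy description concrete, here is a sketch that walks a tiny, made-up hierarchy in pre-order and reads each node's class, text, bounding box and clickable flag. The key names (`class`, `text`, `bounds`, `clickable`, `children`) and the `[left, top, right, bottom]` box format are assumptions for illustration; check them against the actual RICO files before relying on them.

```python
import json

# A tiny, hypothetical view hierarchy; key names are assumptions, not a verified RICO schema.
RAW = """
{"class": "android.widget.FrameLayout", "bounds": [0, 0, 1440, 2560], "clickable": false,
 "children": [
   {"class": "android.widget.TextView", "text": "Request ride",
    "bounds": [0, 2000, 1440, 2200], "clickable": false},
   {"class": "android.widget.Button", "text": "Confirm",
    "bounds": [100, 2300, 1340, 2500], "clickable": true}
 ]}
"""


def iter_nodes(node):
    """Yield nodes in pre-order: the node itself first, then each child subtree."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


root = json.loads(RAW)
for node in iter_nodes(root):
    print(node["class"], repr(node.get("text", "")), node["bounds"], node.get("clickable", False))
```

The same pre-order walk is the order in which component embeddings are later fed to the RNN.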
  • 8. Implementation Details: GUI Class Type Embeddings
      • The 26 GUI class categories are encoded into a vector space
      • Each category is mapped to a continuous 6-dimensional vector
      • The vector values are optimized by training on the GUI component prediction task, so semantically similar categories end up close together in the vector space
  • 9. Implementation Details: GUI Component Context
      • The context of a component is defined as its 16 nearest components
      • Two measures of screen distance are used to determine the context:
          ◦ Euclidean: the minimal straight-line distance on the screen, in pixels
          ◦ Hierarchical: the distance between two GUI components on the view hierarchy tree (a component is at distance 1 from its parent and from each of its children)
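Both distance measures, and the "k nearest components" rule, fit in a few lines. This is a sketch under assumptions: boxes are `[x1, y1, x2, y2]` in pixels, "minimal distance" is read as the gap between the closest edges (0 when boxes overlap), and tree positions are given as paths of child indices from the root. The helper names are hypothetical.

```python
import math


def box_distance(a, b):
    """Minimal straight-line pixel distance between boxes [x1, y1, x2, y2]; 0 if they overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dx, dy)


def tree_distance(path_a, path_b):
    """Number of edges between two nodes, each given as its path of child indices from the root."""
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def context(target, others, dist, k=16):
    """The k items of `others` closest to `target` under the distance function `dist`."""
    return sorted(others, key=lambda o: dist(target, o))[:k]


boxes = {"title": [0, 0, 100, 50], "button": [0, 60, 100, 110], "footer": [0, 500, 100, 550]}
print(context("title", ["button", "footer"], lambda a, b: box_distance(boxes[a], boxes[b]), k=1))
# ['button']
```

With `tree_distance`, a child is 1 edge from its parent and siblings are 2 apart, consistent with the slide's "parent and children: 1".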
  • 10. Implementation Details: Linear Layer
      • Linear layers combine multiple vectors into a lower-dimensional vector
      • GUI component level
          ◦ The 768-dimensional text vector is concatenated with the 6-dimensional class vector
          ◦ The 774-dimensional result is projected back down to 768 dimensions, using 774 × 768 weights
      • GUI screen level
          ◦ The 768-dimensional GUI content vector and the 64-dimensional layout vector are combined
          ◦ The output is a 768-dimensional vector for screen content and layout
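The weight counts follow directly from the layer shapes. A quick arithmetic check, with a hypothetical helper `linear_weight_count`; the slide counts weights only, so the bias term is optional here and off by default.

```python
def linear_weight_count(in_dim, out_dim, bias=False):
    """Trainable parameters of a fully connected layer in_dim -> out_dim."""
    return in_dim * out_dim + (out_dim if bias else 0)


component = linear_weight_count(768 + 6, 768)  # text + class -> 768
screen = linear_weight_count(768 + 64, 768)    # GUI content + layout -> 768
print(component, screen)  # 594432 638976
```

If the layers also carry a bias vector (not stated on the slides), each gains 768 more parameters.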
  • 11. Implementation Details: Text Embeddings
      • Uses a Sentence-BERT language model pre-trained on the SNLI and Multi-Genre NLI datasets, with mean pooling
      • Encodes GUI text labels and app descriptions into 768-dimensional vectors
      • Semantically similar sentences and phrases are mapped to nearby vectors
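Mean pooling turns BERT's per-token vectors into one sentence vector by averaging over the real (non-padding) tokens. A toy sketch with 2-dimensional token vectors; in practice the sentence-transformers library performs this step inside the model, so the helper `mean_pool` here is only illustrative.

```python
def mean_pool(token_vecs, attention_mask):
    """Average the token vectors whose mask entry is 1; padding positions (mask 0) are ignored."""
    kept = [v for v, m in zip(token_vecs, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    return [sum(column) / len(kept) for column in zip(*kept)]


# Three token vectors; the last one is padding and must not affect the result.
tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
sentence_vec = mean_pool(tokens, [1, 1, 0])
print(sentence_vec)  # [0.5, 0.5]
```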
  • 12. Implementation Details: Layout Embeddings
      • The layout is extracted from the screenshot, distinguishing text from non-text GUI components
      • An autoencoder encodes each layout image into a 64-dimensional embedding vector
          ◦ Encoder input dimension: 11,200
          ◦ Two hidden layers, of sizes 2,048 and 256
          ◦ ReLU activations zero out negative values
          ◦ Loss: mean squared error (MSE)
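A scaled-down sketch of the layout autoencoder's forward pass and loss. Assumptions, all hypothetical: the sizes are shrunk (112 → 20 → 8 → 4 instead of 11,200 → 2,048 → 256 → 64) so pure Python stays fast, the decoder mirrors the encoder, ReLU follows every layer except the last of each half, biases are omitted, and the "layout image" is random. Training would adjust the weights to minimize `loss`.

```python
import random

# Slide sizes: 11,200 -> 2,048 -> 256 -> 64. Scaled down so pure Python stays fast.
ENCODER_SIZES = [112, 20, 8, 4]
DECODER_SIZES = ENCODER_SIZES[::-1]  # assumption: the decoder mirrors the encoder

rng = random.Random(0)


def make_layer(n_in, n_out):
    """Random n_out x n_in weight matrix (bias omitted for brevity)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def relu(v):
    """Zero out negative values."""
    return [max(0.0, x) for x in v]


def run(layers, x):
    """Apply dense layers in order, with ReLU after every layer except the last (an assumption)."""
    for i, w in enumerate(layers):
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        if i < len(layers) - 1:
            x = relu(x)
    return x


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


encoder = [make_layer(a, b) for a, b in zip(ENCODER_SIZES, ENCODER_SIZES[1:])]
decoder = [make_layer(a, b) for a, b in zip(DECODER_SIZES, DECODER_SIZES[1:])]

layout = [rng.choice([0.0, 1.0]) for _ in range(ENCODER_SIZES[0])]  # fake flattened layout image
code = run(encoder, layout)          # the layout embedding
reconstruction = run(decoder, code)
loss = mse(layout, reconstruction)   # the quantity training would minimize
print(len(code), len(reconstruction), loss)
```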
  • 13. Implementation Details: GUI Embedding Combining Layer
      • Combines the embedding vectors of multiple GUI components into one vector
      • The GUI component embeddings are fed into the RNN in the pre-order traversal order of the view hierarchy tree
      • The RNN starts from a hidden state of zero, and its (n − 1)th output is fed into a linear layer
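The combining step can be sketched as a pre-order walk feeding a simple RNN that starts from a zero hidden state. The Elman-style `tanh` cell, the hand-picked weights and the helper names are assumptions; the slides only say "RNN". Reading the "(n − 1)th output" as the last output under 0-based indexing is also an assumption.

```python
import math


def preorder(node):
    """Component vectors of a view tree in pre-order: the node first, then its children."""
    vectors = [node["vec"]]
    for child in node.get("children", []):
        vectors.extend(preorder(child))
    return vectors


def rnn_outputs(vectors, w_in, w_hid):
    """Elman-style RNN: h_t = tanh(W_in x_t + W_hid h_(t-1)), starting from a zero hidden state."""
    h = [0.0] * len(w_hid)
    outputs = []
    for x in vectors:
        h = [
            math.tanh(sum(a * b for a, b in zip(w_in[i], x)) + sum(a * b for a, b in zip(w_hid[i], h)))
            for i in range(len(w_hid))
        ]
        outputs.append(h)
    return outputs


tree = {"vec": [1.0, 0.0], "children": [{"vec": [0.0, 1.0]}, {"vec": [1.0, 1.0]}]}
sequence = preorder(tree)
outputs = rnn_outputs(sequence, w_in=[[0.5, -0.5], [0.1, 0.2]], w_hid=[[0.3, 0.0], [0.0, 0.3]])
last = outputs[-1]  # the output that would be passed on to the linear layer
print(sequence, last)
```

Feeding components in a fixed traversal order lets the recurrent state summarize a variable number of components in one fixed-size vector.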
  • 14. Training Configuration
      • Data split: 90% training, 10% validation
      • Cross-entropy loss with the Adam optimizer
      • Learning rate: 0.001; batch size: 256
      • Epochs: 120 for the GUI component model; 80 to 120 for the GUI screen model
      • Total loss
          ◦ Component model: Total Loss = Loss(text prediction) + Loss(class type prediction)
          ◦ Screen model: negative sampling
              - The prediction is compared against the correct screen and a sample of negative screens
              - The negatives are 128 other screens randomly sampled from the same app
              - This teaches the model to differentiate between different screens of the same app
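A sketch of the screen-level negative-sampling loss: score the predicted embedding against the correct screen and up to 128 screens sampled from the same app, then apply softmax cross-entropy with the correct screen as the target. Dot-product scoring and the helper names are assumptions; the slides only state that the prediction is compared with the correct screen and the negatives under a cross-entropy loss.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_entropy(scores, target):
    """-log softmax(scores)[target], using the max-shift trick for numerical stability."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]


def negative_sampling_loss(predicted, correct, same_app_pool, n_neg=128, rng=random):
    """Score the correct screen against up to n_neg negatives drawn from the same app."""
    negatives = rng.sample(same_app_pool, min(n_neg, len(same_app_pool)))
    scores = [dot(predicted, correct)] + [dot(predicted, s) for s in negatives]
    return cross_entropy(scores, 0)  # index 0 holds the correct screen


correct = [1.0, 0.0]
pool = [[0.0, 1.0], [-1.0, 0.0]]  # other screens from the same app (toy vectors)
good = negative_sampling_loss([1.0, 0.0], correct, pool, rng=random.Random(0))
bad = negative_sampling_loss([0.0, 1.0], correct, pool, rng=random.Random(0))
print(good < bad)  # True
```

Drawing the negatives from the same app makes them hard cases: they share the app's vocabulary and style, so the model must rely on what distinguishes one screen from another.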
  • 15. Baselines
      • Text Embedding Only (similar textual context)
          ◦ The screen embedding method used in SOVITE
          ◦ Computed by averaging the text embedding vectors of all the text on the screen
      • Layout Embedding Only (similar layout)
          ◦ The screen embedding method used in the original RICO paper
          ◦ Computed by the layout autoencoder
      • Visual Embedding Only (similar visuals)
          ◦ Uses the screenshot image directly instead of the layout
          ◦ Inspired by VASTA, Sikuli, and HILC
  • 16. Results
      • Task: predict each GUI screen in every GUI interaction trace in the RICO dataset from its context
      • Three versions are compared:
          ◦ EUCLIDEAN, with the locations of GUI components and the screen layouts
          ◦ HIERARCHICAL, with the same spatial information
          ◦ EUCLIDEAN, without spatial information
  • 17. Sample Downstream Tasks: Nearest Neighbors
      • The main purpose of Screen2Vec is to produce distributed vector representations that encode useful semantic, layout, and design properties
      • The study compares the similarity of the nearest-neighbor results produced by different models
      • Methods
          ◦ Select 50 screens from apps and app domains
          ◦ Retrieve the top-5 most similar screens using each of the 3 models
          ◦ 79 Mechanical Turk workers participated
          ◦ Each worker saw the top-5 most similar screens for 5 source screens, as produced by the 3 models
          ◦ The questionnaire covered (1) app similarity, (2) screen type similarity, and (3) content similarity
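Top-5 retrieval amounts to ranking every candidate screen by its similarity to the source screen's embedding. A sketch using cosine similarity; the choice of cosine, the toy 2-dimensional vectors and the helper names are assumptions, since the slides do not name the similarity measure. In real use the source screen itself would be left out of `candidates`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, candidates, k=5):
    """Names of the k candidate screens most similar to `query`, most similar first."""
    ranked = sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)
    return ranked[:k]


screens = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
print(top_k([1.0, 0.0], screens, k=2))  # ['a', 'b']
```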
  • 18. Sample Downstream Tasks: Nearest Neighbors (Results)
      • The differences between the mean ratings of the Screen2Vec model and those of both the TextOnly and LayoutOnly models are significant (non-parametric Mann-Whitney U test)
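For readers unfamiliar with the test, the Mann-Whitney U statistic compares two samples through their pooled ranks, which suits ordinal ratings. A minimal sketch that computes U only (average ranks for ties, no p-value); for p-values, SciPy's `scipy.stats.mannwhitneyu` is the usual tool. The inputs below are toy numbers, not the study's ratings.

```python
def rank_with_ties(values):
    """1-based ranks of `values`, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y); U_x counts pairs where x beats y, with ties counted as 0.5."""
    ranks = rank_with_ties(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # (0.0, 9.0)
```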
  • 19. Sample Downstream Tasks: Nearest Neighbors (Observations)
      • Screen2Vec generates more comprehensive representations
          ◦ For the “Request ride” screen in Lyft, its nearest neighbors are:
              - “Get direction” in Uber Driver
              - “Select navigation type” in Waze
              - “Request ride” in Free Now
          ◦ In all of them, a MapView takes up most of the screen
          ◦ All feature a menu or information card in the bottom 1/3 to 1/4 of the screen
      • TextOnly results are semantically similar to “payment”
      • LayoutOnly results score lower on content similarity and app-context similarity
  • 20. Sample Downstream Tasks: Embedding Composability
      • Word2Vec
          ◦ “Man is to woman as brother is to sister”
          ◦ (brother − man + woman) yields an embedding vector representing sister
      • Screen2Vec
          ◦ Marriott app’s “hotel booking” screen + (Cheapoair app’s “search result” screen − Cheapoair app’s “hotel booking” screen)
          ◦ The top result is the “search result” screen in the Marriott app
  • 21. Sample Downstream Tasks: Screen Embedding Sequences for Representing Mobile Tasks
      • A preliminary evaluation of how effectively mobile tasks can be embedded as sequences of Screen2Vec screen embedding vectors
      • Scripts were recorded for completing 10 common smartphone tasks
      • Each task is represented as the average of its Screen2Vec vectors
      • Querying for the nearest neighbor among the 20 task variations gives 18/20 accuracy
          ◦ TextOnly: 14/20 accuracy
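One way to read this evaluation: embed each recorded task as the average of its screen vectors, then, for every recording, check whether its nearest *other* recording is a variation of the same task. The leave-one-out protocol, Euclidean distance, task labels and toy 2-dimensional vectors are assumptions for illustration; the numbers are not the study's data.

```python
def average(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_out_accuracy(recordings):
    """recordings: list of (task_label, [screen vectors]); returns (correct, total)."""
    embedded = [(label, average(screens)) for label, screens in recordings]
    correct = 0
    for i, (label, vec) in enumerate(embedded):
        others = [e for j, e in enumerate(embedded) if j != i]
        nearest_label, _ = min(others, key=lambda e: squared_distance(vec, e[1]))
        correct += nearest_label == label
    return correct, len(embedded)


recordings = [
    ("order coffee", [[1.0, 0.0], [0.8, 0.2]]),
    ("order coffee", [[0.9, 0.1], [1.0, 0.0]]),
    ("book ride", [[0.0, 1.0], [0.1, 0.9]]),
    ("book ride", [[0.2, 0.8], [0.0, 1.0]]),
]
print(leave_one_out_accuracy(recordings))  # (4, 4)
```

Averaging discards screen order, so this representation captures which screens a task visits rather than the sequence in which it visits them.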
  • 22. Potential Applications
      • Designers can query for example designs that display similar content, or for screens from apps in a similar domain
      • Composability helps find a specific page for an app
          ◦ Suppose a designer searches for a checkout page for app A
          ◦ Query: A’s order page + (app B’s checkout page − app B’s order page)
      • LayoutGAN can generate realistic GUI layouts based on user-specified constraints
          ◦ Screen2Vec could be applied to incorporate the semantics of GUIs and the context of user interaction
  • 23. Limitations
      • Only trained and tested on Android app GUIs
      • RICO dataset
          ◦ Contains interaction traces within single apps only; generalizing to tasks that span multiple apps is still needed
          ◦ Does not contain paid apps
      • Screen2Vec does not encode the semantics of graphic icons that have no textual information

Editor's Notes

  1. The correct GUI component is among the top 0.01% of the prediction results. Aggregating textual information is useful for representing the topic of a screen: results are good on the top-0.1% and top-1% metrics and on NRMSE.
  2. Textual content, visual design, layout pattern, and app context
  3. Embedding vectors can be added, subtracted, and averaged to form meaningful new ones