Generating Audio-Visual Slideshows from
Text Articles Using Word Concreteness
Mackenzie Leake, Hijung Valentina Shin, Joy O. Kim, Maneesh Agrawala
CHI 2020
Apr. 9th, 2021
Presenter: Seunghyeong Choe
Contents
• Overview of the paper
• Introduction
• Related Work
• Formative Study
• Methods
• Results
• Evaluation
• Limitation and Future Work
Overview of the paper
Automatically transform text articles into audio-visual slideshows
Evaluate the generated slideshows
Use word concreteness to select keywords
Select images based on word concreteness
Terminology
• Word Concreteness
 How strongly a word or phrase is related to some perceptible concept.
 Example: "AirPods" (concrete, easy to picture) vs. "intuitive" (abstract, no obvious picture)
Introduction
Visual Content
• Enhances written information
• Emphasizes with photos
• Diagrams make articles easier to understand
• Maps illustrate directions
Audio-visual Content
• Includes audio content
• Aids longer-term recall
• Higher preference
• Higher understandability
• Requires significant effort, time, and skill from the author
Introduction
1. Find most concrete words
2. Search image files
3. Speech generation
Related Work
Text-Based Video Editing Tools
• Article Video Robot
• Automatically arrange user-provided video clips
• Visual Transcripts
• SceneSkim
Automatic Visualization of Text
• Videolization: Wikipedia articles to video
• Text summarization using multiple images
• Multimodal summaries for complex sentences
Formative Study
• How does an article's format affect a viewer's understanding and preferences?
• Recruited 120 participants on Amazon Mechanical Turk
 Compared preference and understandability across three formats: text only, text with images, and audio-visual slideshows
 Articles were randomly assigned
Formative Study
• Survey Result
Viewers preferred the slideshow format over text-only and text with images.
Viewers also found content presented in slideshows easiest to understand.
Methods
• Steps to generate an audio-visual slideshow from a text article
A. Segment the text article into sentences
B. Search for images using concrete words
C. Generate audio narration with Google Cloud Text-to-Speech
D. Time-align the audio narration and the images
Methods
Obtaining Images for Text Using Concreteness
• Computing the image search query for each sentence
 Sub-step 1: Concreteness
• 40k-word dataset
• Human-rated concreteness on a scale of 1 to 5
• A threshold of τ = 4.5 was found empirically to work well (it selects farmers, wheat)
• Use the spaCy dependency parser to identify noun phrases and compound nouns (common wheat)
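Example: They are often raised in Kansas, near where farmers also grow common wheat.
Concreteness per word (x = no rating): They 2.93, are 1.96, often 2.50, raised 2.86, in 3.0, Kansas x, near 2.79, where 1.66, farmers 4.54, also 1.83, grow 3.03, common 2.07, wheat 4.89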
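The thresholding in sub-step 1 can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the ratings are the ones shown for the example sentence, the function name `select_concrete_words` is made up here, and treating the threshold as inclusive (rating >= τ) is an assumption. Expanding "wheat" to the noun phrase "common wheat" with spaCy is left out.

```python
# Concreteness ratings copied from the example sentence on this slide.
# "Kansas" has no rating (shown as "x"), so it is stored as None.
EXAMPLE_SCORES = {
    "They": 2.93, "are": 1.96, "often": 2.50, "raised": 2.86, "in": 3.0,
    "Kansas": None, "near": 2.79, "where": 1.66, "farmers": 4.54,
    "also": 1.83, "grow": 3.03, "common": 2.07, "wheat": 4.89,
}

TAU = 4.5  # threshold reported on the slide


def select_concrete_words(sentence, scores, tau=TAU):
    """Return the words of `sentence` whose rating is at least `tau`.

    Assumption: the comparison is inclusive; the slide only gives tau = 4.5.
    """
    selected = []
    for token in sentence.split():
        word = token.strip(".,;:!?")  # drop trailing punctuation
        rating = scores.get(word)
        if rating is not None and rating >= tau:
            selected.append(word)
    return selected


sentence = ("They are often raised in Kansas, near where farmers "
            "also grow common wheat.")
print(select_concrete_words(sentence, EXAMPLE_SCORES))  # ['farmers', 'wheat']
```

With τ = 4.5 only "farmers" (4.54) and "wheat" (4.89) survive, which matches the words highlighted on the slide.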
Methods
Obtaining Images for Text Using Concreteness
• Computing the image search query for each sentence
 Sub-step 2: Named Entities
• Use spaCy named entity recognition tags to identify named entities
• People, places, and organizations (Kansas)
They are often raised in Kansas, near where farmers also grow common wheat.
Methods
Obtaining Images for Text Using Concreteness
• Computing the image search query for each sentence
 Sub-step 3: Pronoun Replacement
• Use NeuralCoref, a pronoun coreference resolution method
• Data-driven NLP approach
• They → Cows
Cows are often raised in Kansas, near where farmers also grow common wheat.
Methods
Obtaining Images for Text Using Concreteness
• Special Cases
 Duplicate words: keep only a single occurrence
 Single-word query: add the article title to provide context and reduce ambiguity
 Empty search query: continue to show the image from the previous sentence
 Empty query for the first sentence: pull the image from the nearest sentence (rare case)
• Image Selection
 Use Bing Image Search
 Minimum resolution 480x360, aspect ratio 4:3
 Filter out charts, diagrams, and images that contain text
 Exclude stock-image URLs to avoid watermarks
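The special cases listed above can be sketched as two small helpers. This is a hedged illustration rather than the system's code: `build_query` and `resolve_empty_queries` are hypothetical names, the context added to a one-word query is assumed to be the article title, and the sketch carries queries forward instead of images (the same query is assumed to yield the same image).

```python
def build_query(words, article_title):
    """Build one search query from the words selected for a sentence."""
    unique = list(dict.fromkeys(words))  # drop duplicates, keep first-seen order
    if not unique:
        return None  # empty query: resolved later from a neighbouring sentence
    if len(unique) == 1:
        unique.append(article_title)  # assumed context for a one-word query
    return " ".join(unique)


def resolve_empty_queries(queries):
    """Fill in empty (None) queries.

    An empty query reuses the previous sentence's query. If the article
    starts with empty queries, they borrow from the nearest later sentence.
    """
    previous = next((q for q in queries if q is not None), None)
    resolved = []
    for query in queries:
        if query is not None:
            previous = query
        resolved.append(previous)
    return resolved


# Made-up word lists for four sentences of a hypothetical article "Cattle".
per_sentence_words = [[], ["wheat", "wheat"], [], ["farmers", "wheat"]]
queries = [build_query(words, "Cattle") for words in per_sentence_words]
print(resolve_empty_queries(queries))
```

The first sentence has no query and borrows from the second; the third has none and keeps the second sentence's query, so its image would stay on screen.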
Methods
Slideshow Composition
• Audio Narration
 Google Cloud Text-to-Speech
 Reprocess the output audio through Google Speech-to-Text
• Returns per-word timestamps
• Provides timing information
 Needleman-Wunsch algorithm
• Finds the optimal alignment between the input article text and the transcript
• Time-aligning images to the narration
 Keep showing the previous image if the next image would be on screen for less than 2 seconds
• Composition and Effects
 Crop to 960x720 using Python-smart-crop
 Zoom if a face is detected and pan if a salient region exists, using OpenCV
 Add captions
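To make the alignment step concrete, here is a minimal word-level Needleman-Wunsch sketch that aligns article words with transcript words and then copies the transcript's timestamps onto the article. The scoring values (match +1, mismatch -1, gap -1), the function names, and the example timestamps are assumptions for illustration; the slides do not give the parameters the system uses.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (i, j) index pairs.

    None in a pair marks a gap (a word present in only one sequence).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def transfer_times(pairs, n_article_words, transcript_times):
    """Copy transcript start times onto aligned article words (None if unaligned)."""
    times = [None] * n_article_words
    for i, j in pairs:
        if i is not None and j is not None:
            times[i] = transcript_times[j]
    return times


article = ["cows", "are", "raised", "in", "kansas"]
transcript = ["cows", "raised", "in", "kansas"]  # recognizer dropped "are"
transcript_times = [0.0, 0.4, 0.9, 1.1]          # made-up start times (seconds)

pairs = needleman_wunsch(article, transcript)
print(pairs)
print(transfer_times(pairs, len(article), transcript_times))
```

Words the recognizer missed end up with no timestamp; a real pipeline would need to interpolate or inherit a time from a neighbouring word, which this sketch does not attempt.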
Results
• Created 13 slideshow videos from Wikipedia and HowStuffWorks articles.
• Generating a slideshow takes 2 to 10 minutes
Results
• Sentences without concrete words
 Conversational sentences
 Pulls the image from the next sentence
 Holds the prior image on screen.
Evaluation
• Comparison of Automatic and Manual Search Queries
• How well does the system identify appropriate image search queries?
• Evaluate the overall quality of the generated slideshows
• 3 human annotators created search queries manually, without knowledge of the system
• Red text: commonly selected search queries
• Green text: manually selected search queries
• Blue text: automatically selected search queries
• Measure the F1 score to compare the words in the manually and automatically selected queries
• The images resulting from automatic and manual queries may not differ in meaningful ways
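As a worked example of the F1 comparison, the sketch below scores an automatic query against a manual one by treating each as a set of words, with the manual query as the reference. The set-based formulation and the example words are assumptions; the slides do not spell out how the score was computed.

```python
def query_f1(manual_words, auto_words):
    """F1 overlap between manual (reference) and automatic query words."""
    manual, auto = set(manual_words), set(auto_words)
    overlap = len(manual & auto)
    if overlap == 0:
        return 0.0  # also covers the case where either set is empty
    precision = overlap / len(auto)   # share of automatic words that are correct
    recall = overlap / len(manual)    # share of manual words that were found
    return 2 * precision * recall / (precision + recall)


# Made-up queries for one sentence.
manual = ["cows", "kansas", "wheat"]
auto = ["cows", "kansas", "farmers", "wheat"]
print(round(query_f1(manual, auto), 3))  # 0.857
```

Here precision is 3/4 and recall is 3/3, giving F1 = 6/7, or about 0.857.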
Evaluation
• User Study and Feedback
 Assess the output slideshow videos
 Compare 3 types of video
• Manually created video
• Keyword-based video (Rapid Automatic Keyword Extraction, RAKE)
• Concreteness-based video
 Recruited 120 participants from Amazon Mechanical Turk
Evaluation
• User Study and Feedback
Participants strongly preferred the concreteness-based slideshows over the keyword-based version.
There was no strong preference between the manually created and automatically created versions.
Evaluation
• User Study and Feedback
Images selected by the concreteness-based approach were rated more relevant than those from the keyword-based approach.
Limitation and Future Work
• Concreteness can be applied to a wide range of domains
 Poetry, classic literature
 Different grammatical structures in international articles
• Copyrighted images are not filtered out
• Cannot identify objects and people that are not famous
• Only uses static images
• Future work
 Use video clips (trimming, timing)
 Use imageability, specificity, and familiarity in addition to concreteness
Thank you
Any questions?

 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 

Generating Audio-Visual Slideshows from Text Articles Using Word Concreteness

  • 1. Generating Audio-Visual Slideshows from Text Articles Using Word Concreteness
       Mackenzie Leake, Hijung Valentina Shin, Joy O. Kim, Maneesh Agrawala
       CHI 2020
       Apr. 9th, 2021
       Presenter: Seunghyeong Choe
  • 2. Contents
       • Overview of the paper
       • Introduction
       • Related Work
       • Formative Study
       • Methods
       • Result
       • Evaluation
       • Limitation and Future Work
  • 3. Overview of the paper
       • Automatically transform a text article into an audio-visual slideshow
       • Use word concreteness to select key words
       • Select images based on word concreteness
       • Evaluate the generated slideshows
  • 4. Terminology
       • Word Concreteness
         - How strongly a word or phrase is related to some perceptible concept.
         - Example words on the slide: "AirPods", "Intuitive" (?)
  • 5. Introduction
       Visual Content
       • Enhance written information
       • Emphasize with photos
       • Diagrams make an article easier to understand
       • Maps illustrate direction
       Audio-visual
       • Include audio content
       • Aid in longer-term recall
       • Higher preference
       • Higher understandability
       • Requires significant effort, time, and skill from the author
  • 6. Introduction
       1. Find the most concrete words
       2. Search image files
       3. Speech generation
  • 7. Related Work
       Text Based Video Editing Tools
       • Article Video Robot
       • Automatically arrange user-provided video clips
       • Visual Transcripts
       • SceneSkim
       Automatic Visualization of Text
       • Videolization: Wikipedia articles to video
       • Text summarization using multiple images
       • Multimodal summaries for complex sentences
  • 8. Formative Study
       • How does the format of an article affect a viewer's understanding and preferences?
       • Recruited 120 participants on Amazon Mechanical Turk
         - Measured preference and understandability for three formats
         - Articles were randomly assigned
       • Formats: text only, text with images, audio-visual slideshows
  • 9. Formative Study
       • Survey Result
         - Viewers preferred the slideshow format over text-only and text with images.
         - Viewers also found content presented in slideshows easiest to understand.
  • 10. Methods
       • Steps to generate an audio-visual slideshow from a text article
         A. Segment the text article into sentences
         B. Search for image files using concrete words
         C. Generate audio narration using Google Cloud Text-to-Speech
         D. Time-align the audio narration and the image files
  • 11. Methods: Obtaining Images for Text Using Concreteness
       • Computing the image search query for each sentence
         - Sub-step 1: Concreteness
           • Dataset of 40k words
           • Human-rated concreteness on a scale of 1 to 5
           • Empirically, a threshold of τ = 4.5 works well (selects "farmers", "wheat")
           • spaCy dependency parser identifies noun phrases and compound nouns ("common wheat")
       • Example sentence with per-word concreteness scores (x = no score shown):
         They (2.93) are (1.96) often (2.50) raised (2.86) in (3.0) Kansas (x), near (2.79) where (1.66) farmers (4.54) also (1.83) grow (3.03) common (2.07) wheat (4.89).
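The thresholding in sub-step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `RATINGS` table simply copies the per-word scores from the slide's example sentence, the function name `concrete_words` is hypothetical, and treating the threshold as inclusive (score >= τ) is an assumption. The noun-phrase expansion to "common wheat" needs a dependency parser such as spaCy and is left out.

```python
# Scores copied from the slide's example sentence (scale of 1 to 5).
# "Kansas" has no score on the slide, so it is absent from the table.
RATINGS = {
    "they": 2.93, "are": 1.96, "often": 2.50, "raised": 2.86, "in": 3.0,
    "near": 2.79, "where": 1.66, "farmers": 4.54, "also": 1.83,
    "grow": 3.03, "common": 2.07, "wheat": 4.89,
}
TAU = 4.5  # threshold reported on the slide


def concrete_words(sentence, ratings=RATINGS, tau=TAU):
    """Return the words whose concreteness rating is at least tau, in order."""
    words = [w.strip(".,;:!?").lower() for w in sentence.split()]
    # Unrated words default to 0.0 and therefore never pass the threshold.
    return [w for w in words if ratings.get(w, 0.0) >= tau]


sentence = "They are often raised in Kansas, near where farmers also grow common wheat."
print(concrete_words(sentence))  # ['farmers', 'wheat']
```

Unrated words such as "Kansas" fall through here, which is why the next sub-step adds named entities separately.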
  • 12. Methods: Obtaining Images for Text Using Concreteness
       • Computing the image search query for each sentence
         - Sub-step 2: Named Entities
           • Use spaCy named entity recognition tags to identify words
           • People, places, and organizations ("Kansas")
       • Example sentence with per-word concreteness scores (x = no score shown):
         They (2.93) are (1.96) often (2.50) raised (2.86) in (3.0) Kansas (x), near (2.79) where (1.66) farmers (4.54) also (1.83) grow (3.03) common (2.07) wheat (4.89).
  • 13. Methods: Obtaining Images for Text Using Concreteness
       • Computing the image search query for each sentence
         - Sub-step 3: Pronoun Replacement
           • Use Neural-Coref, a data-driven NLP method for pronoun coreference resolution
           • They → Cows
       • Resulting sentence: Cows are often raised in Kansas, near where farmers also grow common wheat.
       • Per-word scores as shown on the slide: 2.93 1.96 2.50 2.86 3.0 x 2.79 1.66 4.54 1.83 3.03 2.07 4.89
  • 14. Methods: Obtaining Images for Text Using Concreteness
       • Special Cases
         - Duplicate words: keep only a single occurrence
         - Single-word query in a sentence: add the article to the query to provide context and reduce ambiguity
         - Empty search query: continue to show the image from the previous sentence
         - Empty search query for the first sentence: pull the image from the nearest sentence (rare case)
       • Image Selection
         - Use Bing Image Search
         - Minimum resolution 480x360, aspect ratio 4:3
         - Filter out charts, diagrams, and images that contain text
         - Exclude stock-image URLs to avoid watermarks
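The special cases for search queries can be sketched as follows. This is an illustrative sketch only: `build_queries` and `resolve_image_sources` are hypothetical names, and the sketch assumes that the context added to a single-word query is the article's title, which the slide does not spell out.

```python
def build_queries(per_sentence_words, article_title):
    """Turn each sentence's selected words into one search query string."""
    queries = []
    for words in per_sentence_words:
        # Duplicate words: keep only the first occurrence, preserving order.
        unique = list(dict.fromkeys(words))
        # Single-word query: add article context (assumed here to be the title).
        if len(unique) == 1:
            unique.append(article_title)
        queries.append(" ".join(unique))
    return queries


def resolve_image_sources(queries):
    """For each sentence, return the index of the sentence whose image it shows."""
    sources, last = [], None
    for i, query in enumerate(queries):
        if query:
            last = i  # this sentence has its own image
        sources.append(last)  # otherwise reuse the previous sentence's image
    # Leading sentences with an empty query borrow from the nearest later sentence.
    first = next((i for i, q in enumerate(queries) if q), None)
    return [first if s is None else s for s in sources]


words = [[], ["cow", "cow", "farm"], ["wheat"], []]
queries = build_queries(words, "Cattle")
print(queries)                         # ['', 'cow farm', 'wheat Cattle', '']
print(resolve_image_sources(queries))  # [1, 1, 2, 2]
```

Returning sentence indices rather than images keeps the fallback logic separate from the image search itself.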
  • 15. Methods: Slideshow Composition
       • Audio Narration
         - Generated with Google Cloud Text-to-Speech
         - The output audio is reprocessed through Google Speech-to-Text, which returns per-word timestamps that provide timing information
         - The Needleman-Wunsch algorithm finds the optimal alignment between the input article text and the transcript
       • Time-aligning images to the narration
         - Keep showing the previous image if a new image would be on screen for less than 2 seconds
       • Composition and Effects
         - Crop to 960x720 using Python-smart-crop
         - Zoom if a face is detected and pan if a salient region exists, using OpenCV
         - Add captions
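The alignment step can be illustrated with a small word-level Needleman-Wunsch implementation in Python. This is a sketch under stated assumptions: the scoring values (match +1, mismatch -1, gap -1) and the helper `article_word_times` are hypothetical choices for illustration, and the timestamps in the example are made up rather than taken from a speech-to-text service.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (i, j) index pairs, None marking a gap."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))  # word of a has no partner in b
            i -= 1
        else:
            pairs.append((None, j - 1))  # word of b has no partner in a
            j -= 1
    pairs.reverse()
    return pairs


def article_word_times(article_words, transcript_words, transcript_times):
    """Copy each transcript timestamp onto the article word it aligns with."""
    times = [None] * len(article_words)
    for i, j in needleman_wunsch(article_words, transcript_words):
        if i is not None and j is not None:
            times[i] = transcript_times[j]
    return times


article = "cows eat common wheat".split()
transcript = "cows eat wheat".split()  # the recognizer dropped one word
print(article_word_times(article, transcript, [0.0, 0.4, 0.9]))
# [0.0, 0.4, None, 0.9]
```

Article words left without a timestamp (here "common") could then take their timing from neighboring words.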
  • 16. Results
       • Created 13 slideshow videos from Wikipedia and HowStuffWorks articles.
       • Generating a slideshow takes 2 to 10 minutes.
  • 17. Results
       • Sentences without concrete words
         - Conversational sentences
         - Pulls the image from the next sentence
         - Holds the prior image on screen
  • 18. Evaluation
       • Comparison of automatic and manual search queries
       • How well does the system identify the appropriate image search query?
       • Evaluate the overall quality of the generated slideshows
       • 3 human annotators created search queries manually, without knowledge of the system
       • Color legend for the slide's example
         - Red text: search queries selected by both
         - Green text: manually selected search queries
         - Blue text: automatically selected search queries
       • F1 score measures the word overlap between the manually and automatically selected queries
       • Images resulting from automatic and manual queries may not differ in meaningful ways
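A word-level F1 comparison between an automatic and a manual query can be sketched as below. The slide does not say how the score was computed, so treating each query as a set of words is an assumption, and the function name `query_f1` and the example words are hypothetical.

```python
def query_f1(auto_words, manual_words):
    """F1 overlap between automatic and manual query words (set-based assumption)."""
    auto, manual = set(auto_words), set(manual_words)
    overlap = len(auto & manual)
    if overlap == 0:
        return 0.0
    precision = overlap / len(auto)   # share of automatic words the annotator also chose
    recall = overlap / len(manual)    # share of manual words the system recovered
    return 2 * precision * recall / (precision + recall)


auto = ["farmers", "wheat", "kansas"]
manual = ["cows", "kansas", "wheat"]
print(round(query_f1(auto, manual), 3))  # 0.667
```

Two of the three words match on each side, so precision and recall are both 2/3 and the F1 score is 2/3 as well.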
  • 19. Evaluation
       • User Study and Feedback
         - Assess the output slideshow videos
         - Compare 3 types of video
           • Manually created video
           • Keyword-based video (Rapid Automatic Keyword Extraction, RAKE)
           • Concreteness-based video
         - Recruited 120 participants from Amazon Mechanical Turk
  • 20. Evaluation
       • User Study and Feedback
         - Participants strongly preferred the concreteness-based slideshows over the keyword-based version.
         - There was no strong preference between the manually created and automatically created versions.
  • 21. Evaluation
       • User Study and Feedback
         - Automatically selected images were rated more relevant than those from the keyword-based approach.
  • 22. Limitation and Future Work
       • Concreteness can be applied to a wide range of domains
         - Poetry, classic literature
         - International articles with different grammatical structures
       • Copyrighted images are not filtered out
       • Cannot identify objects and people that are not famous
       • Only uses static images
       • Future work
         - Use video clips (trimming, timing)
         - Use imageability, specificity, and familiarity in addition to concreteness