Visual Storytelling
Ting-Hao (Kenneth) Huang et al.
Presenter: Yiming Pang
There is a story behind every image
A group of people that are sitting next to each other.
Having a good time bonding and talking.
There is another way to describe the scene
The sun is setting over the ocean and mountains.
Sky illuminated with a brilliance of gold and orange hues.
Visual Storytelling: A solid next move in AI
Outline
• Motivation and Related Work
• Visual Storytelling 101
• Dataset: SIND
• Baseline Experiments
• Conclusion
From Vision to Language
Work in vision to language has exploded…
From Vision to Language
• Image Captioning
• Given an image, describe it in natural language
Deep Visual-Semantic Alignments for Generating Image Descriptions, A. Karpathy and L. Fei-Fei
From Vision to Language
• Question Answering
• Takes as input an image and a free-form, open-ended, natural language question about the image and produces a natural language answer as the output.
VQA: Visual Question Answering, A. Agrawal et al.
From Vision to Language
• Visual Phrases
• Chunks of meaning bigger than objects and smaller than scenes
Recognition Using Visual Phrases, M. Sadeghi and A. Farhadi
And the list keeps going on…
Why visual storytelling?
• Other works focus on direct, literal description of image content.
• Useful, meaningful
• But still far from the capabilities needed by intelligent agents for naturalistic interactions
• However, with visual storytelling
• More evaluative and figurative language
• Brings to bear information about social relations and emotions
Outline
• Motivation and Related Work
• Visual Storytelling 101
• Dataset: SIND
• Baseline Experiments
• Conclusion
What is visual storytelling?
• Go beyond basic description (literal description) of visual scenes
• Towards human-like understanding of grounded event structure and subjective expression (narrative).
Literal Description
Sitting next to each other
Sun is setting
VS.
Narrative
Having a good time
Sky illuminated with a brilliance…
A good story requires more information
Single Image
Sequence of Images
Three Tiers of Language for the Same Image
• Descriptions of Images-In-Isolation (DII):
• Plain description as in image captioning
• Descriptions of Images-In-Sequence (DIS):
• Same language style but images are displayed in a sequence
• Stories for Images-In-Sequence (SIS):
• An ACTUAL story
Three Tiers of Language for the Same Image
Descriptive Text ≠ Consecutive Captions ≠ Stories
Outline
• Motivation and Related Work
• Visual Storytelling 101
• Dataset: SIND
• Baseline Experiments
• Conclusion
Extracting Photos
• Flickr Data Release: feed descriptions into Stanford CoreNLP
• Extract possessive dependency patterns
• Filter by: classified as EVENT
• Flickr API: only include albums within a 48-hour span
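The 48-hour album filter named on this slide can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `Album` type, its `taken_at` field, the function `within_span`, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Album:
    # Hypothetical container: a title plus the capture time of each photo.
    title: str
    taken_at: List[datetime]


def within_span(album: Album, max_span: timedelta = timedelta(hours=48)) -> bool:
    """Return True if all photos in the album were taken within max_span."""
    if not album.taken_at:
        return False  # assumption: an album without photos is dropped
    return max(album.taken_at) - min(album.taken_at) <= max_span


# Made-up albums for illustration only.
albums = [
    Album("birthday party", [datetime(2015, 7, 4, 10), datetime(2015, 7, 4, 18)]),
    Album("road trip", [datetime(2015, 7, 1, 9), datetime(2015, 7, 6, 9)]),
]
kept = [a.title for a in albums if within_span(a)]
print(kept)
```

Only the first album survives here, because the second one spans five days.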
Dataset Crowdsourcing Workflow
[Workflow diagram. Stages: Flickr Album; Storytelling; Preferred Photo Sequence; Re-telling; Description for Images in Isolation & in Sequences. Outputs: Story 1 to Story 5.]
Interface for Storytelling
Data Analysis
• 10,117 Flickr albums
• 210,819 unique photos
• 20.8 photos per album on average
• 7.9-hour time span per album on average
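As a quick sanity check, the per-album average follows from the two counts above. The variable names are illustrative only.

```python
albums = 10_117
photos = 210_819

# Average number of unique photos per album, rounded to one decimal place.
avg_photos = round(photos / albums, 1)
print(avg_photos)
```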
Top Words Associated with Each Tier
Outline
• Motivation and Related Work
• Visual Storytelling 101
• Dataset: SIND
• Baseline Experiments
• Conclusion
What’s the best metric to evaluate the story?
• The best and most reliable evaluation is human judgment
• Crowdsourcing on MTurk
• For quick benchmark progress: automatic evaluation metric
• METEOR
• The Meteor automatic evaluation metric scores machine translation hypotheses by aligning them to one or more reference translations. Alignments are based on exact, stem, synonym, and paraphrase matches between words and phrases.
• Smoothed-BLEU
• Bilingual Evaluation Understudy
• Skip-Thoughts
• Human rating scale: Strongly disagree, Disagree, Neutral, Agree, Strongly agree
Which one is the best?
• Correlations of automatic scores against human judgements, with p-values in parentheses
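One common way to correlate an automatic metric with human ratings is a rank correlation. The sketch below computes Spearman's rho in plain Python. It is an illustration under stated assumptions: the slide does not say which correlation coefficient was used, the function names are hypothetical, the numbers are made up, and no p-value is computed.

```python
from statistics import mean


def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up scores for five stories: human ratings and an automatic metric.
human = [1, 2, 3, 4, 5]
metric = [0.11, 0.15, 0.14, 0.22, 0.30]
rho = spearman(human, metric)
print(round(rho, 2))
```

A metric whose rho against human ratings is closer to 1 orders the stories more like people do.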
Train
Show and tell: a neural image caption generator, O. Vinyals et al.
Sequence of Images
Generate the story
• Simple beam search (size=10)
• However, it does not work very well…
This is a picture of a family.
This is a picture of a cake.
This is a picture of a dog.
This is a picture of a beach.
This is a picture of a beach.
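For readers unfamiliar with beam search, here is a minimal sketch of the decoding loop. It is not the presented system: `toy_model` is a hypothetical lookup table standing in for a trained decoder, and `beam_search` is an illustrative name. With `beam_size=1` the same loop reduces to greedy decoding.

```python
import math


def beam_search(next_log_probs, beam_size, max_len, eos="</s>"):
    """Return the best-scoring token tuple found with the given beam size.

    next_log_probs(prefix) must return a dict {token: log_prob}.
    """
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (prefix + (token,), score + lp)
            for prefix, score in beams
            for token, lp in next_log_probs(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses as a fallback
    return max(finished, key=lambda c: c[1])[0]


# Hypothetical toy distribution; any prefix not listed ends the sentence.
TABLE = {
    (): {"the": math.log(0.6), "a": math.log(0.4)},
    ("the",): {"dog": math.log(0.55), "cat": math.log(0.45)},
    ("a",): {"beach": math.log(0.9), "dog": math.log(0.1)},
}


def toy_model(prefix):
    return TABLE.get(prefix, {"</s>": 0.0})


print(beam_search(toy_model, beam_size=1, max_len=5))  # ('the', 'dog', '</s>')
print(beam_search(toy_model, beam_size=2, max_len=5))  # ('a', 'beach', '</s>')
```

In this toy case the wider beam finds a more probable sequence than greedy decoding. A more probable sequence is not necessarily a better story, which is the concern this slide raises.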
Generate the better story
• Greedy beam search (size=1)
• Resulting in a 4.6 gain in METEOR score
The family gathered together for a meal.
The food was delicious.
The dog was excited to be there.
The dog was enjoying the water.
The dog was happy to be in the water.
Generate the better story (cont.)
• A very simple heuristic: the same content word cannot be produced more than once within a given story.
• Resulting in a 2.3 gain in METEOR score
The family gathered together for a meal.
The food was delicious.
The dog was excited to be there.
The kids were playing in the water.
The boat was a little too much to drink.
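The no-repeat heuristic can be illustrated at the sentence level. This sketch is a simplification and an assumption: it picks among ready-made candidate sentences per image, whereas a real decoder would more likely apply the constraint while generating words. The stopword list and the function names are hypothetical.

```python
# Hypothetical stopword list; everything else counts as a content word.
STOPWORDS = {"the", "a", "an", "was", "were", "to", "be", "in", "of", "for", "there", "is"}


def content_words(sentence):
    """Lowercase, strip punctuation, and drop stopwords."""
    tokens = (t.strip(".,!?").lower() for t in sentence.split())
    return {t for t in tokens if t and t not in STOPWORDS}


def pick_without_repeats(candidates_per_image):
    """For each image, take the first candidate that repeats no content word.

    Each inner list is assumed to be non-empty and sorted best-first.
    """
    used = set()
    story = []
    for candidates in candidates_per_image:
        for sentence in candidates:
            words = content_words(sentence)
            if not (words & used):
                story.append(sentence)
                used |= words
                break
        else:
            # Every candidate repeats something: fall back to the top one.
            story.append(candidates[0])
            used |= content_words(candidates[0])
    return story


candidates = [
    ["The dog was excited to be there."],
    ["The dog was enjoying the water.", "The kids were playing in the water."],
]
story = pick_without_repeats(candidates)
print(story)
```

The second image's top candidate repeats "dog", so the next candidate is chosen instead.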
Generate the better story (cont.)
• Additional baseline: visually grounded words
• $\frac{P(w \mid T_{\text{caption}})}{P(w \mid T_{\text{story}})} > 1.0$
• Resulting in a 1.3 gain in METEOR score
The family got together for a cookout.
They had a lot of delicious food.
The dog was happy to be there.
They had a great time on the beach.
They even had a swim in the water.
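The ratio test on this slide keeps a word only if it is more probable in caption text than in story text. Below is a minimal sketch using unigram relative frequencies. The estimator, the handling of words unseen in stories, the function names, and the tiny corpora are all assumptions made for illustration.

```python
from collections import Counter


def unigram_probs(sentences):
    """Relative frequency of each lowercase whitespace token."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def visually_grounded(caption_texts, story_texts, threshold=1.0):
    """Words w with P(w | captions) / P(w | stories) above the threshold."""
    p_cap = unigram_probs(caption_texts)
    p_story = unigram_probs(story_texts)
    grounded = set()
    for w, pc in p_cap.items():
        ps = p_story.get(w, 0.0)
        # Assumption: a word never seen in stories counts as grounded.
        if ps == 0.0 or pc / ps > threshold:
            grounded.add(w)
    return grounded


# Made-up corpora for illustration only.
captions = ["dog on beach", "dog in water", "people on beach"]
stories = ["people had a great time", "people loved the beach", "the dog had fun"]
grounded = visually_grounded(captions, stories)
print(sorted(grounded))
```

In this toy data "people" is relatively more frequent in the stories than in the captions, so it is filtered out, while "dog" and "beach" pass the test.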
Final Results
• METEOR scores for different methods
Outline
• Motivation and Related Work
• Visual Storytelling 101
• Dataset: SIND
• Baseline Experiments
• Conclusion
Conclusion
• The first dataset for sequential vision-to-language.
• Images-in-isolation to stories-in-sequence.
• Evolving AI towards more human-like understanding
Q&A
