SlideShare a Scribd company logo
Visual Storytelling
Ting-Hao Huang et al.
2017/10/3
Abstract
1. Introduce the first dataset for Sequential vision-to-language
2. SIND v.1 (Sequential Images Narrative Dataset)
3. 81743 unique photos in 20211 sequences
4. Establish string baselines
5. Move artificial intelligence from basic understandings of typical visual scenes
towards more and more human-like understanding of grounded event structure
and subjective expression
Table of content
1. Introduction
2. Motivation and Related Work
3. Dataset Construction
4. Data Analysis
5. Automatic Evaluation Metric
6. Baseline Experiments
7. Conclusion
Introduction
1. From description (concrete, literal) to narrative (abstract, further inference)
2. “Sitting next to each other” vs. “Having a good time”
3. Release three tiers of language for the same images
1) Descriptions of images-in-isolation (DII)
2) Descriptions of images-in-sequence (DIS)
3) Stories for images-in-sequence (SIS)
Motivation and Related Work
1. Image Captioning (2014,2015)
2. Question answering (2014)
3. Visual phrase (2011)
4. Vision Understanding (2013)
5. Visual Concepts (2015,2016)
Those works focus on direct, literal description od image content
Dataset Construction
1. Extracting Photos
1. Leverage the idea that “storyable” event tend to involve some form of
possession (John’s party; Shabnam’s visit)
2. Extract Flickr data with possessive dependency patterns (Standford CoreNLP)
3. Use WordNet3.0 to find out EVENT
4. Only include albums with 10 to 50 photos where all album photos are taken
within a 48-hour span
Dataset Construction
2. Crowdsourcing Stories In Sequence
1. 2-stage crowdsourcing
2. Storytelling : worker selects a subset of photos and writes a story about it
3. Re-telling :the worker writes a story based on one photo sequence generated
in the first stage
Dataset Construction
3. Crowdsourcing Descriptions of Images In Isolation & Images In
Sequence
1. Also collect descriptions of images-in-isolation and descriptions of images-in-
sequence
2. Follow the instructions for image captioning (MS COCO). Ex: describe all the
important parts
4. Data Post-processing
1. Replace name and identified named entities
Dataset Construction
2.
Data analysis
1. Dataset includes 10117 Flickr albums with 210819 unique photos.
2. Use normalized pointwise mutual information to identify the words most closely
associated with each tier.
Automatic Evaluation Metric
1. Human judgment is the most reliable way to evaluate.
2. Compute pairwise correlation coefficients between automatic metrics and
human judgments (score from 1-5) on 3000 stories from SIS training set.
3. Automatic metrics : METROR, smoothed-BLUE and Skip-Thoughts
4. METEOR correlates best with human judgment.
Baseline Experiments
1. Use Sequence-to-Sequence recurrent neural net
2. Encode an image sequence by running an RNN over fc7 vectors of each image, in
reverse order.
3. Use Gated Recurrent Units (GRUs) for both the image encoder and story decoder
4. Initially, Beam search (size = 100); but there’s lots of repetitive sentences.
5. Greedy search significantly increase the story quality.
6. Same content word cannot be produced more than once within a given story.
7. Filter out some “visually grounded” words
Conclusion
The details of the training were:
1. Extract 4096-dim FC 7 features using VGG16 without fine tuning
2. The encoder reads over the 5 images in a sequence. The order of images are reversed (i.e., the
first image in the sequence is the last one read in, following what is commonly done for
machine translation. This is probably not important though).
3. The encoder and decoder are 1000 dimensional GRU (no weight sharing)
4. The target word embedding size is 250 dimension (i.e., the dimension when the word that was
just produced is fed into the decoder GRU).
5. The target vocab size is words that occur 3 or more times in the training. Other words are
mapped to UNK (there is a constraint in the decoder that UNK cannot be produced at test time,
however).
6. 0.5 dropout on the image FC7 input (i.e., 50% of the 4096-dim FC7 features are dropped out
before being fed into encoder GRU. This is probably not important).
7. 0.5 dropout on the decoder GRU layer before applying it to the output layer.
8. If the story model is co-trained with caption data, you should use a token in the encoder GRU to
indicate which type of output to produce.
9. It's analogous to machine translation sequence-to-sequence models.

More Related Content

What's hot

Lecture 2 Introduction to digital image
Lecture 2 Introduction to digital imageLecture 2 Introduction to digital image
Lecture 2 Introduction to digital image
VARUN KUMAR
 
Convolutional Neural Network
Convolutional Neural NetworkConvolutional Neural Network
Convolutional Neural Network
Vignesh Suresh
 
Handwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPTHandwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPT
RishabhTyagi48
 
Btv thesis defense_v1.02-final
Btv thesis defense_v1.02-finalBtv thesis defense_v1.02-final
Btv thesis defense_v1.02-finalVinh Bui
 
Offline Character Recognition Using Monte Carlo Method and Neural Network
Offline Character Recognition Using Monte Carlo Method and Neural NetworkOffline Character Recognition Using Monte Carlo Method and Neural Network
Offline Character Recognition Using Monte Carlo Method and Neural Network
ijaia
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
Prudhvi Raj
 
Cnn
CnnCnn
Fixed-Point Code Synthesis for Neural Networks
Fixed-Point Code Synthesis for Neural NetworksFixed-Point Code Synthesis for Neural Networks
Fixed-Point Code Synthesis for Neural Networks
gerogepatton
 
Sefl Organizing Map
Sefl Organizing MapSefl Organizing Map
Sefl Organizing Map
Nguyen Van Chuc
 
Neural networks
Neural networksNeural networks
Neural networks
HarshitGupta367
 
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
IJSRD
 
Visualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networksVisualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networks
SungminYou
 
Efficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image ClassficationEfficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image Classfication
Yogendra Tamang
 
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 AlgorithmSelf Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Chenghao Jin
 
Visualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksVisualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional Networks
Willy Marroquin (WillyDevNET)
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
Richard Kuo
 
Digit recognition using mnist database
Digit recognition using mnist databaseDigit recognition using mnist database
Digit recognition using mnist database
btandale
 
Convolutional Neural Network (CNN) - image recognition
Convolutional Neural Network (CNN)  - image recognitionConvolutional Neural Network (CNN)  - image recognition
Convolutional Neural Network (CNN) - image recognition
YUNG-KUEI CHEN
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
SungminYou
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
Mohamed Loey
 

What's hot (20)

Lecture 2 Introduction to digital image
Lecture 2 Introduction to digital imageLecture 2 Introduction to digital image
Lecture 2 Introduction to digital image
 
Convolutional Neural Network
Convolutional Neural NetworkConvolutional Neural Network
Convolutional Neural Network
 
Handwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPTHandwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPT
 
Btv thesis defense_v1.02-final
Btv thesis defense_v1.02-finalBtv thesis defense_v1.02-final
Btv thesis defense_v1.02-final
 
Offline Character Recognition Using Monte Carlo Method and Neural Network
Offline Character Recognition Using Monte Carlo Method and Neural NetworkOffline Character Recognition Using Monte Carlo Method and Neural Network
Offline Character Recognition Using Monte Carlo Method and Neural Network
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
 
Cnn
CnnCnn
Cnn
 
Fixed-Point Code Synthesis for Neural Networks
Fixed-Point Code Synthesis for Neural NetworksFixed-Point Code Synthesis for Neural Networks
Fixed-Point Code Synthesis for Neural Networks
 
Sefl Organizing Map
Sefl Organizing MapSefl Organizing Map
Sefl Organizing Map
 
Neural networks
Neural networksNeural networks
Neural networks
 
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
 
Visualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networksVisualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networks
 
Efficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image ClassficationEfficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image Classfication
 
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 AlgorithmSelf Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
 
Visualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksVisualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional Networks
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
Digit recognition using mnist database
Digit recognition using mnist databaseDigit recognition using mnist database
Digit recognition using mnist database
 
Convolutional Neural Network (CNN) - image recognition
Convolutional Neural Network (CNN)  - image recognitionConvolutional Neural Network (CNN)  - image recognition
Convolutional Neural Network (CNN) - image recognition
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 

Similar to Visual storytelling

Automated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU ArchitectureAutomated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU Architecture
IRJET Journal
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
CSCJournals
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learning
ijtsrd
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth Estimation
Ryo Takahashi
 
Scene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural NetworkScene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural Network
DhirajGidde
 
PCS 2016 presentation
PCS 2016 presentationPCS 2016 presentation
PCS 2016 presentation
Ashek Ahmmed
 
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfHandwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Sachin414679
 
Image De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural NetworkImage De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural Network
aciijournal
 
IMAGE DE-NOISING USING DEEP NEURAL NETWORK
IMAGE DE-NOISING USING DEEP NEURAL NETWORKIMAGE DE-NOISING USING DEEP NEURAL NETWORK
IMAGE DE-NOISING USING DEEP NEURAL NETWORK
aciijournal
 
APPLICATION OF IMAGE FUSION FOR ENHANCING THE QUALITY OF AN IMAGE
APPLICATION OF IMAGE FUSION FOR ENHANCING THE QUALITY OF AN IMAGEAPPLICATION OF IMAGE FUSION FOR ENHANCING THE QUALITY OF AN IMAGE
APPLICATION OF IMAGE FUSION FOR ENHANCING THE QUALITY OF AN IMAGE
cscpconf
 
Targeted Visual Content Recognition Using Multi-Layer Perceptron Neural Network
Targeted Visual Content Recognition Using Multi-Layer Perceptron Neural NetworkTargeted Visual Content Recognition Using Multi-Layer Perceptron Neural Network
Targeted Visual Content Recognition Using Multi-Layer Perceptron Neural Network
ijceronline
 
Convolutional Neural Network (CNN)of Deep Learning
Convolutional Neural Network (CNN)of Deep LearningConvolutional Neural Network (CNN)of Deep Learning
Convolutional Neural Network (CNN)of Deep Learning
alihassaah1994
 
Introduction to Machine Vision
Introduction to Machine VisionIntroduction to Machine Vision
Introduction to Machine VisionNasir Jumani
 
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxxIMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
AtharvaTanawade
 
06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...
IAESIJEECS
 
IRJET- Image Captioning using Multimodal Embedding
IRJET-  	  Image Captioning using Multimodal EmbeddingIRJET-  	  Image Captioning using Multimodal Embedding
IRJET- Image Captioning using Multimodal Embedding
IRJET Journal
 
IRJET - Visual Question Answering – Implementation using Keras
IRJET -  	  Visual Question Answering – Implementation using KerasIRJET -  	  Visual Question Answering – Implementation using Keras
IRJET - Visual Question Answering – Implementation using Keras
IRJET Journal
 
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTMDEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
IRJET Journal
 
Automated Neural Image Caption Generator for Visually Impaired People
Automated Neural Image Caption Generator for Visually Impaired PeopleAutomated Neural Image Caption Generator for Visually Impaired People
Automated Neural Image Caption Generator for Visually Impaired People
Christopher Mehdi Elamri
 
Text and Object Recognition using Deep Learning for Visually Impaired People
Text and Object Recognition using Deep Learning for Visually Impaired PeopleText and Object Recognition using Deep Learning for Visually Impaired People
Text and Object Recognition using Deep Learning for Visually Impaired People
ijtsrd
 

Similar to Visual storytelling (20)

Automated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU ArchitectureAutomated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU Architecture
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learning
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth Estimation
 
Scene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural NetworkScene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural Network
 
PCS 2016 presentation
PCS 2016 presentationPCS 2016 presentation
PCS 2016 presentation
 
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfHandwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
 
Image De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural NetworkImage De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural Network
 
IMAGE DE-NOISING USING DEEP NEURAL NETWORK
IMAGE DE-NOISING USING DEEP NEURAL NETWORKIMAGE DE-NOISING USING DEEP NEURAL NETWORK
IMAGE DE-NOISING USING DEEP NEURAL NETWORK
 
APPLICATION OF IMAGE FUSION FOR ENHANCING THE QUALITY OF AN IMAGE
APPLICATION OF IMAGE FUSION FOR ENHANCING THE QUALITY OF AN IMAGEAPPLICATION OF IMAGE FUSION FOR ENHANCING THE QUALITY OF AN IMAGE
APPLICATION OF IMAGE FUSION FOR ENHANCING THE QUALITY OF AN IMAGE
 
Targeted Visual Content Recognition Using Multi-Layer Perceptron Neural Network
Targeted Visual Content Recognition Using Multi-Layer Perceptron Neural NetworkTargeted Visual Content Recognition Using Multi-Layer Perceptron Neural Network
Targeted Visual Content Recognition Using Multi-Layer Perceptron Neural Network
 
Convolutional Neural Network (CNN)of Deep Learning
Convolutional Neural Network (CNN)of Deep LearningConvolutional Neural Network (CNN)of Deep Learning
Convolutional Neural Network (CNN)of Deep Learning
 
Introduction to Machine Vision
Introduction to Machine VisionIntroduction to Machine Vision
Introduction to Machine Vision
 
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxxIMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
 
06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...
 
IRJET- Image Captioning using Multimodal Embedding
IRJET-  	  Image Captioning using Multimodal EmbeddingIRJET-  	  Image Captioning using Multimodal Embedding
IRJET- Image Captioning using Multimodal Embedding
 
IRJET - Visual Question Answering – Implementation using Keras
IRJET -  	  Visual Question Answering – Implementation using KerasIRJET -  	  Visual Question Answering – Implementation using Keras
IRJET - Visual Question Answering – Implementation using Keras
 
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTMDEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
 
Automated Neural Image Caption Generator for Visually Impaired People
Automated Neural Image Caption Generator for Visually Impaired PeopleAutomated Neural Image Caption Generator for Visually Impaired People
Automated Neural Image Caption Generator for Visually Impaired People
 
Text and Object Recognition using Deep Learning for Visually Impaired People
Text and Object Recognition using Deep Learning for Visually Impaired PeopleText and Object Recognition using Deep Learning for Visually Impaired People
Text and Object Recognition using Deep Learning for Visually Impaired People
 

Recently uploaded

一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 

Recently uploaded (20)

一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 

Visual storytelling

  • 2. Abstract 1. Introduce the first dataset for Sequential vision-to-language 2. SIND v.1 (Sequential Images Narrative Dataset) 3. 81743 unique photos in 20211 sequences 4. Establish string baselines 5. Move artificial intelligence from basic understandings of typical visual scenes towards more and more human-like understanding of grounded event structure and subjective expression
  • 3. Table of content 1. Introduction 2. Motivation and Related Work 3. Dataset Construction 4. Data Analysis 5. Automatic Evaluation Metric 6. Baseline Experiments 7. Conclusion
  • 4. Introduction 1. From description (concrete, literal) to narrative (abstract, further inference) 2. “Sitting next to each other” vs. “Having a good time” 3. Release three tiers of language for the same images 1) Descriptions of images-in-isolation (DII) 2) Descriptions of images-in-sequence (DIS) 3) Stories for images-in-sequence (SIS)
  • 5. Motivation and Related Work 1. Image Captioning (2014,2015) 2. Question answering (2014) 3. Visual phrase (2011) 4. Vision Understanding (2013) 5. Visual Concepts (2015,2016) Those works focus on direct, literal description od image content
  • 6. Dataset Construction 1. Extracting Photos 1. Leverage the idea that “storyable” event tend to involve some form of possession (John’s party; Shabnam’s visit) 2. Extract Flickr data with possessive dependency patterns (Standford CoreNLP) 3. Use WordNet3.0 to find out EVENT 4. Only include albums with 10 to 50 photos where all album photos are taken within a 48-hour span
  • 7. Dataset Construction 2. Crowdsourcing Stories In Sequence 1. 2-stage crowdsourcing 2. Storytelling : worker selects a subset of photos and writes a story about it 3. Re-telling :the worker writes a story based on one photo sequence generated in the first stage
  • 8. Dataset Construction 3. Crowdsourcing Descriptions of Images In Isolation & Images In Sequence 1. Also collect descriptions of images-in-isolation and descriptions of images-in- sequence 2. Follow the instructions for image captioning (MS COCO). Ex: describe all the important parts 4. Data Post-processing 1. Replace name and identified named entities
  • 10. Data analysis 1. Dataset includes 10117 Flickr albums with 210819 unique photos. 2. Use normalized pointwise mutual information to identify the words most closely associated with each tier.
  • 11. Automatic Evaluation Metric 1. Human judgment is the most reliable way to evaluate. 2. Compute pairwise correlation coefficients between automatic metrics and human judgments (score from 1-5) on 3000 stories from SIS training set. 3. Automatic metrics : METROR, smoothed-BLUE and Skip-Thoughts 4. METEOR correlates best with human judgment.
  • 12. Baseline Experiments 1. Use Sequence-to-Sequence recurrent neural net 2. Encode an image sequence by running an RNN over fc7 vectors of each image, in reverse order. 3. Use Gated Recurrent Units (GRUs) for both the image encoder and story decoder 4. Initially, Beam search (size = 100); but there’s lots of repetitive sentences. 5. Greedy search significantly increase the story quality. 6. Same content word cannot be produced more than once within a given story. 7. Filter out some “visually grounded” words
  • 14. The details of the training were: 1. Extract 4096-dim FC 7 features using VGG16 without fine tuning 2. The encoder reads over the 5 images in a sequence. The order of images are reversed (i.e., the first image in the sequence is the last one read in, following what is commonly done for machine translation. This is probably not important though). 3. The encoder and decoder are 1000 dimensional GRU (no weight sharing) 4. The target word embedding size is 250 dimension (i.e., the dimension when the word that was just produced is fed into the decoder GRU). 5. The target vocab size is words that occur 3 or more times in the training. Other words are mapped to UNK (there is a constraint in the decoder that UNK cannot be produced at test time, however). 6. 0.5 dropout on the image FC7 input (i.e., 50% of the 4096-dim FC7 features are dropped out before being fed into encoder GRU. This is probably not important). 7. 0.5 dropout on the decoder GRU layer before applying it to the output layer. 8. If the story model is co-trained with caption data, you should use a token in the encoder GRU to indicate which type of output to produce. 9. It's analogous to machine translation sequence-to-sequence models.