Are you talking to a machine?


For more details, see the published paper on arXiv.org under Computer Science, Computer Vision and Pattern Recognition:

http://arxiv.org/abs/1505.05612

Children learn to use language by associating what they hear with what they see. It is natural for humans to obtain information about what they see by asking questions. We present the mQA model, which is able to answer questions about the content of an image. The answer can be a sentence, a phrase, or a single word. We construct a Freestyle Multilingual Image Question Answering (FM-IQA) dataset to train and evaluate our mQA model. We evaluate the model through a Turing Test with human judges and propose strategies to monitor the quality of this evaluation process. The experiments show that in 64.7% of cases, the human judges cannot distinguish our model from humans. The project page: http://idl.baidu.com/FM-IQA.html
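The headline figure above is a pass rate: the share of model answers that judges could not tell apart from human answers. A minimal sketch of that arithmetic follows, assuming each judgment is recorded as a boolean (True when the judge took the model's answer for a human one). The function name `turing_pass_rate` and the sample votes are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Sequence


def turing_pass_rate(judged_as_human: Sequence[bool]) -> float:
    """Return the fraction of model answers that judges labeled as human-written."""
    if not judged_as_human:
        raise ValueError("need at least one judgment")
    # True counts as 1 and False as 0, so the sum is the number of passes.
    return sum(judged_as_human) / len(judged_as_human)


# Hypothetical judgments for illustration only; not data from the paper.
votes = [True, True, False, True]
rate = turing_pass_rate(votes)
print(f"{rate:.1%}")  # prints "75.0%"
```

A reported rate such as 64.7% would come from the same ratio computed over the full set of judged answers.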

  1. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. Haoyuan Gao; Junhua Mao; Jie Zhou; Zhiheng Huang; Lei Wang; Wei Xu. Poster Presentation, NIPS Conference 2015
  2. Abstract
  3. Data Set
  4. The Multimodal QA (mQA) Model
  5. Experiment
  6. Interesting Work
  7. Discussion
     Conclusion:
     • We present the mQA model, which is able to give a sentence or a phrase as the answer to a freestyle question about an image.
     • We construct a Freestyle Multilingual Image Question Answering (FM-IQA) dataset containing over 310,000 question-answer pairs.
     • We evaluate our method through a real Turing Test.
     Future work:
     • We will try to address these issues by incorporating more visual and linguistic information (e.g. using object detection or attention models).
  • darkseed

    Dec. 9, 2015
