Image description through fusion based recurrent multi modal learning


Suhas Pillai

For a given image, describe what is happening in that image in one sentence.


IMAGE DESCRIPTION THROUGH FUSION BASED RECURRENT MULTI MODAL LEARNING
Ram Manohar Oruganti1, Shagan Sah2, Suhas Pillai3 and Raymond Ptucha1
ABSTRACT
Index Terms
1. INTRODUCTION
Fig. 1.

2. BACKGROUND
2.1 Convolutional Neural Networks
2.2 Long Short Term Memory Networks
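Notation: input sequence $\langle x_1, x_2, \ldots, x_{t-1}, x_t, \ldots, x_T \rangle$, where $x_{t-1}$ precedes $x_t$; current input $x_t$; gates $i_t$, $f_t$, $o_t$; $g_t$; cell state $c_t$; hidden state $h_t$; weights $W$ and biases $b$.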
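The equations for this subsection are not present in the text, so the following is only a minimal sketch of one time step of a textbook LSTM cell, written with the symbols listed above ($x_t$, $i_t$, $f_t$, $o_t$, $g_t$, $c_t$, $h_t$, $W$, $b$). It assumes the common formulation in which each gate applies its own $W$ and $b$ to the concatenation of $x_t$ and $h_{t-1}$; the paper's exact variant may differ. The names `lstm_step`, `run_lstm`, and the `params` layout are hypothetical and chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function used by the i, f and o gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell (assumed textbook form).

    params maps each of "i", "f", "o", "g" to a pair (W, b), where W is a
    list of rows of length len(x_t) + len(h_prev) and b is a list of biases.
    """
    v = list(x_t) + list(h_prev)  # concatenated input [x_t, h_{t-1}]

    def gate(name, activation):
        W, b = params[name]
        return [
            activation(sum(w * v_j for w, v_j in zip(row, v)) + b_k)
            for row, b_k in zip(W, b)
        ]

    i_t = gate("i", sigmoid)    # input gate
    f_t = gate("f", sigmoid)    # forget gate
    o_t = gate("o", sigmoid)    # output gate
    g_t = gate("g", math.tanh)  # candidate cell update

    # Cell state: keep part of the old memory, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # Hidden state: gated, squashed view of the cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def run_lstm(sequence, n_hidden, params):
    """Feed x_1 ... x_T through the cell, starting from zero states."""
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
    return h, c
```

With all weights and biases set to zero, every sigmoid gate evaluates to 0.5 and $g_t$ to 0, so each step simply halves the cell state; this makes a convenient sanity check before plugging in learned parameters.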
3. PROPOSED LEARNING MODEL
3.1 FRMM model

Fig. 2.
3.2 FRMM variations
3.3 Image description through FRMMs
image stage, language stage, fusion stage
4. EXPERIMENTAL RESULTS
4.1 Datasets

4.2 Training details
Caffe
4.3 Results
Model    B-1    B-2    B-3    B-4
AFRMM    70.2   52.8   38.3   27.6
Table I.
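The B-1 to B-4 columns presumably report BLEU scores computed with n-grams up to length 1 through 4. As a rough illustration of the quantity underlying those scores, the sketch below computes the clipped (modified) n-gram precision of a candidate caption against reference captions. It omits the brevity penalty and the geometric mean that full BLEU applies, and the function names and example sentences are hypothetical, not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against reference captions."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the most times it occurs in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip candidate counts so repeated n-grams are not over-rewarded.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "a dog runs on the grass".split()
references = ["a dog is running on the grass".split()]
precisions = [modified_precision(candidate, references, n) for n in (1, 2, 3, 4)]
```

Precision drops quickly as n grows, because longer n-grams require the candidate to reproduce longer exact stretches of a reference; this is why B-4 values in caption tables are typically much lower than B-1.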
CNN layer    B-1    B-2    B-3    B-4
AFRMM+fc8    70.2   52.8   38.3   27.6
Table II.
Model B-1 B-2 B-3 B-4 METEOR
40.4
Our model 70.2 52.8 27.6 22.5
Table III.
Model B-1 B-2 B-3 B-4 METEOR
Vinyals [13] 66.3 42.3 27.7 18.3
Table IV.
5. CONCLUSION
6. REFERENCES
et al., arXiv preprint arXiv:1409.0575.
26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012, December 3, 2012 - December 6, 2012.
Proceedings of the IEEE.
27th Annual Conference on Neural Information Processing Systems, NIPS 2013.
Neural Computation.
ICASSP 2013.
Computer Vision and Pattern Recognition.
Computer Vision and Pattern Recognition.
et al., Computer Vision and Pattern Recognition.
arXiv preprint arXiv:1505.00487.
et al., Proceedings of the IEEE International Conference on Computer Vision.
et al., arXiv preprint arXiv:1502.03044.
arXiv preprint arXiv:1411.4555.
21st Annual Conference on Neural Information Processing Systems, NIPS 2007.
Advances in neural information processing systems.
arXiv preprint arXiv:1410.4615.
Computer Vision and Pattern Recognition.
arXiv preprint arXiv:1412.4729.
arXiv preprint arXiv:1412.6632.
Transactions of the Association for Computational Linguistics.
et al., Computer Vision - ECCV 2014.
ICLR.
Proceedings of the 40th annual meeting on association for computational linguistics.
In Proceedings of the Ninth Workshop on Statistical Machine Translation.
et al., arXiv preprint arXiv:1411.4389.
et al., Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
arXiv preprint arXiv:1410.1090.


