Haithem Afli presented on crossing boundaries between natural language processing and computer vision. He discussed how the fields have combined to generate captions for videos by using computer vision techniques to extract concepts from videos and neural networks to generate descriptive captions. He participated in the TRECVID 2016 challenge on automatic video captioning, where his team achieved good results by generating captions for individual frames and ranking captions based on visual concept similarity. Going forward, he plans to refine the captioning methods and explore broader applications like multimodal sentiment analysis.