This talk discusses recent advances in vision-to-language problems such as image captioning and visual question answering. Image captioning requires a machine to describe an image in human-readable sentences, while visual question answering requires it to answer natural-language questions using visual information. The speaker, Dr. Qi Wu, will outline the theories and techniques behind these tasks and give a live demo of image captioning. Dr. Wu is a senior research associate who has published papers on these topics, including a model that produced the best results on an image captioning challenge.