Human-to-human electronic communication has moved from text (email) to voice (VoIP) to augmented video (Zoom/Skype). Similarly, the medium for human-to-machine conversation has moved from text (chatbots) to voice, with voice-enabled chatbots in wide use today. The next step in this evolution is a video-enabled conversational experience. Each change of medium brings its own technical challenges: creating a good voice experience involves more than just hooking up a chatbot to text-to-speech and speech-to-text services. Vocinity has developed a platform for voice-enabled chatbots that has been in production for almost two years. We're updating our platform to support a multimedia experience in which the bot communicates via video, voice, text messages, and images. Rasa provides the conversational logic for this immersive multimedia bot, and its power and flexibility enabled us to extend it to support voice and video and to meet the challenges of voice/video communication. Presented by Nathan Stratton, CTO of Vocinity, at the 2021 Rasa Summit: https://rasa.com/summit/