"The rise of large language models like GPT-4 and generative AI has upended the traditional approach in which data and ML teams engineer their own features and train models in batch on large sets of historical data. Since training proprietary LLMs is no longer affordable for most organizations, consuming foundation models through inference APIs has become the natural path, posing new challenges for the architecture of real-time apps and workflows. In this talk we show how data and machine learning teams can rapidly prototype and deploy real-time ML apps, ingesting real-time data with the help of Apache Kafka® and Airy, an open-source app framework. We will discuss different options for fine-tuning LLMs and "chaining" them with other ML models at inference time in a microservices architecture built on Kafka Streams and Kubernetes. We will also discuss how streaming can supply dynamic features to ML models and power prompt engineering for integration with generative AI. At the end of the talk we will give an outlook on dynamically retraining machine learning models in real time from streaming and batch sources, utilizing Ray and Kubernetes to spin up GPU node pools for model training on demand. In this context, we will also discuss how event streaming can be used for reinforcement learning from human feedback (RLHF) to improve the accuracy of predictions and make the ML model more robust over time."