Nubank is a leading financial technology company in Latin America offering a 100% digital banking experience, and it has been recognized as the fastest-growing digital bank outside of Asia. Our business aims to fight the complexity we see in Brazilian banking and to give people control over their money once again. To deliver an amazing experience for more than 5 million credit card customers and 2.5 million checking account customers, we created a software platform composed of more than a hundred microservices that are fast and reliable, even when facing unpredictable failures. Every day we accomplish this goal with Apache Kafka as our communication backbone. This talk will detail how we run our platform by applying different patterns and development techniques to create a consistent event-driven design, capable of correcting data processing failures as fast as our business needs. We’ll show how patterns like dead letter queues, circuit breakers, and backoff are applied in the architecture to ensure that failures can be handled as consistently and transparently as possible by engineers across the company. Finally, the talk will show the set of tools built on this architecture to fix failed events quickly, such as a home-grown CLI capable of inspecting failed events and reprocessing them as needed, all built on top of Apache Kafka.
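
To make two of the patterns named above concrete, here is a minimal sketch of retry with exponential backoff that falls through to a dead letter queue, plus a replay step of the kind a reprocessing tool might perform. It is not Nubank's implementation: an in-memory `deque` stands in for a Kafka dead letter topic, and the names `process_with_retry`, `reprocess`, `max_attempts`, and `base_delay` are hypothetical, chosen only for illustration. The circuit breaker pattern is left out to keep the example short.

```python
import time
from collections import deque


def process_with_retry(event, handler, dead_letters,
                       max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run handler(event), retrying with exponential backoff.

    If every attempt fails, the event is appended to dead_letters
    (an in-memory stand-in for a dead letter topic) and False is returned.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception as exc:  # illustrative: real code would be narrower
            last_error = exc
            if attempt < max_attempts - 1:
                # Back off: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    dead_letters.append(
        {"event": event, "error": repr(last_error), "attempts": max_attempts}
    )
    return False


def reprocess(dead_letters, handler):
    """Replay each dead-lettered event once.

    Events that succeed are dropped from the queue; events that fail again
    are put back with the new error. Returns the number that succeeded.
    """
    succeeded = 0
    for _ in range(len(dead_letters)):
        entry = dead_letters.popleft()
        try:
            handler(entry["event"])
            succeeded += 1
        except Exception as exc:
            entry["error"] = repr(exc)
            dead_letters.append(entry)
    return succeeded
```

Keeping the failed event together with its error is what allows an engineer to inspect the failure first and replay it only after the underlying bug is fixed.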