This document discusses building self-healing distributed systems. It begins by introducing the speakers and outlining the agenda, which covers both the theory and practical examples of self-healing systems. The theory section defines what makes a system distributed, describes the challenges such systems face, and explains how enabling a system to heal itself can provide benefits such as improved uptime while also introducing risks that must be mitigated. It then presents a methodology: identify the problem, design an automated solution, execute it manually, then automate and adjust it. The practice section demonstrates three examples: triggering a full garbage collection when JVM heap usage exceeds a threshold, automatically restarting an application server when beacon uploads fail, and opening a support case when customer file uploads break.
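
The first practice example (a garbage collection triggered when JVM heap usage exceeds a threshold) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speakers' implementation: the class name `HeapHealer`, its method names, and the threshold value are hypothetical, and `System.gc()` is only a request that the JVM is free to ignore or service with a non-full collection, depending on its flags.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper; names and thresholds are illustrative, not from the talk. */
final class HeapHealer {
    private HeapHealer() {}

    /** Pure decision rule: true when used/max is strictly above the threshold. */
    static boolean shouldTriggerGc(long usedBytes, long maxBytes, double threshold) {
        if (maxBytes <= 0) {
            // MemoryUsage.getMax() returns -1 when the maximum is undefined;
            // without a known maximum there is no ratio to act on.
            return false;
        }
        return (double) usedBytes / maxBytes > threshold;
    }

    /** Reads the current JVM heap usage and requests a collection if the rule fires. */
    static boolean healIfNeeded(double threshold) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        boolean trigger = shouldTriggerGc(heap.getUsed(), heap.getMax(), threshold);
        if (trigger) {
            System.gc(); // a request only; the JVM may decline or defer it
        }
        return trigger;
    }
}
```

Keeping the decision rule separate from the action mirrors the methodology described above: the rule can be checked and the action run by hand first, and only later wired to a scheduler (for example, calling `HeapHealer.healIfNeeded(0.8)` periodically) once the threshold has been adjusted against real behavior.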