This document discusses how to build reliable systems from unreliable parts. It recommends accepting that every system eventually fails and planning for those failures, building services that are stateless and repair themselves automatically, and fostering a culture of reliability, even in open source projects. The author, Jonah Horowitz, has worked as a site reliability engineer at Netflix and elsewhere.