This document discusses operationalizing a data lake for advanced analytics. It describes how a data lake platform can provide end-to-end capabilities for managing the data supply chain from source to consumer: batch and streaming ingestion, automatic discovery, data quality, security, lifecycle management, metadata cataloging, transformations, and self-service data preparation. A reference architecture shows how raw and refined data zones feed a trusted zone that provides role-based access to business analysts, researchers, and data scientists. The document also describes machine learning techniques for integrating data silos, matching records, and classifying sensitive data.
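
As a rough illustration of role-based access across zones, the following minimal Python sketch maps each role to the zones it may read. The zone and role names come from the summary above, but the specific role-to-zone assignments, the `ZONE_ACCESS` table, and the `can_read` and `readable_zones` helpers are hypothetical; the summary does not state which role may read which zone.

```python
# Hypothetical policy: the summary names the zones and roles but does not
# say which role may read which zone. These assignments are assumptions.
ZONE_ACCESS = {
    "business_analyst": {"trusted"},
    "researcher": {"trusted", "refined"},
    "data_scientist": {"trusted", "refined", "raw"},
}


def can_read(role: str, zone: str) -> bool:
    """Return True if the role is allowed to read the zone; unknown roles get no access."""
    return zone in ZONE_ACCESS.get(role, set())


def readable_zones(role: str) -> list[str]:
    """Return the zones a role may read, sorted alphabetically."""
    return sorted(ZONE_ACCESS.get(role, set()))


if __name__ == "__main__":
    for role in ZONE_ACCESS:
        print(role, "->", readable_zones(role))
```

Denying access to any role that is not listed keeps the sketch conservative; a real platform would typically enforce such a policy in its security layer rather than in application code.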