This document gives an overview of workflow provenance and proposes a programming model and system architecture for collecting and querying provenance data at scale. It first defines provenance and explains why it matters for big data analytics, then proposes a taxonomy of provenance query types. The programming model uses object-oriented programming and domain-specific languages to automate provenance logging. The resulting logs are parsed into a graph database, which supports fundamental provenance queries and data visualization. The document closes by discussing how to scale the system and outlining further research, including user studies and query optimization.
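The logging-and-graph pipeline summarized above might look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the document's implementation: the `traced` decorator, the JSON record fields (`activity`, `used`, `generated`), and the in-memory dictionary standing in for a graph database are all hypothetical, and data items are identified simply by string names.

```python
import functools
import json

LOG = []  # in-memory stand-in for a provenance log file (assumption)


def traced(func):
    """Hypothetical decorator: log each call's inputs and output as one JSON line."""
    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        record = {"activity": func.__name__, "used": list(args), "generated": result}
        LOG.append(json.dumps(record))
        return result
    return wrapper


@traced
def clean(raw):
    # Toy workflow step: the "output" is just a derived name.
    return raw + ".clean"


@traced
def merge(left, right):
    # Toy workflow step combining two inputs into one output name.
    return left + "+" + right


def build_graph(log_lines):
    """Parse log lines into a dict mapping each item to the items it was derived from."""
    derived_from = {}
    for line in log_lines:
        record = json.loads(line)
        derived_from.setdefault(record["generated"], set()).update(record["used"])
    return derived_from


def lineage(graph, item):
    """Return every upstream item that `item` transitively depends on."""
    seen = set()
    stack = [item]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


a = clean("a.csv")
b = clean("b.csv")
out = merge(a, b)
graph = build_graph(LOG)
print(sorted(lineage(graph, out)))
```

Here the workflow author only annotates functions; logging happens automatically, and a backward traversal over the parsed graph answers a basic lineage query ("which inputs contributed to this output?"). A real deployment would replace the dictionary with a graph database and the list with durable log storage.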