Hadoop has enabled a new scale of data processing that is paving the way for data-driven businesses. However, business data is often subject to compliance and regulatory requirements that can easily be lost as data is manipulated, transformed, and rewritten within the Hadoop ecosystem. Furthermore, enterprise data is often scattered across a wide array of systems, each with its own techniques for policy management. As data from these disparate systems is loaded into Hadoop, all of the carefully crafted policy is immediately lost, creating a potential risk for the business. Data provenance is widely recognized as a technique for applying policy in more traditional domains such as storage, databases, and high-performance computing. By tracking data from its origin and across various transformations and computations, provenance tracking systems can answer questions such as: Who has seen a given piece of data? Where did this data come from? What policies existed on this data? In this talk, we will discuss traditional data management solutions, the challenges of bringing them to an ecosystem like Hadoop, and approaches to enable data management in the growing Big Data world.