Enterprises have dealt with data governance for years, but mostly around master data. With the advent of IoT, web, and app streams everywhere in the ecosystem surrounding an enterprise, data-in-motion has become a force to be reckoned with. Data-in-motion passes through several levels of transformation and augmentation before it becomes data-at-rest. Throughout this journey, it is pertinent to preserve the sanctity of the data, or at least to track its provenance through the various changes. This is especially important in verticals with strong regulatory and compliance laws around "who changed what."
This session will go into detail on specific use cases of how data gets changed, how it can be tracked seamlessly, and why this matters for certain verticals. It will be presented in two parts. The first part will cover the industry angle and its importance as weighed in on by several regulatory bodies. The second part will address the technology aspects and discuss how companies can leverage Apache Atlas and Apache Ranger in conjunction with NiFi and Kafka to embrace data governance and provenance for their data streams.
Speakers
Dinesh Chandrasekhar, Director, Hortonworks
Paige Bartley, Senior Analyst - Data and Enterprise Intelligence, Ovum
Who changed my data? Need for data governance and provenance in a streaming world
1. Who changed my data?
Need for data governance and provenance in a streaming world
Digital capability requires granular control of all data assets.
Dinesh Chandrasekhar
Director, Product Marketing
Paige Bartley
Senior Analyst, Data and Enterprise Intelligence
Let’s step away from compliance, regulation, and requirements, and look at the major trends and drivers within the enterprise. Governance and provenance are often discussed as “checkbox” requirements, rather than as enablers.
The ICT Enterprise Insights survey identified "create digital capability" and "manage security, identity, and privacy" as the top two IT trends in the enterprise. What do these two trends have in common?
There are three pillars to creating digital capability. The first pillar is the creation of the digital platform and infrastructure itself. The second pillar is the creation of the ability to effectively exploit and utilize data. The third pillar is the development of the enterprise's innovation process and methodology for the digital age. All three are underpinned by a clearly articulated digital strategy.
Article 4: Personal data is any information relating to an identified or identifiable natural person. A natural person can be identified directly or indirectly, and the enterprise needs to be cautious when combining data sources to ensure that innocuous information doesn't become personal information.
Article 9: Processing of biometric data for the purpose of uniquely identifying a natural person is prohibited by default, unless certain conditions are met. This applies to several types of data in motion: sensor data from wearables, medical devices, and fitness devices.
Article 30: Enterprises must document the purposes of processing, transfers of data to non-EU countries, and the envisaged time limits for erasure of the data.
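The Article 30 obligations above amount to keeping a record of processing activities. A minimal sketch of one such record entry, in Python (the field names here are illustrative assumptions, not a GDPR-mandated schema):

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingRecord:
    """One entry in a record of processing activities (Article 30 sketch)."""
    purpose: str                                          # documented purpose of processing
    non_eu_transfers: list = field(default_factory=list)  # third countries data is sent to
    erasure_time_limit_days: int = 0                      # envisaged retention before erasure

record = ProcessingRecord(
    purpose="fraud detection on payment streams",
    non_eu_transfers=["US"],
    erasure_time_limit_days=90,
)
```

In practice such records would live alongside the metadata catalog, so that each stream's documented purpose and retention limit travel with the data they describe.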
Data policies are applied and encoded at the metadata level. Metadata, or data about data, is critical to providing a common foundation for understanding the qualities of data residing in different systems, and to providing lineage and cataloging capabilities. A shared or common metadata framework, where all metadata is managed together, allows data to be centrally searched, tracked, and monitored regardless of its "home" repository.
To make this a reality, the same governance standards need to be applied to all enterprise data equally. There needs to be a single platform environment where data-in-motion and data-at-rest can be managed together, with a common metadata framework. All data-in-motion sources need a way to be ingested into this platform, with provenance and lineage tracked as they flow in.
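A common metadata framework of this kind can be pictured as a lineage graph: every entity registers the entities it was derived from, and lineage is recovered by walking upstream. The following is a toy in-memory sketch of the idea (a stand-in for what Apache Atlas does at scale; the entity and repository names are illustrative):

```python
# Toy metadata catalog: every dataset entity registers its upstream inputs,
# so lineage can be traced regardless of the data's "home" repository.
catalog = {}

def register(entity, inputs=(), repo="unknown"):
    """Record an entity, the entities it was derived from, and where it lives."""
    catalog[entity] = {"inputs": list(inputs), "repo": repo}

def lineage(entity):
    """Walk upstream and return every ancestor of an entity."""
    seen = []
    stack = list(catalog.get(entity, {}).get("inputs", []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(catalog.get(node, {}).get("inputs", []))
    return seen

register("sensor_stream", repo="kafka")
register("cleaned_stream", inputs=["sensor_stream"], repo="nifi")
register("daily_report", inputs=["cleaned_stream"], repo="hive")

# lineage("daily_report") → ["cleaned_stream", "sensor_stream"]
```

The point of the sketch: because all three entities share one catalog, the lineage of the at-rest report reaches back through the in-motion stream that produced it, which is exactly the "single platform, common metadata" requirement described above.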
TALK TRACK
Hortonworks Powers the Future of Data: data-in-motion, data-at-rest, and Modern Data Applications.
[NEXT SLIDE]
Data is often referred to as the fuel of today's businesses. In reality, every business has data, and most can access the same types of data as their competitors. The real differentiator is not the data itself but who uses it more intelligently, to greater effect. And that usage often relies on connecting the data dots across your organization. By connecting customers to products to the channels through which they interact, or prefer to interact, we can drive better customer experiences, resulting in better loyalty and, hopefully, better revenues. Every industry is being transformed through these connected use cases.
1) Data is in multiple places (data centers the company owns, the cloud, environments owned by third parties). 2) Different data lives in different places (structured numbers in your databases; sensor data from a connected product that is not arranged in a database). 3) Data flows back and forth between the data center and the cloud.
Talking points:
There is an entirely new world being created by combining lots of data with breakthrough tools.
Data could be on-premises and in the cloud
Data is moving from sensors in real time across our data fabric, giving us precise instrumentation of what happened just before an event as well as after it. This is true for customers buying on the web as well as for products that might fail.
We can run machine learning and deep learning on these vast repositories of data
And we can push these models down to the edge to automate decisions
Note:
For us as a community and as a company, we need to continue to innovate around the core technology, while thinking about how we enable three personas to be successful. This is the logical evolution and transformation that's happening now.
You need to holistically manage all the data in all places, then begin to move our platform into place
HDF provides very fine-grained, high-fidelity reporting about the origins of data, how it was used, who used it, and so on.
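The kind of provenance reporting described above can be pictured as an append-only event trail: every time data is touched, an event records what happened, to which data, by whom, and when. A minimal sketch of the idea (a deliberate simplification of NiFi-style provenance events; the event types and identifiers are illustrative assumptions):

```python
import time

# Append-only trail of provenance events.
provenance_log = []

def record_event(event_type, data_id, user, details=""):
    """Append one provenance event: what happened, to what, by whom, when."""
    provenance_log.append({
        "event_type": event_type,   # e.g. RECEIVE, MODIFY, SEND
        "data_id": data_id,
        "user": user,
        "details": details,
        "timestamp": time.time(),
    })

def who_changed(data_id):
    """Answer 'who changed my data?' for one piece of data."""
    return [(e["user"], e["event_type"]) for e in provenance_log
            if e["data_id"] == data_id]

record_event("RECEIVE", "flowfile-001", "ingest-service")
record_event("MODIFY", "flowfile-001", "enrichment-job", "added geo fields")
record_event("SEND", "flowfile-001", "export-service")

# who_changed("flowfile-001")
# → [("ingest-service", "RECEIVE"), ("enrichment-job", "MODIFY"),
#    ("export-service", "SEND")]
```

Because the log is append-only and every event carries an actor and a timestamp, the trail can answer the regulator's "who changed what" question for any piece of data at any point in its journey.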