2. What is Modern Data?
Clickstream, web and social, geolocation, IoT,
server logs, etc. are considered modern.
(think schema-on-read)
ERP, CRM, SCM and LOB-specific OLTP are
considered traditional.
(think schema-on-write)
Mainframe is considered legacy.
(think mission-critical)
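To make the schema-on-write versus schema-on-read contrast concrete, here is a minimal Python sketch. The two-field SCHEMA, the record shapes and the function names are hypothetical. A traditional store validates each record before it is written. A modern raw store keeps lines exactly as they arrive and applies the same schema only when it reads them.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"user_id": int, "page": str}


def validate(record) -> dict:
    """Raise ValueError unless record is a dict that matches SCHEMA exactly."""
    if not isinstance(record, dict):
        raise ValueError("record is not an object")
    if set(record) != set(SCHEMA):
        raise ValueError(f"fields {sorted(record)} do not match schema")
    for field, expected in SCHEMA.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} is not {expected.__name__}")
    return record


# Schema-on-write: the schema is enforced when data is stored,
# so a bad record is rejected before it ever lands in the table.
table: list[dict] = []


def write_validated(record: dict) -> None:
    table.append(validate(record))


# Schema-on-read: raw lines are stored as-is; the schema is applied
# only when they are read, and lines that do not fit are skipped.
raw_log: list[str] = []


def write_raw(line: str) -> None:
    raw_log.append(line)


def read_with_schema() -> list[dict]:
    rows = []
    for line in raw_log:
        try:
            # json.JSONDecodeError is a subclass of ValueError.
            rows.append(validate(json.loads(line)))
        except ValueError:
            continue  # tolerate malformed or drifting records at read time
    return rows
```

In this sketch a malformed clickstream line never blocks ingestion on the read path; it is simply skipped at query time, whereas the write path rejects the same record immediately with a ValueError.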
3. What is Modern Data?
Modern Data refers to stream processing
In a streaming data model, you store queries and then continuously
run data through the queries.
(think event-driven model)
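A minimal Python sketch of the event-driven model, assuming a standing query is just a predicate plus a handler. StreamProcessor, register and on_event are hypothetical names, not the API of any particular streaming engine.

```python
from typing import Callable

Event = dict
Predicate = Callable[[Event], bool]
Handler = Callable[[Event], None]


class StreamProcessor:
    """Event-driven sketch: queries are stored first, then data flows through them."""

    def __init__(self) -> None:
        self.queries: list[tuple[Predicate, Handler]] = []

    def register(self, predicate: Predicate, handler: Handler) -> None:
        # The query is stored up front, before any data arrives.
        self.queries.append((predicate, handler))

    def on_event(self, event: Event) -> None:
        # Each arriving event is run through every stored query immediately;
        # the processor itself keeps no copy of the event.
        for predicate, handler in self.queries:
            if predicate(event):
                handler(event)


# Usage: a standing "temperature above 90" query over hypothetical IoT readings.
alerts: list[Event] = []
processor = StreamProcessor()
processor.register(lambda e: e["temp"] > 90, alerts.append)
for reading in ({"sensor": "a", "temp": 72}, {"sensor": "b", "temp": 95}):
    processor.on_event(reading)
# alerts == [{"sensor": "b", "temp": 95}]
```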
Both modern and traditional data refer to batch processing
In a traditional query model, you store data and then run queries on
the data as needed.
(think query-driven model)
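A minimal Python sketch of the query-driven model, using hypothetical names (BatchStore, load, query): rows are stored first, and each query scans whatever has been stored at the moment it is issued.

```python
class BatchStore:
    """Query-driven sketch: data is stored first, then queries run on it as needed."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def load(self, rows) -> None:
        # Data is persisted before anyone asks a question of it.
        self.rows.extend(rows)

    def query(self, predicate) -> list[dict]:
        # Each ad hoc query scans everything stored at the time it runs.
        return [row for row in self.rows if predicate(row)]


# Usage: load hypothetical sensor readings, then ask questions later.
store = BatchStore()
store.load([{"sensor": "a", "temp": 72}, {"sensor": "b", "temp": 95}])
hot = store.query(lambda r: r["temp"] > 90)  # [{"sensor": "b", "temp": 95}]
everything = store.query(lambda r: True)     # both readings, in load order
```

The trade-off mirrors the slide: the store can answer questions nobody anticipated, but only after the data has landed.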
4. What is Modern Data?
Modern Data refers to data, not to technologies.
It is the responsibility of those of us who architect, develop, and
implement data technologies to appreciate this difference.
There have been many hard-won lessons learned in enterprise
data management.
The criticality of Data Governance may well top this list.
5. What is Data Governance?
The process by which an organization formalizes
the fiduciary duty for the management of data
assets critical to its success.
Forrester
Data governance is a system of decision rights
and accountabilities for information-related
processes, executed according to agreed-upon
models, which describe who can take what
actions with what information, and when, under
what circumstances, using what methods.
Data Governance Institute
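One way to read the Data Governance Institute definition is as a record with one field per clause: who, what action, what information, when, under what circumstances, using what methods. The Python sketch below models that literally. DecisionRight and is_permitted are hypothetical names, and treating "when" as a time-of-day window is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class DecisionRight:
    """One hypothetical decision right, mirroring the clauses of the definition."""
    who: str           # role allowed to act
    action: str        # what action, e.g. "read" or "delete"
    information: str   # with what information (dataset or data class)
    start: time        # when: start of the permitted window
    end: time          # when: end of the permitted window
    circumstance: str  # under what circumstances, e.g. "quarterly-audit"
    method: str        # using what method, e.g. an approved tool


def is_permitted(rights, who, action, information, at, circumstance, method) -> bool:
    """Default deny: allow only if some recorded right matches every clause."""
    return any(
        r.who == who
        and r.action == action
        and r.information == information
        and r.start <= at <= r.end
        and r.circumstance == circumstance
        and r.method == method
        for r in rights
    )
```

The default-deny shape reflects the "decision rights and accountabilities" framing: an action is allowed only when someone has explicitly recorded the right to take it.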
10. Atlas Proposal
Background
Hadoop is one of many platforms in the modern enterprise data ecosystem and
requires governance controls commensurate with this reality.
Currently, there is no easy or complete way to provide comprehensive visibility
and control into Hadoop audit, lineage, and security for workflows that require
Hadoop and non-Hadoop processing.
Existing solutions are usually point-based and require a monolithic application
workflow. Multi-tenancy and concurrency are problematic because these offerings are
not aware of activity outside their narrow focus.
As Hadoop gains popularity, governance will become increasingly vital to its
maturity and wider adoption. The lack of governance is a particular barrier to
expanding the enterprise data under management.
11. Atlas Proposal
Apache Atlas provides agnostic governance visibility into Hadoop. These
abilities are enabled through a set of core foundational services powered
by a flexible metadata repository.
These services include:
Search and lineage for datasets
Metadata-driven data access control
Indexed and searchable centralized auditing of operational events
Data lifecycle management – ingestion to disposition
Metadata interchange with other metadata tools
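To illustrate how search, lineage, metadata-driven access control and centralized auditing can all hang off one metadata repository, here is a minimal in-memory Python sketch. It is not the Apache Atlas API: MetadataRepository and its methods are hypothetical, and the rule that tags propagate downstream along lineage is an assumption of this sketch.

```python
from collections import defaultdict, deque


class MetadataRepository:
    """Hypothetical in-memory metadata repository (not the Apache Atlas API)."""

    def __init__(self) -> None:
        self.tags = defaultdict(set)      # dataset -> classification tags
        self.upstream = defaultdict(set)  # dataset -> datasets it was derived from
        self.audit = []                   # (user, action, dataset, allowed) events

    def tag(self, dataset: str, *tags: str) -> None:
        self.tags[dataset].update(tags)

    def search(self, tag: str) -> set:
        """Datasets carrying the given tag directly."""
        return {d for d, t in self.tags.items() if tag in t}

    def add_lineage(self, inputs: list, output: str) -> None:
        # Record that `output` was derived from each dataset in `inputs`.
        self.upstream[output].update(inputs)

    def lineage(self, dataset: str) -> set:
        """Every transitive upstream source of dataset (breadth-first search)."""
        seen = set()
        queue = deque(self.upstream.get(dataset, ()))
        while queue:
            source = queue.popleft()
            if source not in seen:
                seen.add(source)
                queue.extend(self.upstream.get(source, ()))
        return seen

    def effective_tags(self, dataset: str) -> set:
        # Assumption: a derived dataset inherits the tags of all its sources.
        result = set(self.tags.get(dataset, ()))
        for source in self.lineage(dataset):
            result |= self.tags.get(source, set())
        return result

    def can_access(self, user: str, clearances: set, action: str, dataset: str) -> bool:
        # Metadata-driven access control: allowed only if the user is cleared
        # for every effective tag; every decision is appended to the audit log.
        allowed = self.effective_tags(dataset) <= clearances
        self.audit.append((user, action, dataset, allowed))
        return allowed


# Usage with hypothetical dataset names.
repo = MetadataRepository()
repo.tag("raw_customers", "PII")
repo.add_lineage(["raw_customers", "raw_orders"], "orders_enriched")
repo.add_lineage(["orders_enriched"], "sales_report")
upstream = repo.lineage("sales_report")  # == {"orders_enriched", "raw_customers", "raw_orders"}
ok = repo.can_access("analyst", set(), "read", "sales_report")  # False: PII inherited via lineage
```

Keeping tags and lineage in the same repository is what lets the access decision for a derived dataset such as sales_report take its upstream sources into account.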