Unleashing the power of Apache Atlas
with Apache Ranger
Virtual Data Connector Project
NIGEL JONES
JONESN@UK.IBM.COM
DATAWORKS, MUNICH, APRIL 2017
Apache®, Apache Atlas, Apache Ranger & other Apache project names referenced are either registered trademarks or trademarks of
the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation
is implied by the use of these marks.
About Me – Nigel Jones
 https://www.linkedin.com/in/nigelljones/
 jonesn@uk.ibm.com (Anyone still use email?)
 @planetf1 – noisy, f1, electric vehicles, food & drink …. A split of work/life
accounts didn’t work for me!
 And of course the Apache Atlas & Ranger mailing lists & JIRA!
 Science fan at school & uni. It was cloud chambers back then… now just the
cloud :-)
 IBM Hursley, UK since 1990
 Last 3 years focus on Data Lake, Information Governance, Open Metadata
The Problem…..
WHY ARE WE HERE…..
Data?
 What data do I have?
 What does it mean?
 Where is it?
 Who has access to it?
 Who owns it?
 What quality is it?
 How does it relate to other data?
 How do I control, audit & understand access?
Regulatory needs
 Adhere to regulations like BCBS-239 and GDPR
 Need to know meaning, value of the data
 Demonstrate processes in place to govern access
 Audit
 Significant fines if rules breached
 Whilst ensuring easy, ready access to appropriate data for data professionals to
support an agile business
So what do we need to address this?
Metadata..
 Metadata enables data to be used outside of the application that created it.
 Analytics and decision making
 New business applications
 Reporting and compliance
 Metadata describes the format and content of data allowing people to judge which
dataset to use for a new project
 Structure
 Meaning
 Origin
 Valid values and quality
 Usage and ownership
 Regulations and classifications that apply
 Metadata describes the business context and classification of data allowing automated
governance processes to operate.
Which can support…
 An enterprise data catalogue that lists all data including where it is, what it
is, who owns it, its meaning, quality, where it came from, and can fully
describe its business context & how the data should be governed….
 Subject Matter experts searching, collaborating, feeding back about their
data needs and use
 Automated governance actions to protect and manage including auditing,
monitoring, quality control, rights management
But easily…
 Open frameworks & APIs
 Automatic collection & discovery of metadata in a dynamic heterogeneous
environment
 Using predefined standards for glossaries, schemas, rules, regulations to
reduce cost
 Cheap to integrate new tools
 No proprietary lock-in & assumptions that all tools are from one suite or
vendor
 Avoiding silos
 Distributed and Open
The vision
Open and
Unified Metadata
Virtual Data Connector project
Data virtualization project
 Collaboration – IBM, several banks & open community
 A Data Lake environment
 Not just Hadoop, but other sources too
 Business Terms, Classifications, Metadata rich
 Offer virtualized views. Expose relational data with business terms
 Manage Access to resources – permit, deny, log, filter/mask …. THROUGH METADATA
 Open, pluggable
 Working through use cases, design, initial MVP (this year)
 Critique and feedback are welcome. We’re looking for guidance and support from the Atlas
& Ranger communities, as well as to contribute our ideas
 Proposed changes all go through mailing list and JIRA for feedback
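The "manage access through metadata" bullet above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tag and role names, the `TagRule` shape and the `decide`/`apply_decision` functions are all hypothetical and are not the Ranger plugin or GaianDB API.

```python
from dataclasses import dataclass

# Hypothetical rule: for one classification tag, which roles may see the
# value in clear and which roles only get a masked value.
@dataclass(frozen=True)
class TagRule:
    tag: str
    permit_roles: frozenset
    mask_roles: frozenset

def decide(column_tags, user_roles, rules):
    """Return 'permit', 'mask' or 'deny' for one column and one user.

    The most restrictive outcome across all tags on the column wins.
    Columns that carry no governed tag are permitted (an assumption).
    """
    order = {"permit": 0, "mask": 1, "deny": 2}
    by_tag = {rule.tag: rule for rule in rules}
    outcome = "permit"
    for tag in column_tags:
        rule = by_tag.get(tag)
        if rule is None:
            continue  # tag is not governed by any rule
        if user_roles & rule.permit_roles:
            result = "permit"
        elif user_roles & rule.mask_roles:
            result = "mask"
        else:
            result = "deny"
        if order[result] > order[outcome]:
            outcome = result
    return outcome

def apply_decision(value, outcome):
    """Apply a decision to a single value, using simple full masking."""
    if outcome == "permit":
        return value
    if outcome == "mask":
        return "****"
    raise PermissionError("access denied")

# Illustrative data only: SPI-tagged columns are clear for HR managers
# and masked for analysts; everyone else is denied.
rules = [TagRule("SPI", frozenset({"hr_manager"}), frozenset({"analyst"}))]
```

The point of the sketch is that the decision depends only on the tags attached to the column and the roles of the user, not on the column name, so new data is covered as soon as it is classified.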
Apache Atlas
 “Atlas is a scalable and extensible set of core foundational governance
services – enabling enterprises to effectively and efficiently meet their
compliance requirements within Hadoop and allows integration with the
whole enterprise data ecosystem.” …. http://www.apache.org
 Open Community -- Apache Incubator since May 2015
 Type agnostic metadata store
 REST API & UI
 Supports many Hadoop components including HBase, Hive, Sqoop, Storm
& others
Apache Ranger
 Centralized security administration to manage all security related tasks in a
central UI or using REST APIs.
 Fine grained authorization to do a specific action and/or operation with
Hadoop component/tool and managed through a central administration
tool
 Standardize authorization method across all Hadoop components.
 Enhanced support for different authorization methods - Role based access
control, attribute based access control etc.
 Centralize auditing of user access and administrative actions (security
related) within all the components of Hadoop.
 … from http://ranger.apache.org
Project Interactions
[Diagram: Search/Report, GaianDB, Apache Atlas, Apache Ranger, Apache Solr, RDBMS, Hadoop]
• Search for list of assets by metadata
• Search for data
• Reporting tool obtains data to draw report
• Underlying data: SQL, Hive, HDFS, Oracle, Netezza etc.
• Manages logical views
• Deploys rules, pushes classifications, source for user roles (not users)
• + Ranger plugin to permit/deny, mask etc.
• Pulls rules, classifications
Why Atlas and Ranger?
 Open Source essential to forming an active ecosystem
 Vision, active community & evolving – ability to contribute & work with
others to provide the best solution
 Already have good core capabilities
 Atlas type system is very flexible
 Ranger offers a range of policy types and provides a pluggable framework
 Already cross project integration
 Use of tag based policies in Ranger sourced from Atlas
 Can be used independently of full Hadoop stack
Refined virtual connector scope
[Diagram: Data Lake Virtualization. Component labels: GaianDB; Ranger Plugin; Virtualizer (Manage Logical Tables; Pre / Post Create View); Atlas; Titan (GraphDB, Metadata Repository); OMAS; OMRS; IGC (Extract physical metadata); Ranger Server; Ranger Config (e.g. Policies, Audit log location); tag-sync; rule-sync; Poll Policies; LDAP; Audit Log; Mapper; Navigator; Datameer; Data Lake Repositories (Oracle, Netezza, Hive Tables); Search for data/reporting. Flow labels: Retrieve metadata; Push metadata; Push and query metadata.]
GaianDB & Virtualizer
 GaianDB
 Open Source
 Federated, self learning, dynamic configuration
 Based on Apache Derby
 Already had “policy” support – we’re plugging in
Ranger for this project
 Virtualizer
 Listens to event notifications on assets etc
 Creates view definitions in GaianDB, and uses new Atlas APIs
to store metadata. Could use a different virtual engine.
 Designed to be open to other virtualization
technologies.
[Diagram labels: LT1, LT2; DS1, DS2, DS3; PolicyPlugin (Ranger); Virtualizer; Atlas]
GaianDB supports federation – not used for MVP
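As a rough illustration of "expose relational data with business terms", the sketch below builds a view definition that renames physical columns to the business terms mapped to them. It is a sketch under assumptions: `build_view_sql` and the sample names are hypothetical, and the output is generic SQL, not GaianDB's actual logical-table definition syntax or the Virtualizer's real code.

```python
def build_view_sql(view_name, source_table, column_terms):
    """Build a CREATE VIEW statement exposing columns under business terms.

    column_terms maps physical column name -> business term (hypothetical
    shape; in the project this mapping would come from metadata).
    """
    def quote(identifier):
        # Double-quote identifiers so terms containing spaces stay valid SQL.
        return '"' + identifier.replace('"', '""') + '"'

    if not column_terms:
        raise ValueError("at least one column mapping is required")
    select_list = ", ".join(
        f"{quote(column)} AS {quote(term)}"
        for column, term in column_terms.items()
    )
    return (
        f"CREATE VIEW {quote(view_name)} AS "
        f"SELECT {select_list} FROM {quote(source_table)}"
    )

# Illustrative call with made-up names.
sql = build_view_sql("EMPLOYEE", "EMP", {"EMP_SALARY": "Salary"})
```

Keeping the generator independent of any one engine fits the stated aim of staying open to other virtualization technologies.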
Atlas – glossary enhancements
 Get Atlas closer to parity with commercial offerings
 Business Terms – categories, category hierarchies
 Has-a, is-a, type-of, synonym, antonym, arbitrary relationships
 Assets mapped to Business Terms
 Classifications
 Hierarchy
 Navigable mappings to retain ability to flatten tags to ranger
 Instead of hive column EMP_SALARY -> SPI, now can be EMP_SALARY -> SALARY ->
SPI …
 Used to drive governance
 ATLAS-1410
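The flattening idea (EMP_SALARY -> SALARY -> SPI still ends up as a plain SPI tag for Ranger) can be sketched as a graph walk. This is a minimal illustration only: the `EDGES` structure, the node naming and `flatten_tags` are assumptions made for the example, not the API proposed in ATLAS-1410.

```python
from collections import deque

# Hypothetical mappings: asset -> business term -> classification.
# Names follow the slide's example (EMP_SALARY -> SALARY -> SPI).
EDGES = {
    "hive_column:EMP_SALARY": ["term:SALARY"],
    "term:SALARY": ["classification:SPI"],
    "hive_column:EMP_NAME": ["term:NAME"],
    "term:NAME": [],
}

def flatten_tags(asset, edges):
    """Collect every classification reachable from an asset.

    Breadth-first walk over the mappings; the result is the flat tag
    list an enforcement engine would see for that asset.
    """
    seen = {asset}
    queue = deque([asset])
    tags = set()
    while queue:
        node = queue.popleft()
        for target in edges.get(node, []):
            if target in seen:
                continue  # guards against cycles, e.g. synonym links
            seen.add(target)
            if target.startswith("classification:"):
                tags.add(target.split(":", 1)[1])
            queue.append(target)
    return sorted(tags)
```

With this shape, adding or changing a classification on a business term changes the flattened tags of every mapped column without touching the columns themselves.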
Atlas – other enhancements
 Consumer Centric APIs
 Open Metadata Access Services (OMAS)
 REST & more Kafka notifications
 Asset, Catalog, Connector, Glossary, Governance Action, Governance Definitions,
Information View, Roles and Access
 Repository level APIs
 Open Metadata Repository Services (OMRS)
 REST & more Kafka notifications
 Pluggability through an Open Connector Framework to other metadata repositories
– distributed and Open
 Standard data model/core
 Enhancement to core model – versioning, external linkage etc
 More standard types, e.g. for all relational databases, to ease sharing
Ranger areas being looked at
 Building a plugin for GaianDB
 Access control, simple masking. More later
 User synchronization (large #users, role of Atlas)
 Changes to tag sync process for New glossary proposal
 As more metadata goes into Atlas, it becomes source for generation of
some kinds of policies. Where is the master?
 Generating ranger rules from governance definitions
 How about control of access to Atlas itself?
 Aside: Interfaces used by enforcement engines (such as to get classification
data) need to be efficient – these should work for projects like Apache
Sentry as well as Atlas
Beyond the MVP
 Open Discovery Framework
 Consider other security enforcement engines – such as Apache Sentry &
driving more capability around rules & governance actions from Atlas
metadata
 Work on standard models to support different domains
 Lineage
 From high level design lineage through to operational detail. Logs vs graph….
 API metadata
 Infrastructure – JanusGraph…
 Abstraction added by IBM in last few months for titan 1
The vision
 An enterprise data catalog that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
 Spanning systems both on premise and cloud providers
 Hosted locally to your data platforms but integrated to provide the enterprise view
 New data tools (from any vendor) connect to your data catalog out of the box
 No vendor lock-in; nor expensive population of yet another proprietary siloed metadata repository
 Metadata is added automatically to the catalog as new data is created
 Extensible discovery processes characterise and classify the data
 Interested parties and processes are notified
 Subject matter experts collaborating around the data
 Locate the data they need, quickly and efficiently
 Feed back their knowledge about the data and the uses they have made of it to help others and support economic evaluation of data
 Automated governance processes protect and manage your data
 Metadata-driven access control
 Auditing, metering and monitoring
 Quality control and exception management
 Rights management
 Predefined standards for glossaries, data schemas, rules and regulations that reduce the cost of doing business
 Open frameworks and APIs for collaborating with universities, traditional vendors and new innovators around data and advanced analytics
Summary
 Atlas can help us have an industry wide common metadata platform around
which a vibrant ecosystem can evolve
 Not only in Hadoop but more broadly
 Metadata driven governance can be scalable & enable us to manage our data
better, and be compliant with regulations
 The ideas presented here resonate with many people we’ve spoken to
 Get involved! I’d love to hear the feedback on this approach!
 Comment on the JIRAS, ask questions, contribute, disagree… ;-)
 Look at JIRA Tag “VirtualDataConnector” or start at ATLAS-1689
 Atlas wiki
 “Innovation happens best not in isolation but in collaboration” (keynote)
 THANKS!
Questions
After this talk
jonesn@uk.ibm.com
17:50 Room 4 – Security & Governance BOF
Questions?
Backup charts
Virtualizer Architecture
[Diagram labels: Atlas; graphDB; "gaiandb"; IGC; IGC REST API; Oracle Data; HDFS Data; Netezza Data; P-JDBC (x3); GAF OMAS; Virtual Asset OMAS; Search; Search/Explore UI; Catalog OMAS; OMRS (x2); GAF Pre; GAF Post; Connector Framework]
[Legend: Atlas boundaries; Developed in POC; May not be in POC initially; * May be hardcoded at first]
Metadata areas and types
[Diagram: metadata areas, numbered 1 to 7 in the original chart]
• Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics)
• Governance Actions and Processes
• Connector Directories
• Business Objects and Relationships, Taxonomies and Ontologies
• Business Attributes
• Organization
• Teaming Metadata (people profiles, communities, projects, notebooks, …)
• Models and Schemas
• Physical Asset Descriptions (Data stores, APIs, models and components)
• Asset Collections (Sets, Typed Sets, Type Organized Sets)
• Information Views
• Rights Management
• Reference Data
• Feedback Metadata (tags, comments, ratings, …)
• Classification Schemes
• Classification
• Strategy
• Subject Area Definition
• Campaigns and Projects
• Infrastructure and systems
• Rollout
• Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …)
• Information Process Instrumentation (design lineage)
[Relationship labels: Augmentation, Mapping, Implementation, Access, Instrument, Association]
[Roles: Information Auditor, Integration Developer, Business Analyst, Data Scientist, Information Worker, Information Owner, Information Governor, Information Steward, Data Quality Analyst, Information Curator]
User & Group/Role synchronization
UserSync2
LDAP holds role-membership (LDAP groups) – could also be Active Directory
ATLAS manages definitive list of roles <that are used for atlas managed sources>
• Corporate LDAP has a huge number of
users/groups
• Ranger currently needs to sync all
• In future perhaps we establish group/role
membership during authentication
• Capability for alternative source could be merged
in to base UserSync
[Diagram labels: LDAP lookup -> group:member; Governance Action OMAS - getRoles; Apache Ranger; LDAP; Apache Atlas]
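One way to read this slide: rather than syncing every corporate LDAP user and group, only the groups that correspond to roles Atlas manages need to reach Ranger. The sketch below illustrates that filtering step with plain dictionaries. The data shapes and the `users_to_sync` function are hypothetical; no real LDAP, Atlas or Ranger calls are made, and the design itself was still under discussion.

```python
def users_to_sync(ldap_groups, atlas_roles):
    """Return {role: sorted members} for Atlas-managed roles only.

    ldap_groups: {group name: iterable of member ids} (as if from an
                 LDAP lookup of group:member)
    atlas_roles: iterable of role names (as if from a getRoles call)
    """
    return {
        role: sorted(ldap_groups[role])
        for role in sorted(atlas_roles)
        if role in ldap_groups  # skip roles with no matching LDAP group
    }

# Illustrative data: a corporate directory with many unrelated groups.
ldap_groups = {
    "hr_manager": {"alice"},
    "analyst": {"carol", "bob"},
    "canteen_staff": {"dave"},
    "car_park_users": {"erin", "frank"},
}
atlas_roles = ["analyst", "hr_manager", "data_steward"]

synced = users_to_sync(ldap_groups, atlas_roles)
```

In this example only three users across two roles are synced, while the unrelated groups and the role without an LDAP group are left out.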
Atlas Glossary v2: Tag Sync to Ranger
TagSync2
ATLAS glossary manages a
sophisticated enterprise
glossary structure
• Atlas Glossary v2 proposed in ATLAS-1410 (David Radley)
• Sync builds on existing tagsync approach
• New API in Atlas will flatten classification structure
• No changes to ranger – but exposing richer classification could be area of future work
[Diagram: in Apache Atlas, Hive Column emp_renum -> Business Term Salary -> Business Term Confidential; through the Governance Action OMAS this is flattened for Apache Ranger to Hive Column emp_renum -> Tag Confidential]
Policy (Rule) synchronization
RuleSync
• Generate policies in Ranger based off entities in Atlas
• Currently designing how this works
• Scoped by policy service so existing Ranger UI approach still works
Governance Action OMAS
- getRules
[Diagram labels: Role; Classifications; Asset; Action; Ranger Rule; Apache Ranger; Apache Atlas]
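Since the design was still open at the time of the talk, the following is only a speculative sketch of what "generate policies in Ranger based off entities in Atlas" might involve: governance rules expressed as (role, classification, action) are grouped into one policy per classification and action, scoped to a single service. The rule and policy dictionaries are hypothetical shapes chosen for the example and do not follow Ranger's policy schema.

```python
def generate_policies(rules, service):
    """Group governance rules into policy dicts scoped to one service.

    rules: list of {"role", "classification", "action"} dicts
           (as if returned by a getRules call)
    Returns one policy per (classification, action) pair, listing all
    roles the rules grant that action to.
    """
    grouped = {}
    for rule in rules:
        key = (rule["classification"], rule["action"])
        grouped.setdefault(key, set()).add(rule["role"])
    return [
        {
            "service": service,
            "name": f"{tag}-{action}",
            "tag": tag,
            "action": action,
            "roles": sorted(roles),
        }
        for (tag, action), roles in sorted(grouped.items())
    ]

# Illustrative rules with made-up names.
rules = [
    {"role": "analyst", "classification": "SPI", "action": "mask"},
    {"role": "hr_manager", "classification": "SPI", "action": "permit"},
    {"role": "auditor", "classification": "SPI", "action": "mask"},
]
policies = generate_policies(rules, service="gaiandb_dev")
```

Scoping each generated policy to a named service mirrors the slide's note that the existing Ranger UI approach should keep working.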
VirtualDataConnector JIRAS 20170402
 RANGER-1488: Create Ranger plugin for gaiandb
 RANGER-1487: generate rules from Governance definitions in Atlas
 RANGER-1486: New usersync alternative for Atlas (vdc)
 RANGER-1485: Ranger support for Virtual Data Connector Project (ATLAS)
 RANGER-1464: Support Atlas v2 glossary in Atlas plugin (for access control to terms etc)
 RANGER-1454: Support of Atlas v2 glossary API proposal for tag source
 RANGER-1234: Post-evaluation phase user extensions
 RANGER-1186: Ranger Source: eclipse
 RANGER-1168: Add data masking for tag based policies
 ATLAS-1696: Governance Action Framework OMAS
 ATLAS-1694: Sample assets to support Virtual Connector Project
 ATLAS-1691: OMAS Interfaces for Atlas
 ATLAS-1158: Build ATLAS using Docker
 ATLAS-520: Temporal / Versioning support for types, traits, entites ....
 ATLAS-519: metrics
 ATLAS-455: Timeouts in tests should be configurable from system property
 ATLAS-197: Add build instructions in top level dir
References
 Apache Atlas - http://atlas.apache.org/
 Top level JIRA for this activity https://issues.apache.org/jira/browse/ATLAS-1689
 Apache Ranger - http://ranger.apache.org/
 GaianDB
 https://github.com/gaiandb/gaiandb
 https://developer.ibm.com/open/openprojects/gaian-database/
 The case for open metadata – A.M.Chessell
 http://www.ibmbigdatahub.com/blog/case-open-metadata

More Related Content

What's hot

Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
DataWorks Summit/Hadoop Summit
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
DataWorks Summit
 
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
DataWorks Summit/Hadoop Summit
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
DataWorks Summit
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
DataWorks Summit/Hadoop Summit
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
DataWorks Summit/Hadoop Summit
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
DataWorks Summit
 
On Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and AmbariOn Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and Ambari
DataWorks Summit/Hadoop Summit
 
Data-In-Motion Unleashed
Data-In-Motion UnleashedData-In-Motion Unleashed
Data-In-Motion Unleashed
DataWorks Summit
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
DataWorks Summit/Hadoop Summit
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
Hortonworks
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
DataWorks Summit/Hadoop Summit
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
DataWorks Summit/Hadoop Summit
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
DataWorks Summit
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data Applications
DataWorks Summit
 
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
DataWorks Summit
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
DataWorks Summit
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
DataWorks Summit/Hadoop Summit
 
How Apache Spark and Apache Hadoop are being used to keep banking regulators ...
How Apache Spark and Apache Hadoop are being used to keep banking regulators ...How Apache Spark and Apache Hadoop are being used to keep banking regulators ...
How Apache Spark and Apache Hadoop are being used to keep banking regulators ...
DataWorks Summit
 

What's hot (20)

Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
On Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and AmbariOn Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and Ambari
 
Data-In-Motion Unleashed
Data-In-Motion UnleashedData-In-Motion Unleashed
Data-In-Motion Unleashed
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data Applications
 
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
How Apache Spark and Apache Hadoop are being used to keep banking regulators ...
How Apache Spark and Apache Hadoop are being used to keep banking regulators ...How Apache Spark and Apache Hadoop are being used to keep banking regulators ...
How Apache Spark and Apache Hadoop are being used to keep banking regulators ...
 

Similar to ING- CoreIntel- Collect and Process Network Logs Across Data Centers in Real Time

Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4
Nigel Jones
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache Atlas
DataWorks Summit
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
DataWorks Summit
 
Master Meta Data
Master Meta DataMaster Meta Data
Master Meta Data
Digikrit
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
Hortonworks
 
A Look into the Apache OODT Ecosystem
A Look into the Apache OODT EcosystemA Look into the Apache OODT Ecosystem
A Look into the Apache OODT Ecosystem
Chris Mattmann
 
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
DataWorks Summit
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
Sheetal Pratik
 
Governance Software Systems_ Managing and Governing Your Data Assets.pptx
Governance Software Systems_ Managing and Governing Your Data Assets.pptxGovernance Software Systems_ Managing and Governing Your Data Assets.pptx
Governance Software Systems_ Managing and Governing Your Data Assets.pptx
Mounika662749
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Amazon Web Services LATAM
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
DataWorks Summit/Hadoop Summit
 
Azure_Purview.pdf
Azure_Purview.pdfAzure_Purview.pdf
Azure_Purview.pdf
hija7
 
Microsoft Purview
Microsoft PurviewMicrosoft Purview
Microsoft Purview
Mohammed Chaaraoui
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetup
Alex Zeltov
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...
DataWorks Summit
 
Archonnex at ICPSR
Archonnex at ICPSRArchonnex at ICPSR
Archonnex at ICPSR
Harshakumar Ummerpillai
 
Clinical Trials & Big Data-Final
Clinical Trials & Big Data-FinalClinical Trials & Big Data-Final
Clinical Trials & Big Data-FinalManoj Vig
 
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUGIntroducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Sandesh Rao
 
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
Amazon Web Services
 

Similar to ING- CoreIntel- Collect and Process Network Logs Across Data Centers in Real Time (20)

Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache Atlas
 
The rise of big data governance: insight on this emerging trend from active o...
  • 1. Unleashing the power of Apache Atlas with Apache Ranger Virtual Data Connector Project NIGEL JONES JONESN@UK.IBM.COM DATAWORKS, MUNICH, APRIL 2017 Apache®, Apache Atlas, Apache Ranger & other Apache project names referenced are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
  • 2. About Me – Nigel Jones  https://www.linkedin.com/in/nigelljones/  jonesn@uk.ibm.com (Anyone still use email?)  @planetf1 – noisy, f1, electric vehicles, food & drink …. A split of work/life accounts didn’t work for me!  And of course the Apache Atlas & Ranger mailing lists & JIRA!  Science fan at school uni. It was cloud chambers back then… now just the cloud   IBM Hursley, UK since 1990  Last 3 years focus on Data Lake, Information Governance, Open Metadata
  • 4. Data?  What data do I have?  What does it mean?  Where is it?  Who has access to it?  Who owns it?  What quality is it?  How does it relate to other data?  How do I control, audit & understand access?
  • 5. Regulatory needs  Adhere to regulations like BCBS-239 and GDPR  Need to know meaning, value of the data  Demonstrate processes in place to govern access  Audit  Significant fines if rules breached  Whilst ensuring easy, ready access to appropriate data for data professionals to support an agile business
  • 6. So what do we need to address this?
  • 7. Metadata..  Metadata enables data to be used outside of the application that created it.  Analytics and decision making  New business applications  Reporting and compliance  Metadata describes the format and content of data allowing people to judge which dataset to use for a new project  Structure  Meaning  Origin  Valid values and quality  Usage and ownership  Regulations and classifications that apply  Metadata describes the business context and classification of data allowing automated governance processes to operate.
  • 8. Which can support…  An enterprise data catalogue that lists all data including where it is, what it is, who owns it, its meaning, quality, where it came from, and can fully describe its business context & how the data should be governed….  Subject Matter experts searching, collaborating, feeding back about their data needs and use  Automated governance actions to protect and manage including auditing, monitoring, quality control, rights management
  • 9. But easily…  Open frameworks & APIs  Automatic collection & discovery of metadata in a dynamic heterogeneous environment  Using predefined standards for glossaries, schemas, rules, regulations to reduce cost  Cheap to integrate new tools  No proprietary lock-in & assumptions that all tools are from one suite or vendor  Avoiding silos  Distributed and Open
  • 12. Data virtualization project  Collaboration – IBM, several banks & open community  A Data Lake environment  Not just Hadoop, but other sources too  Business Terms, Classifications, Metadata rich  Offer virtualized views. Expose relational data with business terms  Manage Access to resources – permit, deny, log, filter/mask …. THROUGH METADATA  Open, pluggable  Working through use cases, design, initial MVP (this year)  Critique, feedback is welcomed. We’re looking for guidance and support from the Atlas & Ranger communities as well as contribute our ideas  Proposed changes all go through mailing list and JIRA for feedback
  • 13. Apache Atlas  “Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.” …. http://www.apache.org  Open Community -- Apache Incubator since May 2015  Type agnostic metadata store  REST API & UI  Supports many Hadoop components including HBase, Hive, Sqoop, Storm & others
  • 14. Apache Ranger  Centralized security administration to manage all security related tasks in a central UI or using REST APIs.  Fine grained authorization to do a specific action and/or operation with Hadoop component/tool and managed through a central administration tool  Standardize authorization method across all Hadoop components.  Enhanced support for different authorization methods - Role based access control, attribute based access control etc.  Centralize auditing of user access and administrative actions (security related) within all the components of Hadoop.  … from http://ranger.apache.org
  • 15. Project Interactions Search/Report GaianDB • Search for list of assets by metadata • Search for data • Reporting tool obtains data to draw report Underlying data, sql, hive, HDFS, Oracle, Netezza etc Manages logical views Deploys rules, pushes classifications, source for user roles (not users) +ranger plugin to permit/deny, mask etc Pulls rules, classifications RDBMS Hadoop Apache Atlas Apache Ranger Apache Solr
  • 16. Why Atlas and Ranger?  Open Source essential to forming an active ecosystem  Vision, active community & evolving – ability to contribute & work with others to provide the best solution  Already have good core capabilities  Atlas type system is very flexible  Ranger offers a range of policy types and provides a pluggable framework  Already cross project integration  Use of tag based policies in Ranger sourced from Atlas  Can be used independently of full Hadoop stack
  • 17. Refined virtual connector scope scope GaianDB Ranger Plugin Titan (GraphDB, Metadata Repository) Ranger Config Ranger Server Atlas Poll Policies OMAS OMRS IGC Pre Post Create View Metadata Extract physical metadata Manage Logical Tables Virtualizer Retrieve meta data Retrieve meta data Retrieve meta data Push meta data Oracle Netezza Hive Tables Push and query meta data Data Lake Repositories Meta Data Data Lake Virtualization tag-sync rule-sync Config (eg Policies, Audit log location) LDAP Audit Log Mapper Search for data/reporting Push and query metadata Meta Data Navigator Meta Data Datameer
  • 18. GaianDB & Virtualizer  GaianDB  Open Source  Federated, self learning, dynamic configuration  Based on Apache Derby  Already had “policy” support – we’re plugging in Ranger for this project  Virtualizer  Listens to event notifications on assets etc  Creates view definitions in GaianDB, and new Atlas APIs to store metadata. Could use different virtual engine..  Designed to be open to other virtualization technologies. LT1 LT2 DS2DS1 DS3 PolicyPlugin (ranger) Virtualizer Atlas GaianDB supports federation – not used for MVP
  • 19. Atlas – glossary enhancements  Get Atlas closer to parity with commercial offerings  Business Terms – categories, category hierarchies  Has-a, is-a, type-of, synonym, antonym, arbitrary relationships  Assets mapped to Business Terms  Classifications  Hierarchy  Navigable mappings to retain ability to flatten tags to ranger  Instead of hive column EMP_SALARY -> SPI, now can be EMP_SALARY -> SALARY -> SPI …  Used to drive governance  ATLAS-1410
  • 20. Atlas – other enhancements  Consumer Centric APIs  Open Metadata Access Services (OMAS)  REST & more Kafka notifications  Asset, Catalog, Connector, Glossary, Governance Action, Governance Definitions, Information View, Roles and Access  Repository level APIs  Open Metadata Repository Services (OMRS)  REST & more Kafka notifications  Pluggability through an Open Connector Framework to other metadata repositories – distributed and Open  Standard data model/core  Enhancement to core model – versioning, external linkage etc  More standard types ie for all relational databases to ease sharing
  • 21. Ranger areas being looked at  Building a plugin for GaianDB  Access control, simple masking. More later  User synchronization (large #users, role of Atlas)  Changes to tag sync process for New glossary proposal  As more metadata goes into Atlas, it becomes source for generation of some kinds of policies. Where is the master?  Generating ranger rules from governance definitions  How about control of access to Atlas itself?  Aside: Interfaces used by enforcement engines (such as to get classification data) need to be efficient – these should work for projects like Apache Sentry as well as Atlas
  • 22. Beyond the MVP  Open Discovery Framework  Consider other security enforcement engines – such as Apache Sentry & driving more capability around rules & governance actions from Atlas metadata  Work on standard models to support different domains  Lineage  From high level design lineage through to operational detail. Logs vs graph….  API metadata  Infrastructure – JanusGraph…  Abstraction added by IBM in last few months for titan 1
  • 23. The vision  An enterprise data catalog that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality  Spanning systems both on premise and cloud providers  Hosted locally to your data platforms but integrated to provide the enterprise view  New data tools (from any vendor) connect to your data catalog out of the box  No vendor lock-in; nor expensive population of yet another proprietary siloed metadata repository  Metadata is added automatically to the catalog as new data is created  Extensible discovery processes characterise and classify the data  Interested parties and processes are notified  Subject matter experts collaborating around the data  Locate the data they need, quickly and efficiently  Feed back their knowledge about the data and the uses they have made about it to help others and support economic evaluation of data  Automated governance processes protect and manage your data  Metadata-driven access control  Auditing, metering and monitoring  Quality control and exception management  Rights management  Predefined standards for glossaries, data schemas, rules and regulations that reduce the cost of doing business  Open frameworks and APIs for collaborating with universities, traditional vendors and new innovators around data and advanced analytics
  • 24. Summary  Atlas can help us have an industry wide common metadata platform around which a vibrant ecosystem can evolve  Not only in Hadoop but more broadly  Metadata driven governance can be scalable & enable us to manage our data better, and be compliant with regulations  The ideas presented here resonate with many people we’ve spoken to  Get involved! I’d love to hear the feedback on this approach!  Comment on the JIRAS, ask questions, contribute, disagree… ;-)  Look at JIRA Tag “VirtualDataConnector” or start at ATLAS-1689  Atlas wiki  “Innovation happens best not in isolation but in collaboration” (keynote)  THANKS!
  • 25. Questions After this talk jonesn@uk.ibm.com 17:50 Room 4 – Security & Governance BOF Questions?
  • 27. Atlas graphDB “gaiandb” IGC IGC REST API Oracle Data HDFS Data Netezza Data P-JDBC P-JDBC P-JDBC GAF OMAS Virtual Asset OMAS Search Search/Explore UI Catalog OMAS OMRS OMRS GAF Pre GAF Post Connector Framework * Atlas boundaries Developed in POC May not be in POC initially * May be hardcoded at first Connector Framework ATLAS Virtualizer Architecture
  • 28. Metadata areas and types Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics) Governance Actions and Processes Augmentation MappingImplementation Connector Directories Access Access Information Auditor Integration Developer Business Analyst Data Scientist Information Worker Information Owner Information Governor Information Steward Data Quality Analyst Business Objects and Relationships, Taxonomies and Ontologies Business Attributes Organization Information Curator Teaming Metadata (people profiles, communities, projects, notebooks, …) Models and Schemas 3 2 4 5 Physical Asset Descriptions (Data stores, APIs, models and components) Asset Collections (Sets, Typed Sets, Type Organized Sets) Information Views Rights Management Reference Data Feedback Metadata (tags, comments, ratings, …) ClassificationSchemes Classification Strategy Subject Area Definition Campaigns and Projects Infrastructure and systems Rollout 1 Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …) Augmentation Instrument Association Information Process Instrumentation (design lineage) 6 7
  • 29. User & Group/Role synchronization. UserSync2: LDAP holds role membership (LDAP groups); this could also be Active Directory. Atlas manages the definitive list of roles (those used for Atlas-managed sources). • Corporate LDAP has a huge number of users/groups • Ranger currently needs to sync all of them • In future, perhaps we establish group/role membership during authentication • The capability for an alternative source could be merged into the base UserSync. [Diagram labels: LDAP lookup -> group:member; Governance Action OMAS - getRoles; Apache Ranger; LDAP; Apache Atlas]
  • 30. Atlas Glossary v2: Tag Sync to Ranger. TagSync2: the Atlas glossary manages a sophisticated enterprise glossary structure • Atlas Glossary v2 proposed in ATLAS-1410 (David Radley) • Sync builds on the existing tagsync approach • New API in Atlas will flatten the classification structure • No changes to Ranger, but exposing richer classification could be an area of future work. [Diagram labels: Governance Action OMAS; Confidential; Salary; emp_renum; Business Term; Hive Column; Business Term; Confidential; emp_renum; Hive Column; Tag; Apache Ranger; Apache Atlas]
  • 31. Policy (Rule) synchronization. RuleSync • Generate policies in Ranger based on entities in Atlas • Currently designing how this works • Scoped by policy service so the existing Ranger UI approach still works. [Diagram labels: Governance Action OMAS - getRules; Role; Classifications; Asset; Ranger Rule; Action; Apache Ranger; Apache Atlas]
  • 32. VirtualDataConnector JIRAS 20170402  RANGER-1488: Create Ranger plugin for gaiandb  RANGER-1487: generate rules from Governance definitions in Atlas  RANGER-1486: New usersync alternative for Atlas (vdc)  RANGER-1485: Ranger support for Virtual Data Connector Project (ATLAS)  RANGER-1464: Support Atlas v2 glossary in Atlas plugin (for access control to terms etc)  RANGER-1454: Support of Atlas v2 glossary API proposal for tag source  RANGER-1234: Post-evaluation phase user extensions  RANGER-1186: Ranger Source: eclipse  RANGER-1168: Add data masking for tag based policies  ATLAS-1696: Governance Action Framework OMAS  ATLAS-1694: Sample assets to support Virtual Connector Project  ATLAS-1691: OMAS Interfaces for Atlas  ATLAS-1158: Build ATLAS using Docker  ATLAS-520: Temporal / Versioning support for types, traits, entites ....  ATLAS-519: metrics  ATLAS-455: Timeouts in tests should be configurable from system property  ATLAS-197: Add build instructions in top level dir
  • 33. References  Apache Atlas - http://atlas.apache.org/  Top level JIRA for this activity https://issues.apache.org/jira/browse/ATLAS-1689  Apache Ranger - http://ranger.apache.org/  GaianDB  https://github.com/gaiandb/gaiandb  https://developer.ibm.com/open/openprojects/gaian-database/  The case for open metadata – A.M.Chessell  http://www.ibmbigdatahub.com/blog/case-open-metadata
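The "flatten classification structure" step on slide 30 can be illustrated with a small sketch. This is not the proposed Atlas API: the `BusinessTerm` and `Asset` classes, their field names, and the `flatten_tags` function are hypothetical stand-ins, and the example assumes one reading of the slide's diagram, in which the Hive column `emp_renum` is linked to the business term Salary, which in turn carries the classification Confidential.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical, simplified stand-ins for glossary objects.
# These are NOT the Atlas Glossary v2 model, only an illustration.
@dataclass
class BusinessTerm:
    name: str
    classifications: List[str] = field(default_factory=list)


@dataclass
class Asset:
    qualified_name: str
    type_name: str
    terms: List[BusinessTerm] = field(default_factory=list)


def flatten_tags(assets: List[Asset]) -> Dict[str, List[str]]:
    """Map each asset name to the sorted classification names reachable via its terms."""
    flat: Dict[str, List[str]] = {}
    for asset in assets:
        tags = {c for term in asset.terms for c in term.classifications}
        flat[asset.qualified_name] = sorted(tags)
    return flat


# Assumed reading of the slide: emp_renum -> Salary -> Confidential.
salary = BusinessTerm("Salary", ["Confidential"])
column = Asset("emp_renum", "hive_column", [salary])
print(flatten_tags([column]))
```

The point of the flat result is that a consumer such as Ranger would only need to see a simple asset-to-tag association, as it does with tags today, while the richer term structure stays in the glossary.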

Editor's Notes

  1. This is the nirvana. Many tools from different teams – open or proprietary – all able to exchange metadata easily. A new tool can easily understand existing metadata and integrate with minimal effort.
  2. GaianDB is an open source project from IBM that is based on Apache Derby and supports a highly distributed model with self-learning/healing capabilities. It virtualizes access to underlying data sources – for example, a virtual table may be surfaced via JDBC that is actually based on a combination of a CSV file and another relational database. We are using it in the Virtual Data Connector project to provide a single point of control via a Ranger plugin, as well as to do some data source mappings, such as hiding technical columns from view or renaming columns with more business-like terms gleaned from the glossary (Atlas).
  3. This is broadly the scope of an MVP definition we’re using to focus our initial work this year. We have use cases we can share with anyone interested, and will be capturing that info in the Atlas/Ranger JIRAs and potentially wiki. The list of metadata repositories is an example. Our MVP sources some metadata from IGC since that is being used by some participants, but the focus is on open interfaces and Atlas. The other repositories are potential ideas only. Similarly
  4. It’s important to architect this in an open way. The rules used to decide when to virtualize a resource need to be pluggable – for example, perhaps all data arriving in a particular DataLake zone will be a virtualization candidate. Further, the actual technology needs to be changeable – proprietary or other open projects – for example, perhaps Presto is a candidate. Ideas welcome – proposals will be shared in JIRA.
  5. OMAS = Open Metadata Access Services – These are consumer-centric interfaces, so they would pass objects suited to a particular consumer – for example Ranger in the case of the Governance Action OMAS, or perhaps a catalog UI for the Catalog services. Each consumer has different needs in terms of object structure, or whether it deals with individual objects or sets, and this can differ from the model used in the underlying repository. For some interfaces this mapping is simple, for others more complex. OMRS = Open Metadata Repository Services. This refers to the core repository, i.e. the Atlas type system. We see other metadata repositories adopting the same UI, and are proposing a mechanism that will allow these to be plugged in. Metadata can change rapidly, and the only scalable approach is to ensure it’s open & distributed. Contributions from other metadata server authors welcome! Note that these are our working names. Fundamentally they are Atlas, and so the Atlas community will together need to agree on the actual names moving forward. A standard data model – in addition to a common mechanism/server – is necessary to make it much easier to *understand* the metadata we store. Whilst there will always be a need for extensions, having a good base object definition will make application integration easier. For example, we might all wish to describe an RDBMS in a very similar way. We can then go on from this to have more standards oriented around industry models.
  6. GaianDB Ranger plugin – GaianDB already has the capability to have a policy plugin which governs access to its virtual tables. To integrate with Ranger and Atlas we will have a Ranger-style plugin. Whilst this will function like any other Ranger plugin, in addition policies will be generated from Atlas itself. User synchronization – the challenges are described later. In summary, in an enterprise environment there may be many users (100k+) in LDAP, but only a small number have access to the virtualization infrastructure. We’re going to key the user sync off the list of user roles found in Atlas itself, and then obtain the role membership from LDAP. This will then be uploaded to Ranger as per the existing usersync. Tag Sync – the glossary enhancements provide additional structure in how Business Terms, Classifications & assets are linked. A new Atlas API will flatten this structure and thus preserve Ranger’s ability to use Atlas tags as today. In future there may be a re-evaluation to see if this more sophisticated approach should be pushed to Ranger too. Policies – Since Atlas now has richer metadata, including information about asset ownership, high-level governance policies and data classification, rules may be generated in some cases from Atlas, or from a new rule-sync process. This is still being worked through and we’ll share our ideas with the community. Openness – Some users I’ve spoken to are interested in Atlas but may currently use other technologies for enforcement, including Sentry in Hadoop. The intent is to ensure all the interfaces defined are open to all, and useful… so that, should someone wish, they could just as well integrate with Sentry as Ranger. This loosely coupled approach helps support an innovative, exciting ecosystem.
  7. Atlas holds metadata around user roles – they are used to define governance rules… In keeping with Ranger’s process to synchronize users & groups we will source these slightly differently, though this is mostly a scoping exercise to avoid pulling everything from LDAP. One consideration for the future is whether Ranger needs to sync users/groups at all – whilst the sync can help with typeahead when manually defining policies, it’s of relatively little use at runtime if plugins could instead pull the current user’s role membership from LDAP or elsewhere after connection. Possibly for another JIRA.
  8. Ranger already does tag synchronization with Atlas, but changes will be needed to support the new glossary capabilities. A new tagsync process will likely be implemented so that either old or new can be used, to avoid any breakage for existing users.
  9. Currently working through how this may work, but fundamentally we can define the governance rules in Atlas, and likely generate executable rules in Ranger. Refer to the JIRAs for ongoing design in this area.
  10. In no particular order and an example only – a query for all JIRAs against Atlas and Ranger with label = ‘VirtualDataConnector’ as of 2 April 2017. This is a list of issues we’re particularly interested in. The root JIRA for our current design work is ATLAS-1689, which it appears we forgot to tag. There are others too, so please rerun the query!
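The scoped user synchronization described in notes 6 and 7 (key the sync off the roles known to Atlas, then take membership from LDAP) can be sketched as follows. This is an illustration only, not the UserSync implementation: the function names, the in-memory dictionaries standing in for LDAP and Atlas, and the sample users are all hypothetical, and the sketch assumes that an Atlas role name matches an LDAP group name exactly.

```python
from typing import Dict, Iterable, List


def scope_user_sync(
    atlas_roles: Iterable[str],
    ldap_groups: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Keep only the LDAP groups whose name matches a role defined in Atlas.

    Assumption: role names and LDAP group names are identical strings.
    """
    wanted = set(atlas_roles)
    return {
        group: sorted(members)
        for group, members in ldap_groups.items()
        if group in wanted
    }


def users_to_sync(scoped_groups: Dict[str, List[str]]) -> List[str]:
    """Return the distinct users that appear in at least one scoped group."""
    return sorted({user for members in scoped_groups.values() for user in members})


# Hypothetical data: a corporate directory with groups irrelevant to governance.
ldap_groups = {
    "data_scientist": ["bob", "alice"],
    "hr_admin": ["carol"],
    "canteen": ["dave", "erin"],
}
atlas_roles = ["data_scientist", "hr_admin"]

scoped = scope_user_sync(atlas_roles, ldap_groups)
print(scoped)
print(users_to_sync(scoped))
```

Only the scoped groups and their members would then be passed on to Ranger, which is what keeps a directory with 100k+ users from being synchronized in full.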