Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
Unleashing the Power of Apache Atlas with Apache Ranger Slides
Unleashing the Power of Apache Atlas with Apache Ranger
1.
Unleashing the power of Apache Atlas with Apache Ranger
Virtual Data Connector Project
Nigel Jones, jonesn@uk.ibm.com
DataWorks, Munich, April 2017
Apache®, Apache Atlas, Apache Ranger & other Apache project names referenced are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
2.
About Me – Nigel Jones
• https://www.linkedin.com/in/nigelljones/
• jonesn@uk.ibm.com (Anyone still use email?)
• @planetf1 – noisy, f1, electric vehicles, food & drink…. A split of work/life accounts didn't work for me!
• And of course the Apache Atlas & Ranger mailing lists & JIRA!
• Science fan at school/uni. It was cloud chambers back then… now just the cloud :-)
• IBM Hursley, UK since 1990
• Last 3 years focus on Data Lake, Information Governance, Open Metadata
3.
The Problem…..
Why are we here…..
4.
Data?
• What data do I have?
• What does it mean?
• Where is it?
• Who has access to it?
• Who owns it?
• What quality is it?
• How does it relate to other data?
• How do I control, audit & understand access?
5.
Regulatory needs
• Adhere to regulations like BCBS-239 and GDPR
• Need to know meaning, value of the data
• Demonstrate processes in place to govern access
• Audit
• Significant fines if rules breached
• Whilst ensuring easy, ready access to appropriate data for data professionals to support an agile business
6.
So what do we need to address this?
7.
Metadata..
• Metadata enables data to be used outside of the application that created it.
  • Analytics and decision making
  • New business applications
  • Reporting and compliance
• Metadata describes the format and content of data, allowing people to judge which data set to use for a new project
  • Structure
  • Meaning
  • Origin
  • Valid values and quality
  • Usage and ownership
  • Regulations and classifications that apply
8.
Which can support…
• An enterprise data catalogue that lists all data including where it is, what it is, who owns it, its meaning, quality, where it came from, and can fully describe its business context & how the data should be governed….
• Subject Matter experts searching, collaborating, feeding back about their data needs and use
• Automated governance actions to protect and manage, including auditing, monitoring, quality control, rights management
9.
But easily…
• Open frameworks & APIs
• Automatic collection & discovery of metadata in a dynamic heterogeneous environment
• Using predefined standards for glossaries, schemas, rules, regulations to reduce cost
• Cheap to integrate new tools
• No proprietary lock-in & assumptions that all tools are from one suite or vendor
• Avoiding silos
• Distributed and Open
10.
The vision
Open and Unified Metadata
11.
Virtualization Data Connector project
12.
Data virtualization project
• Collaboration – IBM, several banks & open community
• A Data Lake environment
• Not just Hadoop, but other sources too
• Business Terms, Classifications, Metadata rich
• Offer virtualized views. Expose relational data with business terms
• Manage access to resources – permit, deny, log, filter/mask…. THROUGH METADATA
• Open, pluggable
• Working through use cases, design, initial MVP (this year)
• Critique, feedback is welcomed. We're looking for guidance and support from the Atlas & Ranger communities as well as to contribute our ideas
• Proposed changes all go through mailing list and JIRA for feedback
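To make "manage access through metadata" concrete, here is a minimal sketch of a decision function that turns classification tags into a permit, mask or deny outcome. Everything in it is an assumption for illustration: the `RULES` table, the role names and the `decide` function are hypothetical and do not reflect Ranger's policy model or the project's actual plugin.

```python
from typing import Dict, List, Set

# Hypothetical rules keyed by classification tag; each maps a role to an action.
# Illustrative only: this is not Ranger's policy format.
RULES: Dict[str, Dict[str, str]] = {
    "Confidential": {"hr_manager": "permit", "analyst": "mask"},
}

# Ordering used to compare actions: higher means more restrictive.
SEVERITY = {"permit": 0, "mask": 1, "deny": 2}


def decide(user_roles: Set[str], column_tags: List[str],
           rules: Dict[str, Dict[str, str]] = RULES) -> str:
    """Return 'permit', 'mask' or 'deny' for one column access."""
    outcome = "permit"  # untagged data is unrestricted in this sketch
    for tag in column_tags:
        if tag not in rules:
            continue  # tags without rules do not restrict access here
        role_actions = rules[tag]
        # Within one tag, the least restrictive action granted to any role wins;
        # a user with no matching role is denied.
        granted = [role_actions[r] for r in user_roles if r in role_actions]
        tag_outcome = min(granted, key=SEVERITY.__getitem__) if granted else "deny"
        # Across tags, the most restrictive result wins.
        outcome = max(outcome, tag_outcome, key=SEVERITY.__getitem__)
    return outcome
```

The point of the design is that the caller passes only tags and roles; no table or column name appears in the rules, so a newly classified asset is covered without writing a new rule.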
13.
Apache Atlas
• “Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.” …. http://www.apache.org
• Open Community -- Apache Incubator since May 2015
• Type agnostic metadata store
• REST API & UI
• Supports many Hadoop components including HBase, Hive, Sqoop, Storm & others
14.
Apache Ranger
• Centralized security administration to manage all security related tasks in a central UI or using REST APIs.
• Fine grained authorization to do a specific action and/or operation with Hadoop component/tool and managed through a central administration tool
• Standardize authorization method across all Hadoop components.
• Enhanced support for different authorization methods - Role based access control, attribute based access control etc.
• Centralize auditing of user access and administrative actions (security related) within all the components of Hadoop.
• … from http://ranger.apache.org
15.
Project Interactions
[Diagram. Components: Search/Report, GaianDB, Apache Atlas, Apache Ranger, Apache Solr, RDBMS, Hadoop]
• Search for list of assets by metadata
• Search for data
• Reporting tool obtains data to draw report
Diagram labels: Underlying data (SQL, Hive, HDFS, Oracle, Netezza etc.); Manages logical views; Deploys rules, pushes classifications, source for user roles (not users); + Ranger plugin to permit/deny, mask etc.; Pulls rules, classifications
16.
Why Atlas and Ranger?
• Open Source essential to forming an active ecosystem
• Vision, active community & evolving – ability to contribute & work with others to provide the best solution
• Already have good core capabilities
  • Atlas type system is very flexible
  • Ranger offers a range of policy types and provides a pluggable framework
• Already cross project integration
  • Use of tag based policies in Ranger sourced from Atlas
• Can be used independently of full Hadoop stack
17.
Refined virtual connector scope
[Architecture diagram. Components: GaianDB with Ranger Plugin; Virtualizer; Atlas with OMAS, OMRS and Titan (GraphDB, Metadata Repository); Ranger Server and Ranger Config; tag-sync; rule-sync; Mapper; LDAP; Audit Log; IGC; Metadata Navigator; Datameer; Data Lake Repositories (Oracle, Netezza, Hive Tables) with their metadata; Data Lake Virtualization]
Diagram labels: Poll Policies; Pre; Post; Create View Metadata; Extract physical metadata; Manage Logical Tables; Retrieve metadata; Push metadata; Push and query metadata; Config (e.g. Policies, Audit log location); Search for data/reporting
18.
GaianDB & Virtualizer
• GaianDB
  • Open Source
  • Federated, self learning, dynamic configuration
  • Based on Apache Derby
  • Already had “policy” support – we're plugging in Ranger for this project
• Virtualizer
  • Listens to event notifications on assets etc
  • Creates view definitions in GaianDB, and uses new Atlas APIs to store metadata. Could use different virtual engine..
  • Designed to be open to other virtualization technologies.
[Diagram: logical tables LT1, LT2 over data sources DS1, DS2, DS3; Policy Plugin (Ranger); Virtualizer; Atlas. Note: GaianDB supports federation – not used for MVP]
19.
Atlas – glossary enhancements
• Get Atlas closer to parity with commercial offerings
• Business Terms – categories, category hierarchies
  • Has-a, is-a, type-of, synonym, antonym, arbitrary relationships
  • Assets mapped to Business Terms
• Classifications
  • Hierarchy
  • Navigable mappings to retain ability to flatten tags to Ranger
  • Instead of Hive column EMP_SALARY -> SPI, now can be EMP_SALARY -> SALARY -> SPI…
• Used to drive governance
• ATLAS-1410
20.
Atlas – other enhancements
• Consumer Centric APIs
  • Open Metadata Access Services (OMAS)
  • REST & more Kafka notifications
  • Asset, Catalog, Connector, Glossary, Governance Action, Governance Definitions, Information View, Roles and Access
• Repository level APIs
  • Open Metadata Repository Services (OMRS)
  • REST & more Kafka notifications
  • Pluggability through an Open Connector Framework to other metadata repositories – distributed and Open
• Standard data model/core
  • Enhancement to core model – versioning, external linkage etc
  • More standard types, i.e. for all relational databases to ease sharing
21.
Ranger areas being looked at
• Building a plugin for GaianDB
  • Access control, simple masking. More later
• User synchronization (large # users, role of Atlas)
• Changes to tag sync process for new glossary proposal
• As more metadata goes into Atlas, it becomes source for generation of some kinds of policies. Where is the master?
  • Generating Ranger rules from governance definitions
• How about control of access to Atlas itself?
• Aside: Interfaces used by enforcement engines (such as to get classification data) need to be efficient – these should work for projects like Apache Sentry as well as Atlas
22.
Beyond the MVP
• Open Discovery Framework
• Consider other security enforcement engines – such as Apache Sentry & driving more capability around rules & governance actions from Atlas metadata
• Work on standard models to support different domains
• Lineage
  • From high level design lineage through to operational detail. Logs vs graph….
• API metadata
• Infrastructure – JanusGraph…
  • Abstraction added by IBM in last few months for Titan 1
23.
The vision
• An enterprise data catalog that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
  • Spanning systems both on premise and cloud providers
  • Hosted locally to your data platforms but integrated to provide the enterprise view
• New data tools (from any vendor) connect to your data catalog out of the box
  • No vendor lock-in; nor expensive population of yet another proprietary siloed metadata repository
• Metadata is added automatically to the catalog as new data is created
  • Extensible discovery processes characterise and classify the data
  • Interested parties and processes are notified
• Subject matter experts collaborating around the data
  • Locate the data they need, quickly and efficiently
  • Feed back their knowledge about the data and the uses they have made of it to help others and support economic evaluation of data
• Automated governance processes protect and manage your data
  • Metadata-driven access control
24.
Summary
• Atlas can help us have an industry wide common metadata platform around which a vibrant ecosystem can evolve
  • Not only in Hadoop but more broadly
• Metadata driven governance can be scalable & enable us to manage our data better, and be compliant with regulations
• The ideas presented here resonate with many people we've spoken to
• Get involved! I'd love to hear the feedback on this approach!
  • Comment on the JIRAs, ask questions, contribute, disagree… ;-)
  • Look at JIRA Tag “VirtualDataConnector” or start at ATLAS-1689
  • Atlas wiki
• “Innovation happens best not in isolation but in collaboration” (keynote)
• THANKS!
25.
Questions?
After this talk: jonesn@uk.ibm.com
17:50 Room 4 – Security & Governance BOF
26.
Backup charts
27.
ATLAS Virtualizer Architecture
[Architecture diagram. Components: Atlas with graphDB; OMRS; Connector Framework; Catalog OMAS; Search OMAS; Virtual Asset OMAS; GAF OMAS; GAF Pre; GAF Post; Virtualizer; “gaiandb” with P-JDBC connections to Oracle Data, HDFS Data and Netezza Data; IGC and IGC REST API; Search/Explore UI]
Legend: Atlas boundaries; Developed in POC; May not be in POC initially; * May be hardcoded at first
28.
Metadata areas and types
[Diagram with numbered areas 1 to 7. Recoverable labels follow.]
Metadata areas: Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics); Governance Actions and Processes; Connector Directories; Business Objects and Relationships, Taxonomies and Ontologies; Business Attributes; Teaming Metadata (people profiles, communities, projects, notebooks, …); Models and Schemas; Physical Asset Descriptions (Data stores, APIs, models and components); Asset Collections (Sets, Typed Sets, Type Organized Sets); Information Views; Rights Management; Reference Data; Feedback Metadata (tags, comments, ratings, …); Classification Schemes; Subject Area Definition; Campaigns and Projects; Infrastructure and systems; Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …); Information Process Instrumentation (design lineage)
Roles: Information Auditor; Integration Developer; Business Analyst; Data Scientist; Information Worker; Information Owner; Information Governor; Information Steward; Data Quality Analyst; Information Curator
Other labels: Augmentation; Mapping; Implementation; Access; Organization; Classification; Strategy; Rollout; Instrument; Association
29.
User & Group/Role synchronization
[Diagram: UserSync2 between LDAP, Apache Atlas and Apache Ranger. Labels: LDAP lookup -> group:member; Governance Action OMAS - getRoles]
• LDAP holds role-membership (LDAP groups) – could also be Active Directory
• ATLAS manages definitive list of roles <that are used for Atlas managed sources>
• Corporate LDAP has a huge number of users/groups
• Ranger currently needs to sync all
• In future perhaps we establish group/role membership during authentication
• Capability for alternative source could be merged into base UserSync
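The idea of syncing only the users that matter can be sketched as a filter: take the role list from Atlas, look up just those groups in LDAP, and ignore the rest of the directory. The function and the sample data below are hypothetical; a real implementation would call the directory and the proposed getRoles interface, neither of which is modelled here.

```python
from typing import Dict, List, Set


def users_to_sync(ldap_groups: Dict[str, List[str]],
                  atlas_roles: Set[str]) -> Dict[str, Set[str]]:
    """Return user -> roles, only for members of groups that Atlas lists as roles."""
    result: Dict[str, Set[str]] = {}
    for group, members in ldap_groups.items():
        if group not in atlas_roles:
            continue  # group is not a governed role, so its members are skipped
        for user in members:
            result.setdefault(user, set()).add(group)
    return result


# Stand-in for a corporate directory (hypothetical names).
ldap = {
    "hr_manager": ["alice"],
    "analyst": ["alice", "bob"],
    "canteen_staff": ["carol", "dave"],
}
roles_from_atlas = {"hr_manager", "analyst"}
synced = users_to_sync(ldap, roles_from_atlas)
```

Here only alice and bob reach the enforcement side; carol and dave are never synchronized because their group is not a role used for Atlas managed sources.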
30.
Atlas Glossary v2: Tag Sync to Ranger
[Diagram: in Apache Atlas, Hive column emp_renum is linked to the glossary entries Salary and Confidential (labelled Business Term); TagSync2, via the Governance Action OMAS, delivers to Apache Ranger the Hive column emp_renum with the tag Confidential]
• ATLAS glossary manages a sophisticated enterprise glossary structure
• Atlas Glossary v2 proposed in ATLAS-1410 (David Radley)
• Sync builds on existing tag sync approach
• New API in Atlas will flatten classification structure
• No changes to Ranger – but exposing richer classification could be area of future work
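Flattening can be pictured as a graph walk: start at the asset, follow glossary links, and collect every classification reached, so the enforcement side sees a plain tag list. The sketch below assumes the chain emp_renum -> Salary -> Confidential as one reading of the diagram; the data structures and the `flatten_tags` function are hypothetical and are not the proposed Atlas API.

```python
from typing import Dict, List, Set

# Hypothetical glossary links: each entry points at the entries that describe it.
LINKS: Dict[str, List[str]] = {
    "emp_renum": ["Salary"],       # Hive column mapped to a business term
    "Salary": ["Confidential"],    # business term carrying a classification
}

# Entries that should surface as tags on the enforcement side.
TAG_NAMES: Set[str] = {"Confidential"}


def flatten_tags(asset: str, links: Dict[str, List[str]],
                 tag_names: Set[str]) -> List[str]:
    """Walk the links from an asset and return every reachable tag, sorted."""
    seen: Set[str] = set()
    found: List[str] = []
    pending = [asset]
    while pending:
        node = pending.pop()
        if node in seen:
            continue  # guards against cycles such as synonym links
        seen.add(node)
        if node in tag_names:
            found.append(node)
        pending.extend(links.get(node, []))
    return sorted(found)
```

Calling `flatten_tags("emp_renum", LINKS, TAG_NAMES)` yields a single tag for the column, which matches the slide's point that Ranger needs no change: it still receives an asset with flat tags.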
31.
Policy (Rule) synchronization
[Diagram: RuleSync between Apache Atlas and Apache Ranger. Labels: Governance Action OMAS - getRules; Role; Classifications; Asset; Action; Ranger Rule]
• Generate policies in Ranger based off entities in Atlas
• Currently designing how this works
• Scoped by policy service so existing Ranger UI approach still works
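Since the slide says the design was still in progress, the following is only one possible shape for rule generation: combine a governance rule (role, classification, action) with the assets carrying that classification, and emit one policy per rule scoped to a service. The `GovernanceRule` class, the dictionary layout and all names are hypothetical and are not Ranger's policy JSON.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class GovernanceRule:
    """Hypothetical governance definition: what a role may do with a classification."""
    role: str
    classification: str
    action: str


def generate_policies(rules: Iterable[GovernanceRule],
                      asset_tags: Dict[str, List[str]],
                      service: str) -> List[dict]:
    """Build one illustrative policy per rule, listing the assets it covers."""
    policies = []
    for rule in rules:
        resources = sorted(asset for asset, tags in asset_tags.items()
                           if rule.classification in tags)
        if not resources:
            continue  # nothing carries this classification yet
        policies.append({
            "service": service,  # keeps generated policies scoped to one policy service
            "name": f"{rule.classification}-{rule.role}-{rule.action}",
            "resources": resources,
            "role": rule.role,
            "action": rule.action,
        })
    return policies


rules = [GovernanceRule("analyst", "Confidential", "mask"),
         GovernanceRule("auditor", "Financial", "permit")]
assets = {"hr.emp_renum": ["Confidential"], "hr.emp_name": []}
generated = generate_policies(rules, assets, service="gaiandb_demo")
```

Scoping each generated policy to a named service mirrors the slide's remark that the existing per-service view in the Ranger UI should keep working.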
32.
Virtual Data Connector JIRAs 20170402
JIRA IDs (listed separately from the descriptions on the slide; the pairing is not recoverable from the transcript):
• RANGER-1488
• RANGER-1487
• RANGER-1486
• RANGER-1485
• RANGER-1464
• RANGER-1454
• RANGER-1234
• RANGER-
Descriptions:
• Create Ranger plugin for gaiandb
• Generate rules from Governance definitions in Atlas
• New user sync alternative for Atlas (vdc)
• Ranger support for Virtual Data Connector Project (ATLAS)
• Support Atlas v2 glossary in Atlas plugin (for access control to terms etc)
• Support of Atlas v2 glossary API proposal for tag source
• Post-evaluation phase user extensions
• Ranger Source: eclipse
• Add data masking for tag based policies
• Governance Action Framework OMAS
• Sample assets to support Virtual Connector Project
• OMAS Interfaces for Atlas
• Build ATLAS using Docker
33.
References
• Apache Atlas - http://atlas.apache.org/
  • Top level JIRA for this activity: https://issues.apache.org/jira/browse/ATLAS-1689
• Apache Ranger - http://ranger.apache.org/
• GaianDB
  • https://github.com/gaiandb/gaiandb
  • https://developer.ibm.com/open/openprojects/gaian-database/
• The case for open metadata – A. M. Chessell
  • http://www.ibmbigdatahub.com/blog/case-open-metadata