Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
1.
Unleashing the power of Apache Atlas with Apache Ranger
Virtual Data Connector Project
NIGEL JONES JONESN@UK.IBM.COM
DATAWORKS, MUNICH, APRIL 2017
Apache®, Apache Atlas, Apache Ranger & other Apache project names referenced are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
2.
About Me – Nigel Jones
• https://www.linkedin.com/in/nigelljones/
• jonesn@uk.ibm.com (Anyone still use email?)
• @planetf1 – noisy, f1, electric vehicles, food & drink…. A split of work/life accounts didn’t work for me!
• And of course the Apache Atlas & Ranger mailing lists & JIRA!
• Science fan at school/uni. It was cloud chambers back then… now just the cloud :-)
• IBM Hursley, UK since 1990
• Last 3 years focus on Data Lake, Information Governance, Open Metadata
3.
The Problem…..
WHY ARE WE HERE…..
4.
Data?
• What data do I have?
• What does it mean?
• Where is it?
• Who has access to it?
• Who owns it?
• What quality is it?
• How does it relate to other data?
• How do I control, audit & understand access?
5.
Regulatory needs
• Adhere to regulations like BCBS-239 and GDPR
• Need to know meaning, value of the data
• Demonstrate processes in place to govern access
• Audit
• Significant fines if rules breached
• Whilst ensuring easy, ready access to appropriate data for data professionals to support an agile business
6.
So what do we need to address this?
7.
Metadata..
• Metadata enables data to be used outside of the application that created it.
  • Analytics and decision making
  • New business applications
  • Reporting and compliance
• Metadata describes the format and content of data allowing people to judge which dataset to use for a new project
  • Structure
  • Meaning
  • Origin
  • Valid values and quality
  • Usage and ownership
  • Regulations and classifications that apply
8.
Which can support…
• An enterprise data catalogue that lists all data including where it is, what it is, who owns it, its meaning, quality, where it came from, and can fully describe its business context & how the data should be governed….
• Subject Matter experts searching, collaborating, feeding back about their data needs and use
• Automated governance actions to protect and manage including auditing, monitoring, quality control, rights management
9.
But easily…
• Open frameworks & APIs
• Automatic collection & discovery of metadata in a dynamic heterogeneous environment
• Using predefined standards for glossaries, schemas, rules, regulations to reduce cost
• Cheap to integrate new tools
• No proprietary lock-in & assumptions that all tools are from one suite or vendor
• Avoiding silos
• Distributed and Open
10.
The vision
Open and Unified Metadata
11.
Virtualization Data Connector project
12.
Data virtualization project
• Collaboration – IBM, several banks & open community
• A Data Lake environment
• Not just Hadoop, but other sources too
• Business Terms, Classifications, Metadata rich
• Offer virtualized views. Expose relational data with business terms
• Manage Access to resources – permit, deny, log, filter/mask…. THROUGH METADATA
• Open, pluggable
• Working through use cases, design, initial MVP (this year)
• Critique, feedback is welcomed. We’re looking for guidance and support from the Atlas & Ranger communities as well as to contribute our ideas
• Proposed changes all go through mailing list and JIRA for feedback
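To make "manage access through metadata" concrete, here is a minimal sketch of a permit/deny/mask decision driven only by classification tags and user roles. It is not Ranger's policy engine or API: the `RULES` table, the role names, the default-deny behaviour for tagged data and the "most restrictive tag wins" rule are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical rule table: classification tag -> {role: action}.
RULES: Dict[str, Dict[str, str]] = {
    "SPI": {"hr_analyst": "permit", "data_scientist": "mask"},
}

# Ordering used to compare actions, from most to least restrictive.
ORDER = {"deny": 0, "mask": 1, "permit": 2}


def decide(column_tags: Set[str], user_roles: Set[str],
           rules: Dict[str, Dict[str, str]]) -> str:
    """Return "permit", "mask" or "deny" for one column.

    Assumptions of this sketch: untagged columns are open, a tagged column
    is denied unless some role of the user has a rule for that tag, and the
    most restrictive result across all tags wins.
    """
    decision = "permit"
    for tag in column_tags:
        role_actions = rules.get(tag, {})
        granted = [role_actions[r] for r in user_roles if r in role_actions]
        best_for_tag = max(granted, key=ORDER.__getitem__, default="deny")
        decision = min(decision, best_for_tag, key=ORDER.__getitem__)
    return decision


def apply_decision(value: str, decision: str) -> str:
    """Return the value, a masked placeholder, or raise on deny."""
    if decision == "permit":
        return value
    if decision == "mask":
        return "****"
    raise PermissionError("access denied")


print(decide({"SPI"}, {"data_scientist"}, RULES))
```

The point of the sketch is that the rule never names a table or column: it names a classification, so newly classified data is covered without writing a new rule.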
13.
Apache Atlas
• “Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.” …. http://www.apache.org
• Open Community -- Apache Incubator since May 2015
• Type agnostic metadata store
• REST API & UI
• Supports many Hadoop components including HBase, Hive, Sqoop, Storm & others
14.
Apache Ranger
• Centralized security administration to manage all security related tasks in a central UI or using REST APIs.
• Fine grained authorization to do a specific action and/or operation with Hadoop component/tool and managed through a central administration tool
• Standardize authorization method across all Hadoop components.
• Enhanced support for different authorization methods - Role based access control, attribute based access control etc.
• Centralize auditing of user access and administrative actions (security related) within all the components of Hadoop.
• … from http://ranger.apache.org
15.
Project Interactions
[Diagram; labels as extracted]
Search/Report
• Search for list of assets by metadata
• Search for data
• Reporting tool obtains data to draw report
GaianDB
Underlying data: sql, hive, HDFS, Oracle, Netezza etc
Manages logical views
Deploys rules, pushes classifications, source for user roles (not users)
+ ranger plugin to permit/deny, mask etc
Pulls rules, classifications
RDBMS, Hadoop, Apache Atlas, Apache Ranger, Apache Solr
16.
Why Atlas and Ranger?
• Open Source essential to forming an active ecosystem
• Vision, active community & evolving – ability to contribute & work with others to provide the best solution
• Already have good core capabilities
  • Atlas type system is very flexible
  • Ranger offers a range of policy types and provides a pluggable framework
• Already cross project integration
  • Use of tag based policies in Ranger sourced from Atlas
• Can be used independently of full Hadoop stack
17.
Refined virtual connector scope
[Architecture diagram; labels as extracted]
Components: GaianDB, Ranger Plugin, Virtualizer, Atlas, OMAS, OMRS, Titan (GraphDB, Metadata Repository), Ranger Server, Ranger Config (eg Policies, Audit log location), tag-sync, rule-sync, LDAP, Audit Log, Mapper, IGC, Navigator, Datameer
Data Lake Repositories: Oracle, Netezza, Hive Tables
Data Lake Virtualization
Flows: Poll Policies; Pre; Post; Create View Metadata; Extract physical metadata; Manage Logical Tables; Retrieve metadata; Push metadata; Push and query metadata; Search for data/reporting
18.
GaianDB & Virtualizer
• GaianDB
  • Open Source
  • Federated, self learning, dynamic configuration
  • Based on Apache Derby
  • Already had “policy” support – we’re plugging in Ranger for this project
• Virtualizer
  • Listens to event notifications on assets etc
  • Creates view definitions in GaianDB, and uses new Atlas APIs to store metadata. Could use different virtual engine..
  • Designed to be open to other virtualization technologies.
[Diagram: LT1, LT2; DS1, DS2, DS3; Policy Plugin (ranger); Virtualizer; Atlas]
GaianDB supports federation – not used for MVP
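As a rough illustration of the Virtualizer idea (react to an asset notification, then create a view that exposes physical columns under business-term names), here is a minimal sketch. The event shape and the function names are hypothetical, and the output is generic SQL, not GaianDB's actual logical-table mechanism or any Atlas API.

```python
from typing import Any, Dict


def build_view_sql(view_name: str, source_table: str,
                   column_terms: Dict[str, str]) -> str:
    """Return a generic CREATE VIEW statement exposing each physical column
    under its business-term name. Identifiers are not sanitised here."""
    select_list = ", ".join(
        f'{column} AS "{term}"' for column, term in column_terms.items()
    )
    return f"CREATE VIEW {view_name} AS SELECT {select_list} FROM {source_table}"


def on_asset_event(event: Dict[str, Any]) -> str:
    """Handle a hypothetical 'asset changed' notification by building a view."""
    return build_view_sql(
        view_name=str(event["view"]),
        source_table=str(event["table"]),
        column_terms=dict(event["columns"]),
    )


# Hypothetical notification payload: physical column -> business term.
example_event = {
    "view": "EMPLOYEE_PAY",
    "table": "HR.EMP",
    "columns": {"EMP_ID": "Employee Id", "EMP_SALARY": "Salary"},
}
print(on_asset_event(example_event))
```

Keeping the SQL generation in its own function reflects the slide's point that the virtualization engine should be replaceable: only `build_view_sql` would change for a different engine.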
19.
Atlas – glossary enhancements
• Get Atlas closer to parity with commercial offerings
• Business Terms – categories, category hierarchies
  • Has-a, is-a, type-of, synonym, antonym, arbitrary relationships
  • Assets mapped to Business Terms
• Classifications
  • Hierarchy
  • Navigable mappings to retain ability to flatten tags to ranger
  • Instead of hive column EMP_SALARY -> SPI, now can be EMP_SALARY -> SALARY -> SPI…
• Used to drive governance
• ATLAS-1410
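The EMP_SALARY -> SALARY -> SPI chain can be flattened back into a plain tag list with a simple graph walk. The sketch below assumes a hypothetical in-memory edge map and a hypothetical set of classification names; it does not use the API proposed in ATLAS-1410.

```python
from typing import Dict, List, Set

# Hypothetical glossary edges: asset -> business term -> classification.
GLOSSARY_EDGES: Dict[str, List[str]] = {
    "EMP_SALARY": ["SALARY"],
    "SALARY": ["SPI"],
}

# Hypothetical set of names that are classifications rather than terms.
CLASSIFICATIONS: Set[str] = {"SPI"}


def flatten_tags(asset: str, edges: Dict[str, List[str]],
                 classifications: Set[str]) -> Set[str]:
    """Return every classification reachable from `asset` via any path."""
    seen: Set[str] = set()
    stack = list(edges.get(asset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles, e.g. synonym loops
        seen.add(current)
        stack.extend(edges.get(current, []))
    return seen & classifications


print(flatten_tags("EMP_SALARY", GLOSSARY_EDGES, CLASSIFICATIONS))
```

With this shape, a tag-based enforcement engine still sees only the flat pair EMP_SALARY -> SPI, while the glossary keeps the intermediate business term.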
20.
Atlas – other enhancements
• Consumer Centric APIs
  • Open Metadata Access Services (OMAS)
  • REST & more Kafka notifications
  • Asset, Catalog, Connector, Glossary, Governance Action, Governance Definitions, Information View, Roles and Access
• Repository level APIs
  • Open Metadata Repository Services (OMRS)
  • REST & more Kafka notifications
  • Pluggability through an Open Connector Framework to other metadata repositories – distributed and Open
• Standard data model/core
  • Enhancement to core model – versioning, external linkage etc
  • More standard types ie for all relational databases to ease sharing
21.
Ranger areas being looked at
• Building a plugin for GaianDB
  • Access control, simple masking. More later
• User synchronization (large # users, role of Atlas)
• Changes to tag sync process for New glossary proposal
• As more metadata goes into Atlas, it becomes source for generation of some kinds of policies. Where is the master?
• Generating ranger rules from governance definitions
• How about control of access to Atlas itself?
• Aside: Interfaces used by enforcement engines (such as to get classification data) need to be efficient – these should work for projects like Apache Sentry as well as Atlas
22.
Beyond the MVP
• Open Discovery Framework
• Consider other security enforcement engines – such as Apache Sentry & driving more capability around rules & governance actions from Atlas metadata
• Work on standard models to support different domains
• Lineage
  • From high level design lineage through to operational detail. Logs vs graph….
• API metadata
• Infrastructure – JanusGraph…
  • Abstraction added by IBM in last few months for titan1
23.
The vision
• An enterprise data catalog that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
  • Spanning systems both on premise and cloud providers
  • Hosted locally to your data platforms but integrated to provide the enterprise view
• New data tools (from any vendor) connect to your data catalog out of the box
  • No vendor lock-in; nor expensive population of yet another proprietary siloed metadata repository
• Metadata is added automatically to the catalog as new data is created
  • Extensible discovery processes characterise and classify the data
  • Interested parties and processes are notified
• Subject matter experts collaborating around the data
  • Locate the data they need, quickly and efficiently
  • Feed back their knowledge about the data and the uses they have made of it to help others and support economic evaluation of data
• Automated governance processes protect and manage your data
  • Metadata-driven access control
24.
Summary
• Atlas can help us have an industry wide common metadata platform around which a vibrant ecosystem can evolve
• Not only in Hadoop but more broadly
• Metadata driven governance can be scalable & enable us to manage our data better, and be compliant with regulations
• The ideas presented here resonate with many people we’ve spoken to
• Get involved! I’d love to hear the feedback on this approach!
• Comment on the JIRAS, ask questions, contribute, disagree… ;-)
• Look at JIRA Tag “VirtualDataConnector” or start at ATLAS-1689
• Atlas wiki
• “Innovation happens best not in isolation but in collaboration” (keynote)
• THANKS!
25.
Questions?
After this talk: jonesn@uk.ibm.com
17:50 Room 4 – Security & Governance BOF
26.
Backup charts
27.
ATLAS Virtualizer Architecture
[Architecture diagram; labels as extracted]
Components: Atlas graphDB, “gaiandb”, IGC, IGC REST API, Oracle Data, HDFS Data, Netezza Data, P-JDBC (x3), GAF OMAS, Virtual Asset OMAS, Catalog OMAS, OMRS (x2), GAF Pre, GAF Post, Connector Framework *, Search, Search/Explore UI
Legend: Atlas boundaries; Developed in POC; May not be in POC initially; * May be hardcoded at first
28.
Metadata areas and types
[Diagram with numbered areas 1 to 7; labels as extracted]
Roles: Information Auditor, Integration Developer, Business Analyst, Data Scientist, Information Worker, Information Owner, Information Governor, Information Steward, Data Quality Analyst, Information Curator
Metadata areas:
• Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics)
• Governance Actions and Processes
• Connector Directories
• Business Objects and Relationships, Taxonomies and Ontologies
• Business Attributes
• Organization
• Teaming Metadata (people profiles, communities, projects, notebooks, …)
• Models and Schemas
• Physical Asset Descriptions (Data stores, APIs, models and components)
• Asset Collections (Sets, Typed Sets, Type Organized Sets)
• Information Views
• Rights Management
• Reference Data
• Feedback Metadata (tags, comments, ratings, …)
• Classification Schemes
• Strategy
• Subject Area Definition
• Campaigns and Projects
• Infrastructure and systems
• Rollout
• Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …)
• Information Process Instrumentation (design lineage)
Other labels: Augmentation, Mapping, Implementation, Access, Classification, Instrument, Association
29.
User & Group/Role synchronization
• Corporate LDAP has a huge number of users/groups
• Ranger currently needs to sync all
• In future perhaps we establish group/role membership during authentication
• Capability for alternative source could be merged into base UserSync
[Diagram: UserSync2 sits between LDAP, Apache Atlas and Apache Ranger]
• LDAP lookup -> group:member. LDAP holds role-membership (LDAP groups) – could also be Active Directory
• Governance Action OMAS - getRoles. ATLAS manages definitive list of roles <that are used for atlas managed sources>
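One way to read the UserSync2 idea is: take the short list of roles that Atlas manages, and look up membership only for those groups instead of syncing the whole corporate directory. The sketch below uses plain dictionaries in place of LDAP and the Governance Action OMAS; the data, the function name and the lookup strategy are assumptions, not the project's design.

```python
from typing import Dict, List, Set

# Stand-in for an LDAP directory: group name -> member user ids.
LDAP_GROUPS: Dict[str, List[str]] = {
    "hr_analyst": ["alice", "bob"],
    "finance": ["bob"],
    "canteen_users": ["alice", "bob", "carol"],
}

# Stand-in for the role list that a getRoles call would return.
ATLAS_MANAGED_ROLES: Set[str] = {"hr_analyst", "finance"}


def roles_for_user(user: str, ldap_groups: Dict[str, List[str]],
                   managed_roles: Set[str]) -> Set[str]:
    """Return only the Atlas-managed roles the user belongs to.

    Groups that Atlas does not manage are never inspected, which is the
    saving over a full user/group sync.
    """
    return {
        role for role in managed_roles
        if user in ldap_groups.get(role, [])
    }


print(sorted(roles_for_user("bob", LDAP_GROUPS, ATLAS_MANAGED_ROLES)))
```

Resolving roles per user like this would also fit the slide's suggestion of establishing membership at authentication time.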
30.
Atlas Glossary v2: TagSync to Ranger
• Atlas Glossary v2 Proposed in ATLAS-1410 (David Radley)
• Sync Builds on existing tag sync approach
• New API in Atlas will flatten classification structure
• No changes to ranger – but exposing richer classification could be area of future work
ATLAS glossary manages a sophisticated enterprise glossary structure
[Diagram: TagSync2 reads from Apache Atlas through the Governance Action OMAS and writes to Apache Ranger. Atlas side: emp_renum (Hive Column), Salary (Business Term), Confidential (Business Term). Ranger side: emp_renum (Hive Column), Confidential (Tag).]
31.
Policy (Rule) synchronization
• Generate policies in Ranger based off entities in Atlas
• Currently designing how this works
• Scoped by policy service so existing Ranger UI approach still works
[Diagram: RuleSync reads from Apache Atlas through the Governance Action OMAS - getRules and writes a Ranger Rule to Apache Ranger. Labels: Role, Classifications, Asset, Action.]
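Since the slide says the RuleSync design was still in progress, the following is only a guess at its general shape: turn governance rules (role, classification, action) into policy records scoped to one policy service. The record layout is a simplified, hypothetical structure and is not Ranger's policy JSON schema; all names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GovernanceRule:
    """Hypothetical rule as it might be read from governance metadata."""
    role: str
    classification: str
    action: str  # "permit", "deny" or "mask" in this sketch


def to_policy(rule: GovernanceRule, service: str) -> Dict[str, object]:
    """Map one rule to a simplified policy record (not Ranger's schema)."""
    return {
        "service": service,
        "name": f"{rule.classification}-{rule.role}-{rule.action}",
        "tag": rule.classification,
        "roles": [rule.role],
        "action": rule.action,
    }


def sync_rules(rules: List[GovernanceRule],
               service: str) -> List[Dict[str, object]]:
    """Generate one policy per rule, all scoped to a single policy service."""
    return [to_policy(rule, service) for rule in rules]


rules = [
    GovernanceRule(role="hr_analyst", classification="SPI", action="permit"),
    GovernanceRule(role="data_scientist", classification="SPI", action="mask"),
]
for policy in sync_rules(rules, service="generated_by_rulesync"):
    print(policy["name"], policy["action"])
```

Writing all generated policies under one service name mirrors the slide's "scoped by policy service" point: hand-written policies in other services stay untouched and remain editable in the Ranger UI.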
32.
Virtual Data Connector JIRAS 20170402
• RANGER-1488
• RANGER-1487
• RANGER-1486
• RANGER-1485
• RANGER-1464
• RANGER-1454
• RANGER-1234
• RANGER-
• Create Ranger plugin for gaiandb
• generate rules from Governance definitions in Atlas
• New usersync alternative for Atlas (vdc)
• Ranger support for Virtual Data Connector Project (ATLAS)
• Support Atlas v2 glossary in Atlas plugin (for access control to terms etc)
• Support of Atlas v2 glossary API proposal for tag source
• Post-evaluation phase user extensions
• Ranger Source: eclipse
• Add data masking for tag based policies
• Governance Action Framework OMAS
• Sample assets to support Virtual Connector Project
• OMAS Interfaces for Atlas
• Build ATLAS using Docker
33.
References
• Apache Atlas - http://atlas.apache.org/
• Top level JIRA for this activity https://issues.apache.org/jira/browse/ATLAS-1689
• Apache Ranger - http://ranger.apache.org/
• GaianDB
  • https://github.com/gaiandb/gaiandb
  • https://developer.ibm.com/open/openprojects/gaian-database/
• The case for open metadata – A.M. Chessell
  • http://www.ibmbigdatahub.com/blog/case-open-metadata