Hadoop Security from the Trenches
Bolke de Bruin
Chief Wizard
This is not going to be a perfect talk
• It will be incomplete (squeezed for time)
• Probably without humor (I am just really bad at telling jokes)
• I have a disclaimer (I work for a bank)
• A lot of text in Orange (ING is Oranje)
Agenda
• Security today in Hadoop
• Kerberos (In depth)
• Policy based access
• Lineage (A bit)
• Encryption
Information security principles
Confidentiality
• Information is not made available or disclosed to unauthorized individuals, entities or processes
Integrity
• Maintaining and assuring the accuracy and completeness of data over its entire lifecycle
Availability
• Data must be available when it is needed
Unfortunately most of the attention in Hadoop goes to confidentiality
Security today in Hadoop
• Authentication (Who am I?): Kerberos, Apache Knox
• Authorization (What can I do?): Apache Ranger, Apache Sentry
• Audit (What did I do?): Apache Ranger, Cloudera Navigator
• Data Protection (Can someone read my data?): SSL, SASL, KMS
• Data Governance (Where did my data come from and where is it going?): Apache Atlas, Cloudera Navigator
Identity Management
Taming Kerberos, the ferocious three-headed guard dog of Hadoop
Typical workflow
Kerberized workflow
• Use ticket
• Hive gets service ticket
• HDFS gets service ticket
Kerberos has great advantages…
• Requires each client to prove its identity on each request
• Does not require a user to enter a password every time a service is requested
• Works across operating systems
• Kerberos assumes that network connections, rather than servers and workstations, are the weak link in network security
• Did you know that Active Directory is just Kerberos+LDAP?
…but its perceived complexity has stopped implementation
• AS, KDC, TGS, SS, TGT, KINIT, KEYTAB, KADMIN. So many abbreviations…
  • But you just need to remember a few: kinit, keytab, kdc
• Synchronization of host clocks required
  • Wait, what? You didn’t do that yet? Your local cloud provider already does this for you.
• Separate user databases if combined with LDAP or PAM
  • Well, there is Active Directory and there is FreeIPA
• Tool Xxx is not kerberized and I really need it
  • Insecure: don’t use it, or add patches yourself. Yeah, open source!
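To make the clock synchronization requirement above concrete, here is a minimal Python sketch of the check a Kerberos setup effectively imposes: requests are only accepted when the client and KDC clocks are close together. The 300-second tolerance is an assumption based on the usual MIT Kerberos `clockskew` default; the function name and the sample timestamps are made up for illustration, and your KDC may be configured differently.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 300 seconds mirrors the usual default `clockskew` in krb5.conf.
MAX_CLOCK_SKEW = timedelta(seconds=300)


def within_clock_skew(client_time, kdc_time, max_skew=MAX_CLOCK_SKEW):
    """Return True if the two clocks differ by no more than the allowed skew."""
    return abs(client_time - kdc_time) <= max_skew


# Hypothetical timestamps for illustration only.
kdc_now = datetime(2016, 2, 1, 12, 0, 0, tzinfo=timezone.utc)
ok_host = kdc_now + timedelta(seconds=42)      # 42 seconds ahead of the KDC
drifted_host = kdc_now - timedelta(minutes=7)  # 7 minutes behind the KDC

print(within_clock_skew(ok_host, kdc_now))       # True
print(within_clock_skew(drifted_host, kdc_now))  # False
```

In practice you would not write this check yourself; running NTP or chrony on every host keeps the clocks within tolerance.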
Ehh FreeIPA?
Looks familiar doesn’t it? Oh yes this is Active Directory!
Integration in an Enterprise environment
• Fully integrated with Operating System and Hadoop
• UserIDs are the same, shared and immediate
• Can use PAM
• YARN, HDFS ACLs start working out of the box as local users just exist
That is the big stuff!
Installing is difficult right?
• Server
  • # yum -y install ipa-server
  • # ipa-server-install
• Client
  • # yum -y install ipa-client
  • # ipa-client-install
Support in Hadoop distributions is slightly lagging
Quite easy actually: gen_credentials.sh just needs to be adjusted:
http://blog.godatadriven.com/samba-configuration.html (for IPA it needs to be adjusted)
https://github.com/HariSekhon/tools/blob/master/ambari_freeipa_kerberos_setup.pl
Written by an ex-Cloudera guy ;-)
Caveats
• Trusted domains deliver users with “username@REALM”, Hadoop and Hive filter on ‘@’
  • See: https://issues.apache.org/jira/browse/HADOOP-12751
  • See: https://issues.apache.org/jira/browse/HIVE-12981
• Workaround: convert @ to _ by means of sssd
  • full_name_format = %1$s_%2$s
  • re_expression = (((?P<Name>[^@]+)_(?P<Domain>.+$))|((?P<Domain>[^]+)(?P<Name>.+$))|((?P<Name>[^@]+)@(?P<Domain>.+$))|(^(?P<Name>[^@]+)$))
• Or just wait for the patches to land
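The effect of the sssd workaround above can be sketched in a few lines of Python: a trusted-domain login of the form username@REALM is rewritten to username_REALM, so that Hadoop and Hive never see the ‘@’. This is only an illustration of the renaming, not sssd code; the function name `to_hadoop_safe`, the example logins, and the simplified single-branch pattern are assumptions.

```python
import re

# Simplified, single-branch version of the idea behind re_expression:
# everything before the first '@' is the name, the rest is the domain.
TRUSTED_NAME = re.compile(r"^(?P<name>[^@]+)@(?P<domain>.+)$")


def to_hadoop_safe(login: str) -> str:
    """Rewrite 'name@DOMAIN' to 'name_DOMAIN'; leave plain names untouched."""
    match = TRUSTED_NAME.match(login)
    if match is None:
        return login
    # Same shape as full_name_format = %1$s_%2$s (name, underscore, domain).
    return f"{match.group('name')}_{match.group('domain')}"


# Hypothetical logins for illustration only.
print(to_hadoop_safe("alice@EXAMPLE.COM"))  # alice_EXAMPLE.COM
print(to_hadoop_safe("bob"))                # bob
```

Note that user names which already contain an underscore make the reverse mapping ambiguous, which is one reason to prefer the upstream patches once they land.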
Data access policies and auditing with Ranger
How are policies applied?
Where is Spark?
Active policies
Caveats
• Ranger (but also Sentry) feels like slapped-on security. Just usable, but barely
• User synchronization can be very slow with many users due to architecture issues
• Unix synchronization and authentication uses /etc/passwd and /etc/group instead of NSS and PAM
  • https://issues.apache.org/jira/browse/RANGER-842
  • https://issues.apache.org/jira/browse/RANGER-827
  • If these patches land, syncing will be much faster for IPA/SSSD enabled systems
• No real Spark roadmap, just spark-sql. This also goes for Sentry
• Doesn’t manage HDFS ACLs and requires Hive user access… defeating end-to-end security
Data Governance
• Why?
  • We need to be able to pinpoint what data resides where, why, and what happened to it.
• Why?
  • Because you might want us to remove your data
  • … and the regulator says so
Encryption
• Data at rest
  • Used if you don’t trust your physical infrastructure. Cloud!
  • Only our highest confidentiality levels require it; we are not at that level, so we don’t use it
• Data in transit
  • Data across untrusted networks. Cloud?
  • Perimeter security solves a lot of these issues; you take a significant performance hit of around 20% if you enable it within your cluster
  • For ETL or data ingestion it becomes more reasonable
  • For us it is enabled for access TO the cluster, NOT WITHIN
• Data democratization
  • Use case: allow some data scientists to see the original data and some only the masked/anonymized data
  • We are tinkering with this
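The data democratization use case above can be illustrated with a small Python sketch: the same record is returned in the clear to one group and masked to everyone else. Everything here is hypothetical (the column names, the `raw_data_access` group, and hashing as the masking method); it is not how Ranger or any other tool implements masking. Note also that a plain hash is pseudonymization rather than true anonymization.

```python
import hashlib

# Hypothetical sensitive columns and group name, for illustration only.
SENSITIVE_COLUMNS = {"name", "iban"}
RAW_ACCESS_GROUP = "raw_data_access"


def mask_value(value) -> str:
    """Replace a value with a short, stable token derived from its SHA-256 hash."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def view_for(record: dict, groups: set) -> dict:
    """Return the original record for privileged groups, a masked copy otherwise."""
    if RAW_ACCESS_GROUP in groups:
        return dict(record)
    return {
        key: mask_value(value) if key in SENSITIVE_COLUMNS else value
        for key, value in record.items()
    }


# Made-up record for illustration only.
record = {"name": "Jan Jansen", "iban": "NL00EXAMPLE0000000000", "amount": 100}

print(view_for(record, {"raw_data_access"}))  # original values
print(view_for(record, {"analysts"}))         # name and iban replaced by tokens
```

Because the token is stable, masked data can still be joined and counted per customer without revealing who the customer is.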
An example architecture
We are hiring! Bolke.de.Bruin@ing.nl
Frank Derks, John Muller, Pooja Rao, Hylke Hendriksen
Giovanni Lanzini, Fabian Jansen, Hanneke van Veldhuizen, Johan Witman
Wendell Kuling, Jonas Ahrendt, Bolke de Bruin, Ivo Everts
Doron Reuter
Zhe Sun

 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 

Nl HUG 2016 Feb Hadoop security from the trenches

  • 1. Hadoop Security from the Trenches Bolke de Bruin Chief Wizard
  • 2. This is not going to be a perfect talk • It will be incomplete (squeezed for time) • Probably without humor (I am just really bad at telling jokes) • I have a disclaimer (I work for a bank) • A lot of text in orange (ING is Oranje)
  • 3. Agenda • Security today in Hadoop • Kerberos (In depth) • Policy based access • Lineage (A bit) • Encryption
  • 4. Information security principles Confidentiality • Information is not made available or disclosed to unauthorized individuals, entities, or processes Integrity • Maintaining and assuring the accuracy and completeness of data over its entire lifecycle Availability • Data must be available when it is needed Unfortunately, most of the attention in Hadoop goes to confidentiality
  • 5. Security today in Hadoop Authentication Who am I? Kerberos Apache Knox Authorization What can I do? Apache Ranger Apache Sentry Audit What did I do? Apache Ranger Cloudera Navigator Data Protection Can someone read my data? SSL SASL KMS Data Governance Where did my data come from and where is it going? Apache Atlas Cloudera Navigator Identity Management
  • 6. Taming Kerberos, the ferocious three-headed guard dog of Hadoop
  • 8. Kerberized workflow Use ticket Hive gets service ticket HDFS gets service ticket
  • 9. Kerberos has great advantages… • Requires that each client, on each request, prove its identity • Does not require a user to enter a password every time a service is accessed • Works across operating systems • Kerberos assumes that network connections, rather than servers and workstations, are the weak link in network security • Did you know that Active Directory is just Kerberos+LDAP?
  • 10. …but its perceived complexity has stopped implementation • AS, KDC, TGS, SS, TGT, KINIT, KEYTAB, KADMIN: so many abbreviations… • But you just need to remember a few: kinit, keytab, kdc • Synchronization of host clocks required • Wait, what? You didn’t do that yet? Your local cloud provider already does this for you. • Separate user databases if combined with LDAP or PAM • Well, there is Active Directory and there is FreeIPA • Tool Xxx is not kerberized and I really need it • Insecure: don’t use it, or add patches yourself. Yeah, open source!
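The clock synchronization requirement on this slide exists because Kerberos rejects requests whose timestamps differ too much from the server's clock. A minimal Python sketch of that check follows. The 300-second tolerance is an assumption (it is the usual default in MIT Kerberos but is configurable per site), and `within_skew` is a hypothetical helper, not part of any Kerberos library.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 300 seconds is the tolerance; sites can configure another value.
MAX_SKEW = timedelta(seconds=300)


def within_skew(client_time: datetime, kdc_time: datetime,
                max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(client_time - kdc_time) <= max_skew


if __name__ == "__main__":
    kdc_now = datetime(2016, 2, 1, 12, 0, 0, tzinfo=timezone.utc)
    # A client two minutes ahead is still inside the tolerance.
    print(within_skew(kdc_now + timedelta(minutes=2), kdc_now))
    # A client seven minutes behind would be rejected.
    print(within_skew(kdc_now - timedelta(minutes=7), kdc_now))
```

In practice the hosts would simply run a time service such as NTP; the sketch only illustrates why drifting clocks break authentication.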
  • 11. Ehh FreeIPA? Looks familiar doesn’t it? Oh yes this is Active Directory!
  • 12. Integration in an Enterprise environment • Fully integrated with Operating System and Hadoop • UserIDs are the same, shared and immediate • Can use PAM • YARN and HDFS ACLs start working out of the box, as local users just exist. That is the big stuff!
  • 13. Installing is difficult, right? • Server • # yum -y install ipa-server • # ipa-server-install • Client • # yum -y install ipa-client • # ipa-client-install
  • 14. Support in Hadoop distributions is slightly lagging. Quite easy actually: gen_credentials.sh just needs to be adjusted: http://blog.godatadriven.com/samba-configuration.html (for IPA it needs to be adjusted) https://github.com/HariSekhon/tools/blob/master/ambari_freeipa_kerberos_setup.pl Written by an ex-Cloudera guy ;-)
  • 15. Caveats • Trusted domains deliver users with “username@REALM”, Hadoop and Hive filter on ‘@’ • See: https://issues.apache.org/jira/browse/HADOOP-12751 • See: https://issues.apache.org/jira/browse/HIVE-12981 • Workaround: convert @ to _ by means of sssd • full_name_format = %1$s_%2$s • re_expression = (((?P<Name>[^@]+)_(?P<Domain>.+$))|((?P<Domain>[^]+)(?P<Name>.+$))|((?P<Name>[^@]+)@(?P<Domain>.+$))|(^(?P<Name>[^@]+)$)) • Or just wait for the patches to land
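The effect of the workaround on this slide can be sketched in Python: a trusted-domain user arriving as `name@REALM` is rewritten to `name_REALM`, so the resulting local name no longer contains the `@` that Hadoop and Hive filter on. This is an illustration only, not sssd code. `to_local_name`, `normalize`, and the sample principals are hypothetical, and the pattern is a simplified stand-in for the `re_expression` above, since Python's `re` module does not allow a group name to be repeated.

```python
import re


def to_local_name(name: str, domain: str) -> str:
    """Mimic full_name_format = %1$s_%2$s: join name and domain with '_'."""
    return f"{name}_{domain}"


# Simplified pattern (assumption): accept "name@DOMAIN" or a bare "name".
PRINCIPAL = re.compile(r"^(?P<name>[^@]+)(?:@(?P<domain>.+))?$")


def normalize(principal: str) -> str:
    """Rewrite 'name@DOMAIN' to 'name_DOMAIN'; leave bare names untouched."""
    match = PRINCIPAL.match(principal)
    if match is None:
        raise ValueError(f"cannot parse principal: {principal!r}")
    name, domain = match.group("name"), match.group("domain")
    return name if domain is None else to_local_name(name, domain)


if __name__ == "__main__":
    for user in ("alice@EXAMPLE.COM", "bob"):
        print(user, "->", normalize(user))
```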
  • 16. Data access policies and auditing with Ranger
  • 17. How are policies applied? Where is Spark?
  • 20. Caveats • Ranger (but also Sentry) feels like slapped-on security. Just usable, but barely • User synchronization can be very slow with many users due to architecture issues • Unix synchronization and authentication use /etc/passwd and /etc/group instead of NSS and PAM • https://issues.apache.org/jira/browse/RANGER-842 • https://issues.apache.org/jira/browse/RANGER-827 • If these patches land, syncing will be much faster for IPA/SSSD enabled systems • No real Spark roadmap, just spark-sql. This also goes for Sentry • Doesn’t manage HDFS ACLs and requires Hive user access… defeating end-to-end security
  • 21. Data Governance • Why? • We need to be able to pinpoint what data resides where, why, what happened with it. • Why? • Cause you might want us to remove your data • … and the regulator says so
  • 22. Encryption • Data at rest • Used if you don’t trust your physical infrastructure. Cloud! • Only our highest confidentiality levels require it; we are not at that level, so we don’t use it • Data in transit • Data across untrusted networks. Cloud? • Perimeter security solves a lot of these issues; you take a significant performance hit of around 20% if you enable it within your cluster • For ETL or data ingestion it becomes more reasonable • For us it is enabled for access TO the cluster, NOT WITHIN • Data democratization • Use case: allow some data scientists to see the original data and others only the masked/anonymized data • We are tinkering with this
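The data democratization use case on this slide (some users see original values, others only masked ones) can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the approach used in the talk: `mask`, `view_record`, the salt, and the sample record are all hypothetical, and a salted truncated hash is only one of several possible masking techniques.

```python
import hashlib


def mask(value: str, salt: str = "demo-salt") -> str:
    """Replace a value with a short, deterministic salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def view_record(record: dict, sensitive: set, can_see_original: bool) -> dict:
    """Return the record as-is for privileged users, masked otherwise."""
    if can_see_original:
        return dict(record)
    return {
        key: mask(str(value)) if key in sensitive else value
        for key, value in record.items()
    }


if __name__ == "__main__":
    row = {"customer": "J. Jansen", "iban": "NL00BANK0123456789", "city": "Amsterdam"}
    print(view_record(row, {"customer", "iban"}, can_see_original=True))
    print(view_record(row, {"customer", "iban"}, can_see_original=False))
```

Because the mask is deterministic, masked columns can still be joined or grouped on, which is often what analysts need; whether that is acceptable depends on the confidentiality level of the data.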
  • 24. We are hiring! Bolke.de.Bruin@ing.nl Frank Derks, John Muller, Pooja Rao, Hylke Hendriksen, Giovanni Lanzini, Fabian Jansen, Hanneke van Veldhuizen, Johan Witman, Wendell Kuling, Jonas Ahrendt, Bolke de Bruin, Ivo Everts, Doron Reuter, Zhe Sun