Sentry: Open Source Authorization for
Hive & Impala
Alexander Alten-Lorenz | Senior Field Engineer, Cloudera	

Wednesday, 7th November 2013

Defining Security Functions
Perimeter
 • Guarding access to the cluster itself
 • Technical Concepts: Authentication, Network isolation
Data
 • Protecting data in the cluster from unauthorized visibility
 • Technical Concepts: Encryption, Data masking
Access
 • Defining what users and applications can do with data
 • Technical Concepts: Permissions, Authorization
Visibility
 • Reporting on where data came from and how it's being used
 • Technical Concepts: Auditing, Lineage

Enabling Enterprise Security
Perimeter
 • Guarding access to the cluster itself
 • Technical Concepts: Authentication, Network isolation
 • Kerberos | Oozie | Knox
Data
 • Protecting data in the cluster from unauthorized visibility
 • Technical Concepts: Encryption, Data masking
 • Certified Partners
Access
 • Defining what users and applications can do with data
 • Technical Concepts: Permissions, Authorization
 • Sentry (Available 7/23)
Visibility
 • Reporting on where data came from and how it's being used
 • Technical Concepts: Auditing, Lineage
 • Cloudera Navigator

Hive Overview
SQL Access to Hadoop
 • MapReduce: great massively scalable batch processing framework; required development for each new job
 • Hive opened up Hadoop for more users with standard SQL
Key Challenges
 • Batch MapReduce too slow for interactive BI/analytics
 • No concurrency, no security
Options Today
 • Impala designed for low-latency queries
 • HiveServer2 delivers concurrency, authentication

Our Open Source Activity
CDH 4.1 (HiveServer2)
 • Concurrency and Kerberos authentication for Hive
 • JDBC and Beeline clients
CDH 4.2
 • HDFS impersonation authorization as stop-gap
 • Pluggable authentication API
 • JDBC LDAP username/password
ODBC
 • Supports Kerberos authentication and LDAP
 • Extended partner certification

Current State of Authorization
Two Sub-Optimal Choices for SQL on Hadoop
Insecure Advisory Authorization
 • Users can grant themselves permissions
 • Intended to prevent accidental deletion of data
 • Problem: Doesn't guard against malicious users
HDFS Impersonation
 • Data is protected at the file level by HDFS permissions
 • Problem: File-level not granular enough
 • Problem: Not role-based

Authorization Requirements
Secure Authorization
 • Ability to control access to data and/or privileges on data for authenticated users
Fine-Grained Authorization
 • Ability to give users access to a subset of data (e.g. column) in a database
Role-Based Authorization
 • Ability to create/apply templatized privileges based on functional roles
Multi-Tenant Administration
 • Ability for central admin group to empower lower-level admins to manage security for each database/schema

The Next Step: Introducing Sentry
Authorization module for Hive & Impala
Unlocks Key RBAC Requirements
 • Secure, fine-grained, role-based authorization
 • Multi-tenant administration
Open Source
 • Intent to donate to ASF
Available and Fully Supported
 • HiveServer2 & Impala 1.1 initially

Key Benefits of Sentry
 • Store Sensitive Data in Hadoop
 • Extend Hadoop to More Users
 • Enable New Use Cases
 • Enable Multi-User Applications
 • Comply with Regulations

Key Capabilities of Sentry
Fine-Grained Authorization
 • Specify security for SERVERS, DATABASES, TABLES & VIEWS
Role-Based Authorization
 • SELECT privilege on views & tables
 • INSERT privilege on tables
 • TRANSFORM privilege on servers
 • ALL privilege on the server, databases, tables & views
 • ALL privilege is needed to create/modify schema
Multi-Tenant Administration
 • Separate policies for each database/schema
 • Can be maintained by separate admins

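The scoping of privileges to servers, databases, tables and views can be illustrated with a small matching routine. The following is a minimal sketch under stated assumptions, not Sentry's actual implementation: the helper names `split_privilege` and `implies` are hypothetical, the privilege strings borrow the `server=...->db=...->table=...->action=...` notation from the example policy later in this deck, and the rules (a shorter grant covers everything beneath it, `*` matches any name, a grant without an action counts as ALL) are assumptions made for the illustration.

```python
def split_privilege(text):
    """Split a privilege string into its resource path and its action.

    A privilege without an explicit action is treated as 'all' (assumption).
    """
    resource, action = [], "all"
    for segment in text.strip().split("->"):
        key, _, value = segment.partition("=")
        key, value = key.strip().lower(), value.strip()
        if key == "action":
            action = value.lower()
        else:
            resource.append((key, value))
    return resource, action


def implies(granted, requested):
    """Return True if the granted privilege covers the requested one."""
    g_res, g_act = split_privilege(granted)
    r_res, r_act = split_privilege(requested)
    if len(g_res) > len(r_res):
        return False  # the grant is narrower than the request
    for (g_key, g_val), (r_key, r_val) in zip(g_res, r_res):
        if g_key != r_key:
            return False
        if g_val != "*" and g_val != r_val:
            return False
    return g_act in ("all", "*") or g_act == r_act


# A table-level SELECT grant allows reading any table in the database ...
read_only = "server=server1->db=jranalyst1->table=*->action=select"
print(implies(read_only, "server=server1->db=jranalyst1->table=t1->action=select"))
# ... but not a database-level request such as creating a table.
print(implies(read_only, "server=server1->db=jranalyst1"))
```

In this sketch a schema change is modelled as a request on the database itself, so only a database-level or server-level grant satisfies it, which loosely mirrors the slide's note that ALL is needed to create or modify schema.
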
Apache Ecosystem and Sentry
 • Shared Hive Metastore (with HCatalog)
 • Extensibility plug-in for HiveServer2
 • Inline support in Impala 1.1
 • Potential extension to Pig, MapReduce, REST
[Diagram: Hive Metastore, HCatalog and Sentry; further integrations marked "Possible future development"]

Sentry Architecture
[Diagram, labels as extracted]
 • Impala, HiveServer2
 • Binding Layer: Impala, Hive, Future
 • Authorization Provider: Policy Engine, Policy Provider
 • Policy storage: File (Local FS/HDFS), Database
 • Annotations: Interface; Evaluation, Validation; Parsing; Interface

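The layering in the architecture slide (binding layer, policy engine, policy provider) can be sketched in a few lines. This is only an illustration under assumptions: the class names `FilePolicyProvider`, `PolicyEngine` and `HiveBinding` are hypothetical and do not correspond to Sentry's real classes, the provider holds its policy in memory instead of reading a file, and the engine uses a deliberately simple rule in which a grant covers a request when the request equals the grant or extends it.

```python
class FilePolicyProvider:
    """Policy provider: supplies the privileges granted to a set of groups.

    Hypothetical stand-in that keeps the parsed policy in dictionaries.
    """

    def __init__(self, group_roles, role_privileges):
        self.group_roles = group_roles
        self.role_privileges = role_privileges

    def privileges_for(self, groups):
        privileges = set()
        for group in groups:
            for role in self.group_roles.get(group, []):
                privileges.update(self.role_privileges.get(role, []))
        return privileges


class PolicyEngine:
    """Policy engine: evaluates a request against the provider's privileges."""

    def __init__(self, provider):
        self.provider = provider

    def is_allowed(self, groups, requested):
        for granted in self.provider.privileges_for(groups):
            # Simplified rule: exact match, or the grant is a parent scope.
            if requested == granted or requested.startswith(granted + "->"):
                return True
        return False


class HiveBinding:
    """Binding layer: translates an engine-specific request into the common form."""

    def __init__(self, engine, server):
        self.engine = engine
        self.server = server

    def authorize(self, groups, database, table, action):
        requested = "server={}->db={}->table={}->action={}".format(
            self.server, database, table, action)
        return self.engine.is_allowed(groups, requested)


provider = FilePolicyProvider(
    {"analyst": ["analyst_role"]},
    {"analyst_role": ["server=server1->db=analyst1"]},
)
binding = HiveBinding(PolicyEngine(provider), "server1")
print(binding.authorize(["analyst"], "analyst1", "sales", "select"))
print(binding.authorize(["analyst"], "customers", "orders", "select"))
```

Keeping the binding separate from the engine is what would let a second query engine reuse the same policy evaluation, which is the point the diagram makes with its Impala and Hive bindings.
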
Query Execution Flow
SQL
 • Parse: Validate SQL grammar
 • Build: Construct statement tree
 • Check: Validate statement objects
   • First check: Authorization (Sentry)
   • Forward to execution planner
 • Plan
MR / Query

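The stages named on this slide (parse, build, check, plan) can be sketched as a small pipeline in which authorization is the first check and nothing reaches the planner without passing it. This is a toy illustration under assumptions: it only understands `SELECT ... FROM db.table`, and the function names, the `is_authorized` callback and the `known_tables` set are hypothetical and not part of Hive, Impala or Sentry.

```python
import re

# Toy grammar: SELECT <anything> FROM <db>.<table>
SELECT_PATTERN = re.compile(
    r"^\s*select\s+.+?\s+from\s+(\w+)\.(\w+)\s*;?\s*$", re.IGNORECASE)


def parse(sql):
    """Parse: validate the SQL grammar."""
    match = SELECT_PATTERN.match(sql)
    if match is None:
        raise ValueError("unsupported or malformed statement")
    return match


def build(match):
    """Build: construct a (very small) statement tree."""
    return {"action": "select", "db": match.group(1), "table": match.group(2)}


def check(statement, user, is_authorized, known_tables):
    """Check: validate statement objects, authorization first."""
    db, table = statement["db"], statement["table"]
    if not is_authorized(user, db, table, statement["action"]):
        raise PermissionError("{} may not {} {}.{}".format(
            user, statement["action"], db, table))
    if (db, table) not in known_tables:
        raise LookupError("unknown table {}.{}".format(db, table))
    return statement


def plan(statement):
    """Plan: hand the validated statement to the execution planner."""
    return "scan {db}.{table}".format(**statement)


def run(sql, user, is_authorized, known_tables):
    return plan(check(build(parse(sql)), user, is_authorized, known_tables))


def only_alice_on_analyst1(user, db, table, action):
    return user == "alice" and db == "analyst1" and action == "select"


tables = {("analyst1", "sales")}
print(run("SELECT * FROM analyst1.sales", "alice", only_alice_on_analyst1, tables))
```

Because the authorization check runs before the existence check in this sketch, an unauthorized user gets the same refusal whether or not the table exists.
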
Example Security Policy
[databases]
# Defines the location of the per-DB policy file for the
# 'customers' DB (schema)
customers = hdfs://ha-nn-uri/etc/access/customers.ini

[groups]
# Assigns Hadoop groups to their respective set of roles
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
customers_admin = customers_admin_role
admin = admin_role

[roles]
# Roles that can import or export data to the URIs defined,
# i.e. a landing zone. Since the server runs as the user "hive",
# files in this directory must either have the "hive" group set
# with read/write or be set world read/write.
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, \
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

# Role controls everything for the 'customers' DB on server1.
# Privileges for 'customers' can be defined in the global policy
# file even though 'customers' has its own policy file.
# Note that the privileges from both the global policy file and
# the per-db policy file are merged. There is no overriding.
customers_admin_role = server=server1->db=customers

# Role controls everything on server1.
admin_role = server=server1

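How a group resolves to privileges through its roles can be shown with a short script. This is a sketch under assumptions rather than Sentry's own parser: it reads a shortened, single-line version of the policy with Python's standard `configparser`, the function names `split_list` and `privileges_for_group` are hypothetical, and combining the privileges of several roles as a plain set union is an assumption of the illustration.

```python
import configparser

# Shortened policy in the style of the example above (values kept on one line).
POLICY = """
[groups]
manager = analyst_role, junior_analyst_role
analyst = analyst_role

[roles]
analyst_role = server=server1->db=analyst1, server=server1->db=jranalyst1->table=*->action=select
junior_analyst_role = server=server1->db=jranalyst1
"""


def split_list(value):
    """Split a comma-separated value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def privileges_for_group(policy_text, group):
    """Resolve group -> roles -> privileges; an unknown group gets nothing."""
    # Only '=' separates key and value, so URIs containing ':' stay intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(policy_text)
    privileges = set()
    for role in split_list(parser.get("groups", group, fallback="")):
        privileges.update(split_list(parser.get("roles", role, fallback="")))
    return privileges


for privilege in sorted(privileges_for_group(POLICY, "manager")):
    print(privilege)
```

Since the result is a union, a group that holds several roles only ever gains privileges from each additional role; nothing in one role can revoke what another grants.
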
Live Demo & Giveaways
 • Closes gap between HDFS and Metastore
 • Easy to implement
 • RFC 2307 compliant (Kerberos)
 • Enable Multi-User Applications in one Hive warehouse
 • Enables Multi-Tenancy per Row and Column

About
dev@sentry.incubator.apache.org
alexander@cloudera.com
@mapredit
mapredit.blogspot.com
Web: http://wiki.apache.org/incubator/SentryProposal
Sentry - An Introduction

  • 1. Sentry: Open Source Authorization for Hive & Impala Alexander Alten-Lorenz | Senior Field Engineer, Cloudera Wednesday, 7th November 2013
  • 2. Defining Security Functions. Perimeter: guarding access to the cluster itself (technical concepts: authentication, network isolation). Data: protecting data in the cluster from unauthorized visibility (technical concepts: encryption, data masking). Access: defining what users and applications can do with data (technical concepts: permissions, authorization). Visibility: reporting on where data came from and how it's being used (technical concepts: auditing, lineage).
  • 3. Enabling Enterprise Security. Perimeter: guarding access to the cluster itself (technical concepts: authentication, network isolation); Kerberos | Oozie | Knox. Data: protecting data in the cluster from unauthorized visibility (technical concepts: encryption, data masking); Certified Partners. Access: defining what users and applications can do with data (technical concepts: permissions, authorization); Sentry, available 7/23. Visibility: reporting on where data came from and how it's being used (technical concepts: auditing, lineage); Cloudera Navigator.
  • 4. Hive Overview. SQL Access to Hadoop: MapReduce is a great, massively scalable batch processing framework, but required development for each new job; Hive opened up Hadoop for more users with standard SQL. Key Challenges: batch MapReduce too slow for interactive BI/analytics; no concurrency, no security. Options Today: Impala designed for low-latency queries; HiveServer2 delivers concurrency, authentication.
  • 5. Our Open Source Activity. CDH 4.1 (HiveServer2): concurrency and Kerberos authentication for Hive; JDBC and Beeline clients. CDH 4.2: HDFS impersonation authorization as stop-gap; pluggable authentication API; JDBC LDAP username/password. ODBC: supports Kerberos authentication and LDAP; extended partner certification.
  • 6. Current State of Authorization: Two Sub-Optimal Choices for SQL on Hadoop. Insecure Advisory Authorization: users can grant themselves permissions; intended to prevent accidental deletion of data. Problem: doesn't guard against malicious users. HDFS Impersonation: data is protected at the file level by HDFS permissions. Problem: file-level is not granular enough. Problem: not role-based.
  • 7. Authorization Requirements. Secure Authorization: ability to control access to data and/or privileges on data for authenticated users. Fine-Grained Authorization: ability to give users access to a subset of data (e.g. column) in a database. Role-Based Authorization: ability to create/apply templatized privileges based on functional roles. Multi-Tenant Administration: ability for central admin group to empower lower-level admins to manage security for each database/schema.
  • 8. The Next Step: Introducing Sentry. Authorization module for Hive & Impala. Unlocks Key RBAC Requirements: secure, fine-grained, role-based authorization; multi-tenant administration. Open Source: intent to donate to ASF. Available and Fully Supported: HiveServer2 & Impala 1.1 initially.
  • 9. Key Benefits of Sentry. Store sensitive data in Hadoop; extend Hadoop to more users; enable new use cases; enable multi-user applications; comply with regulations.
  • 10. Key Capabilities of Sentry. Fine-Grained Authorization: specify security for SERVERS, DATABASES, TABLES & VIEWS. Role-Based Authorization: SELECT privilege on views & tables; INSERT privilege on tables; TRANSFORM privilege on servers; ALL privilege on the server, databases, tables & views; ALL privilege is needed to create/modify schema. Multi-Tenant Administration: separate policies for each database/schema; can be maintained by separate admins.
  • 11. Apache Ecosystem and Sentry. Shared Hive Metastore (with HCatalog); extensibility plug-in for HiveServer2; inline support in Impala 1.1; potential extension to Pig, MapReduce, REST. (Diagram labels: Hive Metastore, HCatalog, Sentry, possible future development.)
  • 12. Sentry Architecture. (Diagram labels) Binding Layer: Impala, HiveServer2 (bindings for Impala, Hive, future). Authorization Provider: Policy Engine (interface; evaluation, validation) and Policy Provider (interface; parsing). Policy storage: File (local FS/HDFS), Database.
  • 13. Query Execution Flow. SQL query enters. Parse: validate SQL grammar. Build: construct statement tree. Check: validate statement objects; the first check is authorization (Sentry). Plan: forward to execution planner (MR).
  • 14. Example Security Policy
      [databases]
      # Defines the location of the per DB policy file for the
      # ‘customers’ DB (schema)
      customers = hdfs://ha-nn-uri/etc/access/customers.ini

      [groups]
      # Assigns Hadoop groups to their respective set of roles
      manager = analyst_role, junior_analyst_role
      analyst = analyst_role
      jranalyst = junior_analyst_role
      customers_admin = customers_admin_role
      admin = admin_role

      [roles]
      # Roles that can import or export data to the URIs defined,
      # i.e. a landing zone. Since the server runs as the user "hive",
      # files in this directory must either have the "hive" group set
      # with read/write or be set world read/write.
      analyst_role = server=server1->db=analyst1, server=server1->db=jranalyst1->table=*->action=select, server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
      junior_analyst_role = server=server1->db=jranalyst1, server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

      # Role controls everything for the ‘customers’ DB on server1.
      # Privileges for ‘customers’ can be defined in the global policy
      # file even though ‘customers’ has its own policy file.
      # Note that the privileges from both the global policy file and
      # the per-db policy file are merged. There is no overriding.
      customers_admin_role = server=server1->db=customers

      # Role controls everything on server1.
      admin_role = server=server1
  • 15. Live Demo & Give Aways. Closes gap between HDFS and Metastore; easy to implement; RFC 2307 compliant (Kerberos); enables multi-user applications in one Hive warehouse; enables multi-tenancy per row and column.
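
The example security policy on slide 14 maps Hadoop groups to roles and roles to privilege strings of the form `server=...->db=...->table=...->action=...`. The following is a minimal Python sketch of how such a policy could be evaluated. It is not Sentry's implementation: the function names (`parse_privilege`, `implies`, `is_authorized`) are hypothetical, the matching rule (a privilege covers everything beneath the scope it names, and `*` matches any value) is a simplifying assumption, and URI privileges are left out for brevity.

```python
# Hypothetical, simplified model of the slide 14 policy; not Sentry's code.
# URI privileges from the slide are omitted to keep the sketch short.
GROUPS = {
    "manager": ["analyst_role", "junior_analyst_role"],
    "analyst": ["analyst_role"],
    "jranalyst": ["junior_analyst_role"],
    "customers_admin": ["customers_admin_role"],
    "admin": ["admin_role"],
}

ROLES = {
    "analyst_role": [
        "server=server1->db=analyst1",
        "server=server1->db=jranalyst1->table=*->action=select",
    ],
    "junior_analyst_role": ["server=server1->db=jranalyst1"],
    "customers_admin_role": ["server=server1->db=customers"],
    "admin_role": ["server=server1"],
}


def parse_privilege(text):
    """Split 'k1=v1->k2=v2' into an ordered list of (key, value) pairs."""
    return [tuple(part.split("=", 1)) for part in text.split("->")]


def implies(privilege, request):
    """Assumed rule: every part named by the privilege must match the
    request ('*' matches anything); parts it omits are unrestricted."""
    for key, value in parse_privilege(privilege):
        if value != "*" and request.get(key) != value:
            return False
    return True


def is_authorized(group, request):
    """True if any role assigned to the group holds an implying privilege."""
    return any(
        implies(privilege, request)
        for role in GROUPS.get(group, [])
        for privilege in ROLES.get(role, [])
    )


if __name__ == "__main__":
    query = {"server": "server1", "db": "jranalyst1",
             "table": "t1", "action": "select"}
    print(is_authorized("analyst", query))
```

Under these assumptions, the `analyst` group can select from any table in `jranalyst1` but cannot insert into it, while `jranalyst` holds the whole database and `admin` holds the whole server.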