This document discusses Sentry, an open source authorization module for Hive and Impala. It provides fine-grained, role-based authorization to define user and application access to data. Sentry aims to enable secure authorization that existing options like advisory permissions and HDFS impersonation lack. The document outlines Sentry's architecture and capabilities, and how it integrates with the Hadoop ecosystem through the Hive metastore. A demo is offered to showcase Sentry's authorization policies and features.
Sentry - An Introduction
1. Sentry: Open Source Authorization for Hive & Impala
Alexander Alten-Lorenz | Senior Field Engineer, Cloudera
Wednesday, 7th November 2013
2. Defining Security Functions

Perimeter: Guarding access to the cluster itself.
  Technical Concepts: Authentication, Network isolation

Data: Protecting data in the cluster from unauthorized visibility.
  Technical Concepts: Encryption, Data masking

Access: Defining what users and applications can do with data.
  Technical Concepts: Permissions, Authorization

Visibility: Reporting on where data came from and how it's being used.
  Technical Concepts: Auditing, Lineage
3. Enabling Enterprise Security

Perimeter: Guarding access to the cluster itself.
  Technical Concepts: Authentication, Network isolation
  Products: Kerberos | Oozie | Knox

Data: Protecting data in the cluster from unauthorized visibility.
  Technical Concepts: Encryption, Data masking
  Products: Certified Partners

Access: Defining what users and applications can do with data.
  Technical Concepts: Permissions, Authorization
  Products: Sentry (available 7/23)

Visibility: Reporting on where data came from and how it's being used.
  Technical Concepts: Auditing, Lineage
  Products: Cloudera Navigator
4. Hive Overview

SQL Access to Hadoop
- MapReduce: a great, massively scalable batch processing framework, but it required development work for each new job
- Hive opened up Hadoop to more users with standard SQL

Key Challenges
- Batch MapReduce is too slow for interactive BI/analytics
- No concurrency, no security

Options Today
- Impala, designed for low-latency queries
- HiveServer2, which delivers concurrency and authentication
5. Our Open Source Activity

CDH 4.1 (HiveServer2)
- Concurrency and Kerberos authentication for Hive
- JDBC and Beeline clients

CDH 4.2
- HDFS impersonation authorization as a stop-gap
- Pluggable authentication API
- JDBC LDAP username/password

ODBC
- Supports Kerberos authentication and LDAP
- Extended partner certification
6. Current State of Authorization

Two Sub-Optimal Choices for SQL on Hadoop

Insecure Advisory Authorization
- Users can grant themselves permissions
- Intended to prevent accidental deletion of data
- Problem: doesn't guard against malicious users

HDFS Impersonation
- Data is protected at the file level by HDFS permissions
- Problem: file-level is not granular enough
- Problem: not role-based
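The weakness of advisory authorization can be made concrete with a small sketch. This is an illustrative toy, not Hive or Sentry code; the class and method names are hypothetical. In the advisory model nothing stops a user from issuing grants to himself, so it only protects against accidents:

```python
class AdvisoryAuth:
    """Advisory model: anyone may grant, so it only prevents accidents."""

    def __init__(self):
        self.grants = set()

    def grant(self, requester, user, obj):
        self.grants.add((user, obj))  # the requester is never checked!

    def can_read(self, user, obj):
        return (user, obj) in self.grants


class SecureAuth(AdvisoryAuth):
    """Secure model: only designated admins may grant."""

    def __init__(self, admins):
        super().__init__()
        self.admins = set(admins)

    def grant(self, requester, user, obj):
        if requester not in self.admins:
            raise PermissionError(f"{requester} may not grant privileges")
        super().grant(requester, user, obj)
```

Under `AdvisoryAuth`, `grant("mallory", "mallory", "salaries")` succeeds and Mallory can read the salaries data; under `SecureAuth(admins={"alice"})` the same call raises `PermissionError`.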
7. Authorization Requirements

Secure Authorization: ability to control access to data and/or privileges on data for authenticated users.

Fine-Grained Authorization: ability to give users access to a subset of data (e.g. a column) in a database.

Role-Based Authorization: ability to create/apply templatized privileges based on functional roles.

Multi-Tenant Administration: ability for a central admin group to empower lower-level admins to manage security for each database/schema.
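The requirements above can be sketched as a minimal role-based access model. This is an illustrative sketch only, not Sentry's implementation; all role, group, and object names are hypothetical. Privileges attach to roles, roles are assigned to groups, and a grant on a container object covers the objects beneath it (fine-grained down to a sub-object):

```python
# role -> set of (object, action) privileges; objects use dotted paths
ROLE_PRIVILEGES = {
    "analyst_role": {("server1.analyst_db", "select"),
                     ("server1.analyst_db", "insert")},
    "jranalyst_role": {("server1.analyst_db.public_view", "select")},
}

# Hadoop group -> roles (multi-tenant: each admin manages their own mapping)
GROUP_ROLES = {
    "analysts": ["analyst_role"],
    "juniors": ["jranalyst_role"],
}


def is_authorized(group: str, obj: str, action: str) -> bool:
    """A privilege on an object covers that object and everything below it."""
    for role in GROUP_ROLES.get(group, []):
        for granted_obj, granted_action in ROLE_PRIVILEGES.get(role, set()):
            if granted_action == action and (
                    obj == granted_obj or obj.startswith(granted_obj + ".")):
                return True
    return False
```

For example, `is_authorized("analysts", "server1.analyst_db.sales", "select")` is true because the role's database-level grant covers every table in that database, while the juniors group can only read the one view it was granted.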
8. The Next Step: Introducing Sentry

Authorization module for Hive & Impala

Unlocks Key RBAC Requirements
- Secure, fine-grained, role-based authorization
- Multi-tenant administration

Open Source
- Intent to donate to the ASF

Available and Fully Supported
- HiveServer2 & Impala 1.1 initially
9. Key Benefits of Sentry

- Store Sensitive Data in Hadoop
- Extend Hadoop to More Users
- Enable New Use Cases
- Enable Multi-User Applications
- Comply with Regulations
10. Key Capabilities of Sentry

Fine-Grained Authorization
- Specify security for SERVERS, DATABASES, TABLES & VIEWS

Role-Based Authorization
- SELECT privilege on views & tables
- INSERT privilege on tables
- TRANSFORM privilege on servers
- ALL privilege on the server, databases, tables & views
- ALL privilege is needed to create/modify schema

Multi-Tenant Administration
- Separate policies for each database/schema
- Policies can be maintained by separate admins
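The capability matrix above implies two rules: objects form a hierarchy (server > database > table/view), and ALL implies every action, including schema changes. A compact sketch of that evaluation logic (illustrative only; Sentry evaluates its policy files, not this code, and the grants shown are made up):

```python
def covers(grant_obj: tuple, target_obj: tuple) -> bool:
    """A grant on an object covers that object and everything below it."""
    return target_obj[:len(grant_obj)] == grant_obj


def allowed(grants, action: str, target: tuple) -> bool:
    """ALL implies every action; otherwise the action must match exactly."""
    for obj, priv in grants:
        if covers(obj, target) and (priv == "ALL" or priv == action):
            return True
    return False


grants = [
    (("server1",), "ALL"),             # full admin over server1
    (("server2", "sales"), "SELECT"),  # read-only on one database
]
```

With these grants, `CREATE` on any `server1` object succeeds (schema changes need ALL, and ALL at server scope covers every database and table beneath it), while on `server2` only SELECT within the `sales` database is permitted.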
11. Apache Ecosystem and Sentry

- Shared Hive Metastore (with HCatalog)
- Extensibility plug-in for HiveServer2
- Inline support in Impala 1.1
- Potential extension to Pig, MapReduce, REST (possible future development)

[Diagram: Sentry sitting alongside the shared Hive Metastore and HCatalog; dashed connections mark possible future development]
13. Query Execution Flow

A SQL query passes through these stages:
- Parse: validate SQL grammar
- Build: construct the statement tree
- Check: validate statement objects; the first check is authorization, handled by Sentry
- Plan: forward to the execution planner (MR)
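The flow above can be sketched as a toy pipeline in which authorization is the first validation in the Check stage, so an unauthorized query is rejected before any execution planning happens. This is a simplified illustration, not HiveServer2 or Impala code; every function here is hypothetical:

```python
def parse(sql: str) -> list:
    """Parse: validate SQL grammar (toy version: just tokenize)."""
    tokens = sql.replace(";", "").split()
    if not tokens:
        raise ValueError("empty statement")
    return tokens


def build(tokens: list) -> dict:
    """Build: construct a statement tree (toy version: verb + object)."""
    return {"action": tokens[0].upper(), "object": tokens[-1]}


def check(stmt: dict, user_privileges: set) -> dict:
    """Check: validate statement objects; the FIRST check is authorization."""
    if (stmt["action"], stmt["object"]) not in user_privileges:
        raise PermissionError(f"{stmt['action']} on {stmt['object']} denied")
    return stmt


def plan(stmt: dict) -> str:
    """Plan: forward to the execution planner (toy: name an MR job)."""
    return f"MR job: {stmt['action']} {stmt['object']}"


def run_query(sql: str, user_privileges: set) -> str:
    return plan(check(build(parse(sql)), user_privileges))
```

Running `run_query("select * from sales;", {("SELECT", "sales")})` produces a plan, while the same query against a table the user has no privilege on raises `PermissionError` before `plan` is ever reached.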
14. Example Security Policy

[databases]
# Defines the location of the per-DB policy file for the 'customers' DB (schema)
customers = hdfs://ha-nn-uri/etc/access/customers.ini

[groups]
# Assigns Hadoop groups to their respective set of roles
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
customers_admin = customers_admin_role
admin = admin_role

[roles]
# Roles that can import or export data to the URIs defined,
# i.e. a landing zone. Since the server runs as the user "hive,"
# files in this directory must either have the "hive" group set
# with read/write or be set world read/write.
analyst_role = server=server1->db=analyst1, server=server1->db=jranalyst1->table=*->action=select, server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

# Role controls everything for the 'customers' DB on server1.
customers_admin_role = server=server1->db=customers

# Role controls everything on server1.
admin_role = server=server1

# Privileges for 'customers' can be defined in the global policy file even
# though 'customers' has its own policy file. Note that the privileges from
# both the global policy file and the per-db policy file are merged. There
# is no overriding.
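The group-to-role and role-to-privilege resolution in a policy file of this shape can be sketched with Python's standard `configparser` (the file uses INI-style sections). This parser is a simplified illustration, not Sentry's actual resolver; it ignores the per-db file merging and the group/role names in the sample text are hypothetical:

```python
import configparser

GLOBAL_POLICY = """
[groups]
analyst = analyst_role
manager = analyst_role, junior_analyst_role

[roles]
analyst_role = server=server1->db=analyst1
junior_analyst_role = server=server1->db=jranalyst1
"""


def group_privileges(policy_text: str, group: str) -> set:
    """Resolve a Hadoop group to its union of privilege strings:
    group -> comma-separated roles -> comma-separated privileges."""
    cfg = configparser.ConfigParser()
    cfg.read_string(policy_text)
    privileges = set()
    roles = cfg.get("groups", group, fallback="")
    for role in (r.strip() for r in roles.split(",") if r.strip()):
        grants = cfg.get("roles", role, fallback="")
        privileges.update(g.strip() for g in grants.split(",") if g.strip())
    return privileges
```

Here `group_privileges(GLOBAL_POLICY, "manager")` yields the union of both roles' privileges, mirroring the slide's note that privileges are merged rather than overridden.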
15. Live Demo & Giveaways

- Closes the gap between HDFS and the Metastore
- Easy to implement
- RFC 2307 compliant (Kerberos)
- Enables multi-user applications in one Hive WH
- Enables multi-tenancy per row and column