Secure Search - Using Apache Sentry to Add Authentication and Authorization Support to Solr: Presented by Gregory Chanan, Cloudera

Secure Solr With Apache Sentry
Gregory Chanan, Engineer @ Cloudera
gchanan AT cloudera.com

Who Am I?
•  Software Engineer at Cloudera
•  Apache Solr Committer
•  Apache Sentry Committer (incubating)
•  Apache HBase Committer

Overview
•  Motivation
•  Why security for Solr / SolrCloud?
•  Why Apache Sentry?
•  Authentication
•  Authorization
•  Collection-level
•  Document-level
•  Secure Impersonation
•  Performance
•  Future Work

Why Security?
•  Apache Solr only provides minimal security features
“Solr allows any client with access to it to add, update, and delete documents (and of course search/read too), including access to the Solr configuration and schema files and the administrative user interface.” [1]

•  In the past, deployed as a single server
“It is strongly recommended that the application server containing Solr be firewalled such [that] the only clients with access to Solr are your own.” [1]

Why Security?
•  SolrCloud driving adoption in Big Data space
•  Now, a component of a multi-tenant Hadoop cluster
•  Non-Solr users on cluster
•  Solr communicates across machines and services

Why Apache Sentry?
•  Sentry already established in Hadoop ecosystem
•  Has understood authentication model (Kerberos)
•  Has understood privilege/action model
•  Security-focused project
•  Solr focus on Search Engine
•  Sentry focus on Security

Authentication
•  Authentication: Verifying identity of a user or service
•  Solr supports authenticating with dependent services (i.e. HDFS
and ZooKeeper*)
•  Sentry goal: support other services / users authenticating with
Solr
•  Consistent with other HTTP-level Hadoop services (e.g. Oozie
and HttpFs), Apache Sentry uses:
•  Kerberos: a mutual authentication protocol that works on the
basis of “tickets”
•  SPNego: a negotiation mechanism for selecting an underlying
authentication protocol

SPNego advantages
•  HTTP Tools have built-in support for SPNego/Kerberos
•  Web browsers
•  curl (with --negotiate)
•  HTTP libraries, including Apache HttpClient (used by solrj)
•  Although an authentication (not authorization) protocol, can be
used for cluster-level access control
•  Only grant kerberos credentials to users who should have access to the cluster
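
At the HTTP level, SPNego is a small header exchange (RFC 4559): the server answers an unauthenticated request with 401 and `WWW-Authenticate: Negotiate`, and the client retries with `Authorization: Negotiate <base64 token>`. The sketch below shows only that framing. The function names are illustrative, and the token is placeholder bytes; a real token would come from a GSS-API/Kerberos library, which is not shown.

```python
import base64


def is_negotiate_challenge(status, headers):
    """True if the response asks the client to authenticate via SPNego."""
    challenge = headers.get("WWW-Authenticate", "")
    return status == 401 and challenge.split(" ")[0].lower() == "negotiate"


def negotiate_authorization(token):
    """Frame an opaque SPNego token as an HTTP Authorization header value."""
    return "Negotiate " + base64.b64encode(token).decode("ascii")


# Placeholder bytes only: a real token comes from a GSS-API/Kerberos library.
placeholder_token = b"not-a-real-kerberos-token"
response_headers = {"WWW-Authenticate": "Negotiate"}

if is_negotiate_challenge(401, response_headers):
    auth_value = negotiate_authorization(placeholder_token)
```

Tools such as curl and Apache HttpClient perform this exchange internally, which is why no application-level code is normally needed.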

Authentication Setup
•  Server side: use Sentry-provided web.xml which has a Kerberos/SPNego aware filter
•  Have to set up keytabs/principals/JAAS configurations

•  Client side: Sentry provides HttpClient / HttpSolrServer
configuration for communicating with Kerberos/SPNego aware
Solr servers
•  Have to set up keytabs/principals/JAAS configurations

•  Cloudera Manager can do setup for you

Authorization
•  Authorization: Controlling access to resources
•  Solr does not provide collection/document authorization support
•  Does support “hooks” via solr.xml and solrconfig.xml to override
request handler implementation
•  Sentry uses these “hooks” to implement collection and document level
authorization

Collection-level Authorization
•  Sentry supports role-based granting of privileges
•  Each role can be granted QUERY, UPDATE, and/or administrative privileges on a collection

•  Privileges stored in a “policy file” on HDFS:
[groups]
# Assigns each Hadoop group to its set of roles
dev_ops = engineer_role, ops_role

[roles]
# Assigns each role to its set of privileges
engineer_role = collection = source_code->action=Query, collection = source_code->action=Update
ops_role = collection = hbase_logs->action=Query
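
To make the group → role → privilege chain concrete, here is a minimal sketch that reads the policy text above and answers "may a user in these groups perform this action on this collection?". It is a simplified reader written for illustration, not Sentry's own parser: the function names are hypothetical, and wildcards, line continuations, and other policy features are not handled.

```python
POLICY = """\
[groups]
# Assigns each Hadoop group to its set of roles
dev_ops = engineer_role, ops_role

[roles]
# Assigns each role to its set of privileges
engineer_role = collection = source_code->action=Query, collection = source_code->action=Update
ops_role = collection = hbase_logs->action=Query
"""


def parse_privilege(item):
    """Turn 'collection = c->action=Query' into ('c', 'query')."""
    scope, _, action = item.partition("->")
    collection = scope.partition("=")[2].strip()
    return collection, action.partition("=")[2].strip().lower()


def parse_policy(text):
    """Return {'groups': {group: [roles]}, 'roles': {role: {(collection, action)}}}."""
    policy = {"groups": {}, "roles": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        name, _, value = line.partition("=")
        items = [item.strip() for item in value.split(",") if item.strip()]
        if section == "groups":
            policy["groups"][name.strip()] = items
        elif section == "roles":
            policy["roles"][name.strip()] = {parse_privilege(i) for i in items}
    return policy


def is_allowed(policy, user_groups, collection, action):
    """True if any role of any of the user's groups grants the privilege."""
    wanted = (collection, action.lower())
    return any(
        wanted in policy["roles"].get(role, set())
        for group in user_groups
        for role in policy["groups"].get(group, [])
    )


policy = parse_policy(POLICY)
```

With this policy, a member of dev_ops may query and update source_code and query hbase_logs, but may not update hbase_logs.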

Integrating Sentry and Solr
•  Sentry integrated via “hooks” in request handlers:
•  Specified per collection in solrconfig.xml:
•  Sentry ships with its own version of solrconfig.xml with secure handlers,
called solrconfig.xml.secure

Administrative requests
•  That covers queries/updates of collections, but what about administrative
actions such as getting the status of the cores?
•  In SolrCloud, admin looks like a collection:
http://localhost:8983/solr/admin/cores?action=STATUS
•  Can just follow this structure in Sentry:
sample_role = collection = admin->action=Query,

•  Secure Admin Handlers controlled via cluster-wide “solr.xml” in
ZooKeeper. By default, you get Secure Admin Handlers if Sentry is
enabled

Administrative requests
•  Full privilege model documented here
•  Examples (collection1 = arbitrary collection name):

Action              Required Privilege   Collection
select              QUERY                collection1
update/json         UPDATE               collection1
ThreadDumpHandler   QUERY                admin
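
The table above can be read as a lookup from a request action to the privilege and collection that must be checked. The sketch below encodes only the three rows shown; the dictionary and function names are hypothetical, and the real privilege model covers many more handlers.

```python
# Rows from the table above. None means "the collection named in the request";
# administrative handlers are checked against the pseudo-collection "admin".
REQUIRED_PRIVILEGE = {
    "select": ("QUERY", None),
    "update/json": ("UPDATE", None),
    "ThreadDumpHandler": ("QUERY", "admin"),
}


def required_privilege(action, request_collection):
    """Return the (privilege, collection) pair a caller must hold for an action."""
    privilege, fixed_collection = REQUIRED_PRIVILEGE[action]
    return privilege, fixed_collection or request_collection
```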

Document-level authorization motivation
•  Collection-level authorization useful when access control requirements
for documents are homogeneous
•  Security requirements may require restricting access to a subset of
documents
•  Consider “Confidential” and “Secret” documents. How to store with only
collection-level authorization?
•  Pushes complexity to application

Document-level authorization model
•  Instead of Policy File in HDFS:
[groups]
# Assigns each Hadoop group to its set of roles
dev_ops = engineer_role, ops_role

[roles]
# Assigns each role to its set of privileges
engineer_role = collection = source_code->action=Query, collection = source_code->action=Update
ops_role = collection = hbase_logs->action=Query

•  Store authorization tokens in each document
•  Many more documents than collections; doesn’t scale to store document-level info in Policy File
•  Can use Solr’s built-in filtering capabilities to restrict access

Document-level authorization model
•  A configurable field stores the authorization tokens
•  The authorization tokens are Sentry roles, e.g. “ops_role”

[roles]
ops_role = collection = hbase_logs->action=Query

•  Represents the roles that are allowed to view the document. To
view a document, the querying user must belong to at least one
role whose token is stored in the token field
•  Can modify document permissions without restarting Solr
•  Can modify role memberships without reindexing
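
The last point follows from where each piece of information lives: documents carry role tokens, while the user's roles are resolved from the policy at query time. A minimal sketch, assuming a token field named `sentry_auth` and hypothetical helper names:

```python
# Hypothetical data: one indexed document and a group-to-role policy.
document = {"id": "log-1", "sentry_auth": ["ops_role"]}
group_roles = {"dev_ops": ["engineer_role"]}


def roles_for(groups, group_roles):
    """Collect every role granted to any of the user's groups."""
    return {role for group in groups for role in group_roles.get(group, [])}


def can_view(document, user_roles, auth_field="sentry_auth"):
    """A document is visible if it carries at least one of the user's roles."""
    return not set(document.get(auth_field, [])).isdisjoint(user_roles)


before = can_view(document, roles_for(["dev_ops"], group_roles))
group_roles["dev_ops"].append("ops_role")  # policy change only, no reindexing
after = can_view(document, roles_for(["dev_ops"], group_roles))
```

Here `before` is False and `after` is True, although the stored document never changed.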

Document-level authorization impl
•  Intercepts the request via a SearchComponent
•  SearchComponent adds an “fq” or FilterQuery
•  Filter out all documents that don’t have “role1” or “role2” in authField

•  Filters are cached, so the construction cost is paid only once
•  Note: does not supersede collection-level authorization
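
The filter described above can be sketched as a function that turns the user's roles into an `fq` clause and appends it to the request parameters. This is an illustration of the idea, not Sentry's SearchComponent: the function names are hypothetical, and the exact query string Sentry generates may differ.

```python
def quote_term(term):
    """Quote a token so it is treated as a literal phrase in the query."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_auth_filter(user_roles, auth_field):
    """Build a filter query matching documents tagged with any of the roles."""
    roles = sorted(set(user_roles))
    if not roles:
        # No roles means no access; reject before querying.
        raise ValueError("user has no roles")
    return "{}:({})".format(auth_field, " OR ".join(quote_term(r) for r in roles))


def add_auth_filter(params, user_roles, auth_field):
    """Return a copy of the request params with the auth filter appended."""
    secured = dict(params)
    secured["fq"] = list(params.get("fq", [])) + [build_auth_filter(user_roles, auth_field)]
    return secured


request = add_auth_filter({"q": "error", "fq": ["level:WARN"]}, ["role2", "role1"], "authField")
```

Because the clause is added as a separate `fq` rather than merged into `q`, it can be cached and reused across queries by the same set of roles.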

Document-level authorization config
•  Configuration via solrconfig.xml.secure (per collection):

<!-- Set to true to enable document-level authorization -->
<bool name="enabled">false</bool>

<!-- Field where the auth tokens are stored in the document -->
<str name="sentryAuthField">sentry_auth</str>

<!-- Auth token defined to allow any role to access the document.
     Uncomment to enable. -->
<!--<str name="allRolesToken">*</str>-->

•  No tokens = no access. To allow all users to access a document,
use the allRolesToken. Useful for getting started

Secure Impersonation
•  But wait! My users don’t interact with Solr directly
•  Custom web UI, load balancer, etc.
•  Authorization won’t work!
•  “user” is forgotten, request to Solr from “UI”

Secure Impersonation
•  Secure impersonation: the ability of a “super-user” to submit
requests on behalf of another user
•  Conceptually similar to “sudo” on Unix
•  Limited to only groups/hosts that are explicitly configured to support it
•  Identical to functionality provided by HDFS, Oozie
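
The checks involved can be sketched as follows: the authenticated caller must be a configured super-user, must connect from an allowed host, and may only act for users in an allowed group. All names, hosts, and the configuration layout here are hypothetical and chosen for illustration; they are not Sentry's or Solr's actual settings.

```python
# Hypothetical configuration: super-user -> allowed source hosts and the
# groups whose members it may impersonate.
PROXY_USERS = {
    "web_ui": {"hosts": {"ui-host.example.com"}, "groups": {"analysts"}},
}
USER_GROUPS = {"alice": ["analysts"], "bob": ["finance"]}


def effective_user(authenticated_user, do_as, remote_host, user_groups, proxy_users):
    """Return the user to authorize as, or raise PermissionError."""
    if do_as is None or do_as == authenticated_user:
        return authenticated_user
    rule = proxy_users.get(authenticated_user)
    if rule is None:
        raise PermissionError(authenticated_user + " may not impersonate")
    if remote_host not in rule["hosts"]:
        raise PermissionError("impersonation not allowed from " + remote_host)
    if rule["groups"].isdisjoint(user_groups.get(do_as, [])):
        raise PermissionError(do_as + " is not in an impersonable group")
    return do_as
```

Collection-level and document-level authorization then run against the returned user, not against the super-user that opened the connection.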

Hue Search App UI
•  Uses Secure Impersonation to integrate with its own security mechanisms
•  Users can log in to Hue via LDAP or other auth mechanism
•  Hue makes requests on behalf of logged-in user
•  Only Hue user requires Kerberos keytab

•  Seamlessly integrates with the collection and document-level access control
mechanisms

Performance Testing
•  Goal is to measure overhead of:
•  Kerberos Authentication
•  Sentry Collection-Level Authorization
•  Measure index, query overhead separately

Index Test Setup
•  20-node cluster: 12 cores, 96 GB RAM, 12x 2TB disks, 10G Ethernet
•  Cloudera Search-1.2.0, CDH 4.6, MR1, CentOS 6.4
•  260M tweets/docs, indexed across 17 fields
•  116 GB, ~800 JSON .gz files, ~130MB per file, 3-fold HDFS
replication
•  1 Solr server and 1 shard per node (44M docs per shard), no Solr
replication
•  Uses MapReduceIndexerTool contrib. mapper/reducer slots = 2x/1x
number of cores
•  Solr heap size = 20GB
•  Record end-to-end indexing time, i.e., indexing + mtree merge + go
live
•  Record average from 3 repeats

Index Performance Testing
•  Left column is unsecured baseline.
•  Center column is ~20% lower → HDFS security introduces ~20% performance overhead.
•  Right column is ~same as center column → Solr security introduces no additional overhead.

Query Test Setup
•  Same setup as MapReduce batch indexing
•  Uses the output of MapReduce batch indexing
•  1 client, 30 threads per client
•  Uses internal tool - QueryRunner
•  Similar to SolrMeter and JMeter

•  Query randomly sampled from fixed set of 10,000 strings
•  Record per thread query throughput for 5 runs of 30 min each

Query Performance Testing
•  Left column is unsecured baseline.
•  Center column is ~13% lower → HDFS security introduces ~13% performance overhead.
•  Right column is same as center column → Solr security introduces no additional overhead.

Future Work
•  Support for Sentry service with improved APIs / performance /
integration
•  Already supported for Hive/Impala
•  Currently in development upstream
•  “Lineage” security: data flows from one system to another and
retains security criteria
•  Example: Index HBase data for full-text queries in Solr. HBase Table
and Cell-level security tags automatically applied to Solr Collections,
Documents, and Fields

Questions?
•  Thanks for listening!
•  More information / Want to contribute?
http://sentry.incubator.apache.org/
•  Questions?
