Apache Hive authorization models


Apache Hive offers several authorization models, and the right one depends on your use case. This presentation also discusses how to set up and configure Hive to use the appropriate authorization model.

Published in: Technology

  • Two sources of truth: when you grant permissions in Hive, the file system permissions are not set; when you revoke permissions, the user can still access the files.
  • Apache Hive authorization models

Slide 1: Hive Authorization Models
  Thejas Nair, thejas@hortonworks.com, @thejasn
  © Hortonworks Inc. 2011
Slide 2: Authentication vs Authorization
  • Authentication
    – Verifying your identity
    – Enabled in Hadoop using Kerberos
  • Authorization
    – Verifying whether you have permission to perform this action
  Image credits: http://www.flickr.com/photos/matsuyuki/2906448025/, http://www.flickr.com/photos/86818962@N00/3209747460
  Architecting the Future of Big Data
Slide 3: Hive architecture
  Diagram components: Hive client, Metastore server, RDBMS, HDFS, MapReduce
  • What are we trying to protect here?
    – Data
    – Metadata
Slide 4: Actions controlled by authorization
  • Metadata operations
    – Access to, and changes to, the RDBMS storing the metadata
  • Storage operations
    – Create, write and read operations
    – Storage (HDFS) comes with its own authorization; the challenge is protecting the metadata
Slide 5: Existing models of authorization
  1. Traditional RDBMS style authorization
     – Use case: Hive is like an RDBMS, managing its own data
  2. Storage based authorization
     – Use case: Hadoop provides shared storage, and Hive is one of the tools that use it
     – The HCatalog world view
  3. No authorization
     – Makes sense for a prototype or a single-user setup
     – Metadata is not protected
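The use-case guidance on this slide can be condensed into a small decision helper. This is only a sketch: the `AuthModel` enum and the `choose_model` function with its two boolean inputs are hypothetical names for illustration and are not part of any Hive API.

```python
from enum import Enum


class AuthModel(Enum):
    """The three authorization models listed on the slide."""
    RDBMS_STYLE = "traditional RDBMS style"
    STORAGE_BASED = "storage based"
    NONE = "no authorization"


def choose_model(shared_with_other_tools: bool, prototype_or_single_user: bool) -> AuthModel:
    """Hypothetical helper that maps the slide's use cases to a model."""
    if prototype_or_single_user:
        # Only acceptable when nothing needs protecting: metadata stays unprotected.
        return AuthModel.NONE
    if shared_with_other_tools:
        # Hadoop is shared storage (the HCatalog world view), so HDFS permissions rule.
        return AuthModel.STORAGE_BASED
    # Hive acts like an RDBMS that manages its own data.
    return AuthModel.RDBMS_STYLE


print(choose_model(shared_with_other_tools=True, prototype_or_single_user=False).value)
```

The prototype/single-user check comes first because it short-circuits the other considerations: if nobody needs protecting, neither model is worth configuring.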
Slide 6: Traditional RDBMS style authorization
  • Use GRANT and REVOKE statements to manage permissions
  • Permissions are stored in the metastore RDBMS
  • But HDFS authorization is separate
    – Two sources of truth!
    – HDFS permissions can still grant access
  • Sharing the stored data with other tools is problematic
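To make the "two sources of truth" problem concrete, here is a toy model in which Hive grants and HDFS permissions are tracked independently. It is a simplified sketch, not Hive's implementation: the `ToyCluster` class and the warehouse path are hypothetical, and real HDFS permissions are mode bits rather than a set of (user, path) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hive grants live in the metastore RDBMS; HDFS permissions live in HDFS."""
    hive_grants: set = field(default_factory=set)   # {(user, table)} recorded by GRANT
    hdfs_readers: set = field(default_factory=set)  # {(user, path)} allowed by HDFS permissions

    def grant(self, user: str, table: str) -> None:
        # GRANT only writes to the metastore; HDFS permissions are not changed.
        self.hive_grants.add((user, table))

    def revoke(self, user: str, table: str) -> None:
        # REVOKE likewise leaves HDFS untouched.
        self.hive_grants.discard((user, table))

    def can_query_via_hive(self, user: str, table: str) -> bool:
        return (user, table) in self.hive_grants

    def can_read_files_directly(self, user: str, path: str) -> bool:
        # Pig, MapReduce or `hadoop fs -cat` read the files through HDFS,
        # which does not consult Hive grants.
        return (user, path) in self.hdfs_readers


warehouse_path = "/apps/hive/warehouse/sales"  # hypothetical table location
cluster = ToyCluster(hdfs_readers={("alice", warehouse_path)})
cluster.grant("alice", "sales")
cluster.revoke("alice", "sales")
print(cluster.can_query_via_hive("alice", "sales"))             # False
print(cluster.can_read_files_directly("alice", warehouse_path))  # True
```

After the revoke, Hive refuses the query, yet the same user can still read the table's files directly: the revoke was recorded in only one of the two places.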
Slide 7: Traditional RDBMS style authorization
  • Use case: Hive is the only tool
    – Disable all other tools and set 777 permissions on the HDFS files?
    – It is easy to bypass Hive authorization
    – Hive allows arbitrary code in UDFs and in Hive streaming code
    – You still need to manage HDFS file permissions
  • The permission model is incomplete
    – HIVE-3720 has a new proposal
  • Does not protect against malicious users
Slide 8: Storage based authorization model
  • Use HDFS/storage permissions as the only source of truth
    – Works well if other systems also access the data
  • E.g. table directory permissions determine table permissions
    – To alter table metadata, you need write permission on the table directory
  • Problem: Hive concepts such as columns and views don't map to files
    – Coarse-grained vs fine-grained authorization
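The storage based idea can be sketched as a POSIX-style permission check on the table directory. This is a simplified model under stated assumptions: the `REQUIRED` mapping from Hive operations to directory permissions is hypothetical (only the ALTER TABLE → write rule comes from the slide), and it ignores the HDFS superuser, ACLs and the execute permission needed on parent directories.

```python
from dataclasses import dataclass

# Bits within one rwx triplet, as in POSIX/HDFS mode strings.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class DirStatus:
    owner: str
    group: str
    mode: int  # e.g. 0o750 -> owner rwx, group r-x, other ---


def hdfs_permits(user: str, user_groups: set, d: DirStatus, action: int) -> bool:
    """Owner bits apply to the owner, group bits to group members, other bits to everyone else."""
    if user == d.owner:
        bits = (d.mode >> 6) & 0o7
    elif d.group in user_groups:
        bits = (d.mode >> 3) & 0o7
    else:
        bits = d.mode & 0o7
    return (bits & action) == action


# Hypothetical mapping from Hive operations to the permission needed on the table directory.
REQUIRED = {"SELECT": READ, "INSERT": WRITE, "ALTER TABLE": WRITE}


def storage_based_allows(user: str, user_groups: set, table_dir: DirStatus, operation: str) -> bool:
    return hdfs_permits(user, user_groups, table_dir, REQUIRED[operation])


sales_dir = DirStatus(owner="etl", group="analysts", mode=0o750)
print(storage_based_allows("bob", {"analysts"}, sales_dir, "SELECT"))       # True  (group r-x)
print(storage_based_allows("bob", {"analysts"}, sales_dir, "ALTER TABLE"))  # False (no group w)
print(storage_based_allows("etl", set(), sales_dir, "ALTER TABLE"))         # True  (owner rwx)
```

Because the decision is derived entirely from the directory, there is no second place to keep in sync. The flip side, as the slide notes, is that a directory cannot express column- or view-level permissions.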
Slide 9: Potential solution
  • Combine the two models?
    – Add HDFS permission verification/management to traditional RDBMS style authorization
    – Use grant/revoke on file system users and groups
    – Tables populated by external tools can be marked as 'external'; Hive does not manage indexes or statistics for them
    – (Personal opinion; a detailed proposal is still needed)
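One way to read "add HDFS permission verification to RDBMS style authorization" is that an action must pass both checks. The sketch below illustrates only that reading; the slide itself calls the idea a personal opinion without a detailed proposal, so `combined_allows` and the stand-in `demo_hdfs_allows` are hypothetical.

```python
from typing import Callable, Set, Tuple

Grant = Tuple[str, str, str]  # (user, table, privilege) as recorded by GRANT


def combined_allows(user: str, table: str, privilege: str,
                    hive_grants: Set[Grant],
                    hdfs_allows: Callable[[str, str, str], bool]) -> bool:
    """Hypothetical combined model: require a Hive grant AND a matching HDFS permission."""
    if (user, table, privilege) not in hive_grants:
        return False
    # Checking storage as well means a REVOKE in Hive and a permission change in HDFS both take effect.
    return hdfs_allows(user, table, privilege)


def demo_hdfs_allows(user: str, table: str, privilege: str) -> bool:
    # Stand-in for a real HDFS check: pretend the table directory is read-only for everyone.
    return privilege == "SELECT"


grants = {("alice", "sales", "SELECT"), ("alice", "sales", "INSERT")}
print(combined_allows("alice", "sales", "SELECT", grants, demo_hdfs_allows))  # True
print(combined_allows("alice", "sales", "INSERT", grants, demo_hdfs_allows))  # False: HDFS denies writes
print(combined_allows("bob", "sales", "SELECT", grants, demo_hdfs_allows))    # False: no Hive grant
```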
Slide 10: Hive secure setup – Metastore
  • Don't trust end clients
  • Use a standalone metastore server to protect access to the metastore RDBMS
    – Set hive.metastore.uris in the client
  • Have the metastore perform actions as the user
    – Set hive.metastore.execute.setugi=true in both client and server
    – Files are then created as the user
  • Enable verification on the metastore (Hive 0.10, HIVE-3705):
    hive.metastore.pre.event.listeners=org.apache.hadoop.hive.ql.security.authorization
    hive.security.metastore.authenticator.manager=org.apache.hadoop.hive.ql.security.HiveMetastoreAuthenticationProvider
    hive.security.metastore.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider
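The client and server settings from this slide end up as name/value pairs in hive-site.xml. The sketch below renders them in that file's `<configuration><property>` layout using the standard library. Assumptions: the metastore host is a placeholder and 9083 is the commonly used default metastore port; the metastore verification properties are left out because the listener class name on the slide is incomplete.

```python
import xml.etree.ElementTree as ET


def hive_site(properties: dict) -> str:
    """Render name/value pairs in the <configuration><property> layout of hive-site.xml."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Client side: talk to the standalone metastore instead of the RDBMS directly,
# and ask it to act as the calling user. The host name is a placeholder.
client_conf = {
    "hive.metastore.uris": "thrift://metastore.example.com:9083",
    "hive.metastore.execute.setugi": "true",
}
# Server side: the metastore itself must also have setugi enabled.
server_conf = {"hive.metastore.execute.setugi": "true"}

print(hive_site(client_conf))
print(hive_site(server_conf))
```

Keeping the client and server dictionaries separate mirrors the slide's point that setugi must be enabled on both sides, while only the client needs to know the metastore URI.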
Slide 11: Hive secure setup – auth setup
  • Turn on authorization!
    hive.security.authorization.enabled=true
Slide 12: Setting RDBMS style authorization
  • This is the default model
  • Set hive.security.authorization.createtable.owner.grants=ALL
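The createtable.owner.grants property lists the privileges automatically granted to a table's creator. A toy model shows why the slide recommends `ALL`. It assumes that, with the property left empty, the creator receives no privileges on the new table. `ToyMetastore` is a hypothetical illustration, not Hive's metastore code.

```python
from collections import defaultdict


class ToyMetastore:
    """Hypothetical model of what createtable.owner.grants does when a table is created."""

    def __init__(self, owner_grants: str = ""):
        # Mirrors hive.security.authorization.createtable.owner.grants, e.g. "ALL" or "select,drop".
        self.owner_grants = [p.strip().upper() for p in owner_grants.split(",") if p.strip()]
        self.grants = defaultdict(set)  # table name -> {(user, privilege)}

    def create_table(self, user: str, table: str) -> None:
        # The creator receives exactly the configured privileges, nothing more.
        for privilege in self.owner_grants:
            self.grants[table].add((user, privilege))

    def has_privilege(self, user: str, table: str, privilege: str) -> bool:
        held = self.grants[table]
        return (user, privilege.upper()) in held or (user, "ALL") in held


default_ms = ToyMetastore()  # property left unset
default_ms.create_table("alice", "t1")
print(default_ms.has_privilege("alice", "t1", "SELECT"))  # False: creator cannot query own table

all_ms = ToyMetastore("ALL")
all_ms.create_table("alice", "t1")
print(all_ms.has_privilege("alice", "t1", "SELECT"))  # True
```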
Slide 13: Setting storage based authorization
  • Use the custom authorization manager StorageBasedAuthorizationProvider:
    hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
  • Available in Hive since 0.10
  • Available earlier in HCatalog
    – export HIVE_AUX_JARS_PATH=<hcatalog jar location>
    – hive.security.authorization.manager=org.apache.hcatalog.security.HdfsAuthorizationProvider
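The settings from the last three slides can be gathered per model. This sketch prints them in the slide's `name=value` notation, ready to be transcribed into hive-site.xml. The model keys `"rdbms"` and `"storage"`, the `properties_for` function and the dictionary names are hypothetical; the property names and values are the ones listed on the slides.

```python
BASE = {"hive.security.authorization.enabled": "true"}  # turn authorization on (both models)

MODEL_SETTINGS = {
    # Default authorization manager; give table creators ALL on their tables.
    "rdbms": {"hive.security.authorization.createtable.owner.grants": "ALL"},
    # Hive 0.10+: delegate decisions to HDFS permissions.
    "storage": {
        "hive.security.authorization.manager":
            "org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider",
    },
}


def properties_for(model: str) -> dict:
    """Return the name=value settings listed in this deck for one authorization model."""
    if model not in MODEL_SETTINGS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODEL_SETTINGS)}")
    return {**BASE, **MODEL_SETTINGS[model]}


for name, value in properties_for("storage").items():
    print(f"{name}={value}")
```

For HCatalog before Hive 0.10, the slide's alternative is to put the HCatalog jar on HIVE_AUX_JARS_PATH and use org.apache.hcatalog.security.HdfsAuthorizationProvider as the manager instead.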
Slide 14: Other possibilities
  • AccessServer proposal based on HiveServer2
    – Clients use JDBC to talk to a server that can serve queries from Hive, Pig or other tools
    – The server restricts what can be run
    – Uses an improved version of traditional RDBMS style authorization
    – Would require UDFs and SerDes to be blessed by a Hive DBA
    – Disallow arbitrary streaming commands?
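The "server restricts what can be run" idea amounts to checking each query against DBA-maintained allow-lists before execution. The sketch below assumes the server has already parsed the query into the functions, SerDes and streaming clauses it uses; `ParsedQuery`, `AccessPolicy` and the UDF name `my_udf` are hypothetical and are not taken from the AccessServer proposal.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Hypothetical summary the server would extract from a query before running it."""
    functions: set = field(default_factory=set)
    serdes: set = field(default_factory=set)
    uses_streaming: bool = False  # e.g. a TRANSFORM(...) USING 'script' clause


@dataclass
class AccessPolicy:
    """Allow-lists a Hive DBA would maintain under the AccessServer idea."""
    blessed_functions: set
    blessed_serdes: set
    allow_streaming: bool = False

    def violations(self, query: ParsedQuery) -> list:
        problems = [f"UDF not blessed: {f}" for f in sorted(query.functions - self.blessed_functions)]
        problems += [f"SerDe not blessed: {s}" for s in sorted(query.serdes - self.blessed_serdes)]
        if query.uses_streaming and not self.allow_streaming:
            problems.append("arbitrary streaming commands are disallowed")
        return problems


policy = AccessPolicy(
    blessed_functions={"upper", "concat"},
    blessed_serdes={"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"},
)
query = ParsedQuery(functions={"upper", "my_udf"}, uses_streaming=True)
print(policy.violations(query))
# ['UDF not blessed: my_udf', 'arbitrary streaming commands are disallowed']
```

Returning the full list of violations, rather than stopping at the first one, lets the server explain every reason a query was rejected.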
Slide 15: Further reading
  • https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization
  • https://cwiki.apache.org/confluence/display/HCATALOG/Storage+Based+Authorization
  • https://cwiki.apache.org/confluence/display/Hive/AccessServer+Design+Proposal
  • HIVE-3705: Adding authorization capability to the metastore
  • HIVE-3720: Expand and standardize authorization in Hive