XA Secure | Whitepaper on data security within Hadoop

Enterprises adopting Hadoop and other big data tools need to ensure that the data they are storing and processing is internally protected through strong access control, auditing and governance. This whitepaper discusses current challenges with Hadoop, the initiatives within the open source community and how XA Secure can help with its approach.

Published in: Technology

  1. SECURE YOUR DATA IN HADOOP: Current state of security, approach for comprehensive strategy
  2. CONTENTS
Introduction
    Big data- What is happening?
Hadoop- Security
    Current Hadoop Security Features/Initiatives
    Work to be done in Hadoop
XA Secure - Big Data Security Approach
    XA Secure differentiators
Summary

www.xasecure.com | +1.510.585.3289 | 7100 Stevenson Blvd Fremont CA 94538
  3. INTRODUCTION
Big data is emerging as the next technology wave, and enterprises across different industries are adopting tools such as Hadoop. While there are efficiencies in processing varied and distributed data, big data presents a unique challenge for managing information security.

BIG DATA- WHAT IS HAPPENING?
Digital data is everywhere, and global data is growing at 40% per year. Companies are capturing trillions of bytes of information about their customers, suppliers, and operations, and millions of networked sensors are being embedded in the physical world in devices such as mobile phones, energy meters and automobiles, sensing, creating, and communicating data. By collecting and analyzing all this information, companies can gain insight into new business opportunities and threats.

To harness the ever-expanding data volumes, new technologies have emerged that process massive data sets using a technique called massively parallel processing (MPP). A recent survey by Talend found that 60% of companies looking at big data are considering open source Apache Hadoop or Hadoop-based distributions. From its initial development to support Yahoo's increasing search and web management needs, Hadoop has emerged as the leading platform for big data analytics applications. The Hadoop software market itself is predicted to be around $813 million by 2016 (IDC research).

Enterprises are moving to a phase where they have completed pilot or proof-of-concept work and are embracing Hadoop to solve core business needs in production. At the same time, organizations are trying to analyze different kinds of data, from web logs and social media streams to sales and customer information, to get better insights. With Hadoop, they are able to achieve this at a fraction of the cost of traditional data warehouses. There is a movement towards creating large data lakes or data hubs where enterprise-wide data can be stored and processed using Hadoop.

Therein lies the data security risk, as data moves from behind the protected walls of enterprise applications to the kitchen sink called Big Data. Organizations need to provide the same level of security across their organization. Data within big data initiatives is no exception.
  4. HADOOP- SECURITY
Hadoop was developed to process massive amounts of disparate data using commodity hardware. From its initial success at Yahoo, it has matured into a platform that supports various verticals. However, the security controls inside Hadoop are very basic and still evolving.

CURRENT HADOOP SECURITY FEATURES/INITIATIVES
Given the security challenges, a lot of work is being undertaken within the open source and vendor community to make Hadoop a more secure environment. We have summarized some of the important initiatives below.

Kerberos Authentication: As one of the first steps towards security, Kerberos authentication was introduced in Hadoop in 2008 to add a basic level of security that was missing before, and today it is the primary method for providing secure authentication in Hadoop. Kerberos is a computer network authentication protocol which works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner. With Kerberos enabled, Hadoop services such as the NameNode and the MapReduce job tracker can authenticate the user and apply permissions based on that identity.

Access Control Lists (ACLs): In core HDFS, file permissions are similar to permissions in a UNIX system. Read and write access is maintained per user and user group, each of which is identified by a plain string of characters. At the MapReduce level, MapReduce ACLs define which users can submit jobs. The list of user groups can be maintained within the Hadoop layer or fetched from external LDAP or Active Directory systems. HBase ACLs were introduced in HBase 0.92 and give the ability to define an authorization policy (Read/Write/Create/Admin), with table/family/qualifier granularity, for a specified user.

Sentry (Cloudera): Cloudera recently introduced Sentry, a role-based authorization framework which controls the access of users and groups to data in Hive and Cloudera's Impala. The authorization framework uses a file-based policy provider and can be configured at multiple levels, i.e., server, database, table, column, etc.

Project Knox (Hortonworks): Project Knox from Hortonworks is currently focused on providing a gateway to Hadoop clusters, giving a single point of authentication and access for Apache Hadoop services in a cluster. Planned features include perimeter security for Hadoop, a single cluster endpoint for data and jobs, and management of security across multiple clusters and Hadoop versions, among other areas. The initiative, started in 2013, has already delivered a couple of releases.
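The table/family/qualifier granularity described for HBase ACLs can be pictured with a small sketch. This is not HBase code or its API; the `Grant` class, the `is_allowed` function and the sample users and tables are hypothetical names chosen for illustration. The assumption is that a grant at a broader scope (for example, a whole table) covers every narrower scope beneath it, while a narrow grant never widens.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of a scoped grant; action codes follow the
# Read/Write/Create/Admin set named in the text.
@dataclass(frozen=True)
class Grant:
    user: str
    actions: frozenset               # subset of {"R", "W", "C", "A"}
    table: str
    family: Optional[str] = None     # None means "every family in the table"
    qualifier: Optional[str] = None  # None means "every qualifier in the family"

def is_allowed(grants, user, action, table, family=None, qualifier=None):
    """Return True if any grant covers the requested scope and action."""
    for g in grants:
        if g.user != user or action not in g.actions or g.table != table:
            continue
        # A family-scoped grant only covers requests inside that family.
        if g.family is not None and g.family != family:
            continue
        # A qualifier-scoped grant only covers that exact qualifier.
        if g.qualifier is not None and g.qualifier != qualifier:
            continue
        return True
    return False

# Illustrative grants (made-up users and table names).
GRANTS = [
    Grant("alice", frozenset({"R"}), "customers"),
    Grant("bob", frozenset({"R", "W"}), "customers", "profile"),
    Grant("carol", frozenset({"R"}), "customers", "profile", "email"),
]
```

Under these assumptions, `alice` can read any cell in `customers`, `bob` can read and write only inside the `profile` family, and `carol` can read only `profile:email`. A request for a whole table by a user holding only a family-level grant is denied, which is the conservative choice for a sketch like this.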
  5. WORK TO BE DONE IN HADOOP
There is a long way to go before Hadoop can meet the exacting security standards of large enterprises. Despite the current work, there are still some challenges for CIOs and CISOs adopting the Hadoop stack, including:

  • No framework for managing enterprise policies. Large enterprises have complex and constantly evolving policies for managing data access. The native Hadoop framework does not offer an easy way to customize and manage employee policies.
  • Fine-grained authorization. The current authorization lets users or user groups get access to tables or file systems/directories as a whole. Enterprises are looking for more fine-grained authorization to ensure sensitive data is protected from access while still being able to analyze the complete set of data and leverage its full potential.
  • Decentralizing data ownership. As the use of Hadoop expands in the organization, business units would still want to retain control of their data and provide access themselves to users from other units.
  • Lack of a uniform authorization method. While HBase uses ACLs for managing authorization, HDFS refers to its own set of groups for vetting authorization. Enterprises are looking for a universal process for authorization across all components.
  • Lack of a universal audit control mechanism. Currently each component is built to have its own audit tracking mechanism, and there is no uniformity in the elements tracked or the format of the audit log. Enterprises are looking for an easy way of reporting the access history of their employees.
  • Lack of reporting and governance capabilities. Enterprises need tools to readily report policy status and access history, and to check compliance conformance across various assets.

XA SECURE - BIG DATA SECURITY APPROACH
At XA Secure, we recognize these challenges for Hadoop and other big data tools, and are trying to solve them through our solution offerings. Our initial product is built from the ground up for the big data infrastructure. We are trying to address some of the security challenges with Hadoop infrastructure by providing a governance layer to enable
a) Centralized policy management, with the ability to define policies for fine-grained access controls to files (HDFS), column families, cells (HBase, Hive), etc., and differentiated views of data based on user function
b) Protection of sensitive data through masking and encryption
  6. c) Common extensive audit layer across Hadoop components. Audit can be set at resource and user group level.
d) Delegated administration of data
e) Policy analytics to monitor and report access and enable compliance conformance

The tool is currently built over the HBase, Hive and HDFS components, with planned incorporation of other big data tools, such as Greenplum and MongoDB, in future releases.

XA SECURE DIFFERENTIATORS
As noted before, a lot of work is being done to make Hadoop more secure, and at XA Secure we continue to work with the open source community in leveraging the collective work and delivering value to our customers. As a company with a rich history in security and identity management, and being purely focused on big data, we believe we bring a unique value proposition through our offerings, which include
a) An end-to-end access management and governance suite over Hadoop. We focus on making it easier for both business users and administrators to manage data security over Hadoop.
b) Distribution-agnostic solution. We support most of the prevalent Hadoop distributions and can easily integrate into the management tools that come as part of the distribution.
c) Hooks to integrate with an enterprise's existing provisioning or access management systems. We currently integrate with LDAP, and also support import and export of our policies.
d) Industry-specific compliance and audit reports. We are building support for government, financial and healthcare compliance requirements.
e) Built over current open source efforts on authentication and encryption. We will continue to embed other open source initiatives as they are released.

SUMMARY
The big data ecosystem is evolving, and there are a lot of initiatives in the open source and vendor community for building mature capabilities. It is important that enterprises embed a security strategy as part of their plan early, and think about what data they would put into big data tools and how they are going to extend the security controls over that data. CISOs can choose to adopt XA Secure's solution to provide enterprise-level security and credibility to their big data initiatives.
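As a rough illustration of three ideas listed under the big data security approach (differentiated views by user function, masking of sensitive data, and a common audit record), the sketch below applies a role-based column policy to a single row and logs the access in one uniform format. It is not XA Secure's implementation; `POLICY`, `mask`, `read_row`, the role name and the column names are all hypothetical and exist only for this example.

```python
# Hypothetical central policy: per role, which columns are shown as-is
# and which are shown masked. Columns not listed are withheld entirely.
POLICY = {
    "analyst": {"visible": {"name", "city"}, "masked": {"ssn"}},
}

def mask(value, keep=4):
    """Replace all but the last `keep` characters with '*'."""
    s = str(value)
    if keep <= 0 or len(s) <= keep:
        return "*" * len(s)  # too short to reveal any part safely
    return "*" * (len(s) - keep) + s[-keep:]

def read_row(user, role, resource, row, audit_log):
    """Return the role's view of `row` and append one audit record."""
    rule = POLICY.get(role, {"visible": set(), "masked": set()})
    view = {}
    for column, value in row.items():
        if column in rule["visible"]:
            view[column] = value
        elif column in rule["masked"]:
            view[column] = mask(value)
    # Same record shape regardless of which component served the data.
    audit_log.append({
        "user": user,
        "role": role,
        "resource": resource,
        "columns": sorted(view),
    })
    return view
```

Calling `read_row("u1", "analyst", "hive:hr.employees", row, log)` with a row containing `name`, `city`, `ssn` and `salary` would return the first two unchanged, the third masked, and omit the fourth, while `log` gains one entry. An unknown role sees nothing, which keeps the default closed.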
