Big Data Cloud Meetup - Jan 24 2013 - Zettaset
 

Security is the greatest challenge for the widespread adoption of Hadoop in enterprises.

This meetup will discuss how these challenges are being met by the solutions and products available in the industry today. Industry security experts will share their experiences.

Usage Rights

© All Rights Reserved

  • What does Zettaset do?
  • “Data capacity on average in enterprises is growing at 40% to 60% year over year due to a number of factors, including an explosion in unstructured data.” - Computerworld, 2010. “80 percent of data is unstructured.” - IBM, 2010
  • IBM BigInsights includes proprietary applications as part of its EE distribution only. These include Jaql, Jaqlserver, Workflow, BigSheets, and LanguageWare.
  • Don’t tell them it’s secure just because it’s behind a firewall either. It didn’t work in 1995 and it doesn’t always work now.
  • Before you can talk about securing big data, you have to understand your data use case.

Presentation Transcript

  • Creating a Secure Hadoop Initiative
    Securing the Big Data Ecosystem
    This document contains confidential, proprietary and trade secret information and is subject to certain legal protection. You may not review, copy, or distribute this information unless you are a designated recipient, and have prior written authorization from Zettaset, Inc.
  • About Me
    • CTO, Zettaset, Inc.
      – Big Data Hadoop company
      – Founded 2007
    • Distributed computing guy
      – Have been since college
    • Security guy
      – Founder, SPI Dynamics (sold to HP, 2007)
      – Internet Security Systems, Prof. Services
      – Security First Network Bank, Sec. Guru
  • Zettaset Enables Enterprise-Ready Hadoop
    • Zettaset Orchestrator™ automates Hadoop installation and cluster management with an enterprise-ready solution for Big Data deployments
      – Enterprise-class
      – Hardened for security, high availability, and performance
      – Dramatically lowers operational expenses
      – Reduces IT resource requirements
      – Simple to deploy
      – Accelerates time to value from weeks to hours
      – Eliminates unnecessary dependencies on professional services
      – Works with any Apache Hadoop distribution
    © 2012 Zettaset, Inc. | Proprietary and Confidential
  • Zettaset Orchestrator: Making Hadoop Clusters Enterprise-Ready
  • What is Big Data?
    • Great question
    • It’s not a number; people define it differently.
    • The majority define it as a scalability issue:
      – “The inability to continue storing and processing data the way that you’ve been storing and processing data.”
  • Exponential Data Growth = Big Data
    Estimated global data volume:
    • 2011: 1.8 zettabytes
    • 2015: 7.9 zettabytes
    The world’s information doubles every two years.
    Over the next 10 years:
    • The number of servers worldwide will grow by 10x
    • The amount of information managed by enterprise data centers will grow by 50x
    • The number of “files” enterprise data centers handle will grow by 75x
    Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study
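The two volume estimates and the "doubles every two years" rule on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration using the slide's own figures; the variable names are ours and nothing here comes from the IDC study itself.

```python
# Figures quoted on the slide (EMC / 2011 IDC Digital Universe Study).
volume_2011_zb = 1.8
volume_2015_zb = 7.9
years = 2015 - 2011

# Compound annual growth rate implied by the two estimates.
cagr = (volume_2015_zb / volume_2011_zb) ** (1 / years) - 1

# Volume in 2015 if data exactly doubled every two years starting in 2011.
doubling_projection_zb = volume_2011_zb * 2 ** (years / 2)

print(f"Implied annual growth: {cagr:.1%}")
print(f"2015 volume under strict two-year doubling: {doubling_projection_zb:.1f} ZB")
```

The implied rate is roughly 45% per year, and strict two-year doubling would give 7.2 ZB for 2015, so the 7.9 ZB estimate corresponds to slightly faster growth than doubling every two years.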
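  • Hadoop Distribution Landscape
    Legend: ✓ = included, - = not included. Slide labels: Open Source, Proprietary.
    Distribution & Core Components | CDH3u4 | CDH4u0 | HDP v1.0 | Apache Bigtop v0.3 | MapR M3 | MapR M5 | BigInsights 1.4 BE | BigInsights 1.4 EE
    Apache Hadoop | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
    HDFS | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
    Fuse-DFS | ✓ | ✓ | ✓ | ✓ | - | - | - | -
    Apache Hadoop MapReduce | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
    MapReduce 2 | - | ✓ | - | - | - | - | - | -
    Hadoop Common | ✓ | ✓ | ✓ | ✓ | - | - | ✓ | ✓
    Apache Hive | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
    Apache Pig | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
    Apache HBase | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
    Apache Zookeeper | ✓ | ✓ | ✓ | ✓ | - | - | ✓ | ✓
    Apache Ambari | - | - | ✓ | - | - | - | - | -
    Apache Templeton | - | - | ✓ | - | - | - | - | -
    Apache Flume | ✓ | ✓ | - | ✓ | ✓ | ✓ | ✓ | ✓
    Apache Sqoop | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | - | -
    Apache Mahout | ✓ | ✓ | - | ✓ | ✓ | ✓ | - | -
    Apache Whirr | ✓ | ✓ | - | ✓ | ✓ | ✓ | - | -
    Apache Oozie | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
    Apache Lucene | - | - | - | - | - | - | ✓ | ✓
    Apache Derby | - | - | - | - | - | - | ✓ | ✓
    Apache Avro | - | - | - | - | - | - | ✓ | ✓
    Hue | ✓ | ✓ | - | - | - | - | - | -
    BigInsights Apps | - | - | - | - | - | - | - | ✓
    Hadoop Management | CDH3u4 | CDH4u0 | HDP v1.0 | Apache Bigtop v0.3 | MapR M3 | MapR M5 | BigInsights 1.4 BE | BigInsights 1.4 EE
    Nagios | - | - | ✓ | - | - | - | - | -
    Ganglia | - | - | ✓ | - | - | - | - | -
    Zettaset Orchestrator™ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
    Cloudera Manager | ✓ | ✓ | - | - | - | - | - | -
    MapR Manager | - | - | - | - | ✓ | ✓ | - | -
    BigInsights web console | - | - | - | - | - | - | - | ✓
    BigInsights simple console | - | - | - | - | - | - | ✓ | -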
  • What is the current state of Security?
    • Another great question
    • Minimal work has been done in this field
    • Currently not a huge community focus
    • Everyone feels like it’s been addressed by adding Kerberos to the system
    Don’t tell InfoSec people that Kerberos has fixed everything!
  • Why Not Tell Them That?
    • You will give them an aneurysm.
    • Kerberos is “brushed on” security, NOT “baked in” security.
    • Kerberos does NOT address compliance issues around data (HIPAA, GLBA, PCI, etc.): nothing around encryption, nothing around best practices.
  • Hadoop: What’s Missing?
    • All Hadoop distros are constrained by the limitations of the Apache open source components
    • Not written to support hardened security, compliance, encryption, policy-enablement, and risk management
    • Not written with high availability, service management, and monitoring in mind
  • Current State of Hadoop Security
    • Existing security for Apache-based Hadoop distributions does not meet enterprise requirements to support regulatory compliance mandates such as HIPAA and SOX
    • Security breaches can have a negative impact, e.g., releasing sensitive information, damaging the brand, compromising competitive advantage, sparking litigation, etc.
    • The Hadoop security mechanism provides mutual authentication of users and services via SASL and Kerberos, but this has limitations
  • Enterprise-Class Hadoop Security
    Addresses the security gaps and vulnerabilities that exist in all Apache-based Hadoop distributions
    • Hardened to address access control, policy, compliance and risk management
    • Support for Lightweight Directory Access Protocol (LDAP) and Active Directory (AD), enabling Hadoop clusters to seamlessly integrate with existing security policies within the enterprise environment
    • Centralized configuration management, logging, and auditing, which maintains control of ingress and egress points in the cluster, and enables Hadoop clusters to meet compliance requirements for reporting and forensics
    • Role-based access control (RBAC), which significantly improves the user authentication process, and enables Kerberos to be run against all components of a big data ecosystem, not just Hadoop
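To make the RBAC and auditing bullets on this slide concrete, here is a minimal sketch of a role-based authorization check that records every decision. It is not Zettaset Orchestrator's implementation or API: the roles, paths, and function names are all hypothetical, and it assumes the user has already been authenticated elsewhere (for example by Kerberos, with role membership drawn from LDAP or AD groups).

```python
from dataclasses import dataclass, field

# Hypothetical roles and permissions, for illustration only. A real deployment
# would typically derive role membership from LDAP or Active Directory groups.
ROLE_PERMISSIONS = {
    "analyst": {("/data/reports", "read")},
    "etl_service": {("/data/raw", "read"), ("/data/reports", "write")},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


# In-memory stand-in for a centralized audit log.
audit_log = []


def check_access(user: User, resource: str, action: str) -> bool:
    """Allow the action if any of the user's roles grants it; record the decision."""
    allowed = any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set())
        for role in user.roles
    )
    audit_log.append((user.name, resource, action, allowed))
    return allowed


alice = User("alice", {"analyst"})
print(check_access(alice, "/data/reports", "read"))  # granted by the analyst role
print(check_access(alice, "/data/raw", "read"))      # no role grants this
print(audit_log)
```

Keeping the permission check and the audit record in one function means that denied requests are logged as well as granted ones, which is what reporting and forensics usually need.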
  • Defining Big Data Use Case
    • What is your use case?
    • What are you trying to accomplish?
    • What data are you going to be storing?
    • What are you going to do after you store it?
    This will define your Security Threat Model and how you protect your data.
  • Big Data Production System (diagram)
    Inputs: log files, alerts, transactions, etc.
    Data types: structured data, semi-structured data, unstructured data
  • Big Data Landscape (Version 2.0) (diagram)
    Top-level groupings: Infrastructure, Analytics, Applications, Data Sources, Cross Infrastructure / Analytics, Open Source Projects
    © Matt Turck (@mattturck) and Shivon Zilis (@shivonz), Bloomberg Ventures
  • What is a Threat Model?
    • Threat modeling is based on the notion that any system or organization has assets of value worth protecting and these assets have certain vulnerabilities.
    • Internal or external threats exploit these vulnerabilities in order to cause damage to the assets, and appropriate security countermeasures exist that mitigate the threats.
    • A threat model can help to assess the probability, the potential harm, the priority, etc., of attacks, and thus help to minimize or eradicate the threats.
  • Approaches to threat modeling
    There are three general approaches to threat modeling:
    • Attacker-centric
      – Attacker-centric threat modeling starts with an attacker, and evaluates their goals and how they might achieve them. Attackers’ motivations are often considered, for example, "The NSA wants to read this email," or "Jon wants to copy this DVD and share it with his friends." This approach usually starts from either entry points or assets.
    • Software-centric
      – Software-centric threat modeling (also called system-centric, design-centric, or architecture-centric) starts from the design of the system, and attempts to step through a model of the system, looking for types of attacks against each element of the model. This approach is used in threat modeling in Microsoft’s Security Development Lifecycle.
    • Asset-centric
      – Asset-centric threat modeling involves starting from assets entrusted to a system, such as a collection of sensitive personal information.
  • Bottom Line
    • Identify any threats to the confidentiality, availability and integrity of the data and the application based on the data access control matrix that your application should be enforcing
    • Assign risk values and determine the risk responses
    • Determine the countermeasures to implement based on your chosen risk responses
    • Continually update the threat model based on the emerging security landscape
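The steps on this slide can be sketched as a small risk register. Everything in the example is an assumption made for illustration: the access control matrix entries, the 1 to 5 likelihood and impact scales, the likelihood-times-impact score, and the response thresholds are not taken from the talk.

```python
from dataclasses import dataclass

# Hypothetical access control matrix: (role, dataset) -> permitted actions.
ACCESS_MATRIX = {
    ("analyst", "claims"): {"read"},
    ("etl_service", "claims"): {"read", "write"},
}


def is_permitted(role: str, dataset: str, action: str) -> bool:
    """True if the matrix the application should enforce allows the action."""
    return action in ACCESS_MATRIX.get((role, dataset), set())


@dataclass
class Threat:
    description: str
    property_at_risk: str  # "confidentiality", "integrity", or "availability"
    likelihood: int        # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int            # assumed scale: 1 (minor) to 5 (severe)

    @property
    def risk(self) -> int:
        # Simple likelihood x impact score; other scoring schemes exist.
        return self.likelihood * self.impact


def risk_response(threat: Threat, mitigate_at: int = 12, accept_below: int = 5) -> str:
    """Map a risk score to a response using illustrative thresholds."""
    if threat.risk >= mitigate_at:
        return "mitigate"
    if threat.risk < accept_below:
        return "accept"
    return "review"


threats = [
    Threat("Analyst role writes to the claims dataset", "integrity", 2, 5),
    Threat("Unencrypted node disk is stolen", "confidentiality", 3, 5),
    Threat("Verbose job logs fill a scratch volume", "availability", 2, 2),
]

# List threats from highest to lowest risk, with the chosen response.
for t in sorted(threats, key=lambda t: t.risk, reverse=True):
    print(f"{t.risk:>2}  {risk_response(t):<8}  {t.property_at_risk:<15}  {t.description}")
```

The first threat exists only because the matrix denies analysts write access (`is_permitted("analyst", "claims", "write")` is false). Re-running the scoring as likelihoods or the matrix change is one simple way to keep the model current.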
  • Summary
    • All existing Apache-based Hadoop distributions have functional limitations which constrain enterprise adoption
    • Zettaset Orchestrator is addressing the enterprise-level gaps in security, high availability, performance, and manageability that exist in all Apache-based Hadoop distributions
    • Orchestrator is a universal management and control software layer that can sit on top of any Hadoop distribution (distro-agnostic)
    • Orchestrator fills the Service Management gaps that exist in all Hadoop distributions and cluster deployments, and makes Hadoop ready for broader enterprise adoption
  • Thank You!