
Big Data Cloud Meetup - Jan 24 2013 - Zettaset

Security is the greatest challenge for the widespread adoption of Hadoop in enterprises.

This meetup will discuss ways and means of how such challenges are being met with various solutions and/or products in the industry today. Industry security experts will showcase their varied experiences.

Published in: Technology
Notes
  • What does Zettaset do?
  • “Data capacity on average in enterprises is growing at 40% to 60% year over year due to a number of factors, including an explosion in unstructured data.” - Computerworld, 2010. “80 percent of data is unstructured.” - IBM, 2010
  • IBM BigInsights includes proprietary applications as part of its EE distribution only. These include Jaql, Jaqlserver, Workflow, BigSheets, and LanguageWare.
  • Don’t tell them it’s secure just because it’s behind a firewall either. It didn’t work in 1995 and it doesn’t always work now.
  • Before you can talk about securing big data, you have to understand your data use case.

    1. Creating a Secure Hadoop Initiative: Securing the Big Data Ecosystem
       This document contains confidential, proprietary and trade secret information and is subject to certain legal protection. You may not review, copy, or distribute this information unless you are a designated recipient and have prior written authorization from Zettaset, Inc.
    2. About Me
       • CTO, Zettaset, Inc.
         – Big Data Hadoop company
         – Founded 2007
       • Distributed Computing Guy
         – Have been since college
       • Security Guy
         – Founder, SPI Dynamics (sold to HP, 2007)
         – Internet Security Systems, Prof. Services
         – Security First Network Bank, Sec. Guru
    3. Zettaset Enables Enterprise-Ready Hadoop
       • Zettaset Orchestrator™ automates Hadoop installation and cluster management with an enterprise-ready solution for Big Data deployments
         – Enterprise-class
         – Hardened for security, high availability, and performance
         – Dramatically lowers operational expenses
         – Reduces IT resource requirements
         – Simple to deploy
         – Accelerates time to value from weeks to hours
         – Eliminates unnecessary dependencies on professional services
         – Works with any Apache Hadoop distribution
       © 2012 Zettaset, Inc. | Proprietary and Confidential
    4. Zettaset Orchestrator: Making Hadoop Clusters Enterprise-Ready
    5. What is Big Data?
       • Great question.
       • It’s not a number; people define it differently.
       • The majority define it as a scalability issue:
         – “The inability to continue storing and processing data the way that you’ve been storing and processing data.”
    6. Exponential Data Growth = Big Data
       Estimated global data volume:
       • 2011: 1.8 zettabytes
       • 2015: 7.9 zettabytes
       The world’s information doubles every two years.
       Over the next 10 years:
       • The number of servers worldwide will grow by 10x
       • The amount of information managed by enterprise data centers will grow by 50x
       • The number of “files” enterprise data centers handle will grow by 75x
       Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study
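The two volume figures on this slide can be checked against the "doubles every two years" claim with a little arithmetic. The sketch below uses only the numbers quoted on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide (EMC page based on the 2011 IDC Digital Universe Study).
volume_2011_zb = 1.8
volume_2015_zb = 7.9
years = 2015 - 2011

# Compound annual growth rate implied by the two estimates.
growth_factor = volume_2015_zb / volume_2011_zb
annual_rate = growth_factor ** (1 / years) - 1

# What strict "doubling every two years" would give for 2015.
doubling_projection_zb = volume_2011_zb * 2 ** (years / 2)

print(f"Implied annual growth: {annual_rate:.1%}")
print(f"2015 volume if doubling every two years: {doubling_projection_zb:.1f} ZB")
```

The implied rate comes out at roughly 45% per year, slightly faster than doubling every two years, which would give 7.2 ZB by 2015 rather than 7.9 ZB.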
    7. Hadoop Distribution Landscape
       Distribution & Core Components (✓ = included, - = not included)

       Component                   CDH3u4      CDH4u0      HDP v1.0    Apache      MapR M3     MapR M5     BigInsights BigInsights
                                                                       Bigtop v0.3                         1.4 BE      1.4 EE
       Apache Hadoop               ✓           ✓           ✓           ✓           ✓           ✓           ✓           ✓
       HDFS                        ✓           ✓           ✓           ✓           ✓           ✓           ✓           ✓
       Fuse-DFS                    ✓           ✓           ✓           ✓           -           -           -           -
       Apache Hadoop MapReduce     ✓           ✓           ✓           ✓           ✓           ✓           ✓           ✓
       MapReduce 2                 -           ✓           -           -           -           -           -           -
       Hadoop Common               ✓           ✓           ✓           ✓           -           -           ✓           ✓
       Apache Hive                 ✓           ✓           ✓           ✓           ✓           ✓           ✓           ✓
       Apache Pig                  ✓           ✓           ✓           ✓           ✓           ✓           ✓           ✓
       Apache HBase                ✓           ✓           ✓           ✓           ✓           ✓           ✓           ✓
       Apache Zookeeper            ✓           ✓           ✓           ✓           -           -           ✓           ✓
       Apache Ambari               -           -           ✓           -           -           -           -           -
       Apache Templeton            -           -           ✓           -           -           -           -           -
       Apache Flume                ✓           ✓           -           ✓           ✓           ✓           ✓           ✓
       Apache Sqoop                ✓           ✓           ✓           ✓           ✓           ✓           -           -
       Apache Mahout               ✓           ✓           -           ✓           ✓           ✓           -           -
       Apache Whirr                ✓           ✓           -           ✓           ✓           ✓           -           -
       Apache Oozie                ✓           ✓           ✓           ✓           ✓           ✓           ✓           ✓
       Apache Lucene               -           -           -           -           -           -           ✓           ✓
       Apache Derby                -           -           -           -           -           -           ✓           ✓
       Apache Avro                 -           -           -           -           -           -           ✓           ✓
       Hue                         ✓           ✓           -           -           -           -           -           -
       BigInsights Apps            -           -           -           -           -           -           -           ✓

       Hadoop Management
       Nagios                      -           -           ✓           -           -           -           -           -
       Ganglia                     -           -           ✓           -           -           -           -           -
       Zettaset Orchestrator™      ✓           ✓           ✓           ✓           ✓           ✓           ✓           ✓
       Cloudera Manager            ✓           ✓           -           -           -           -           -           -
       MapR Manager                -           -           -           -           ✓           ✓           -           -
       BigInsights web console     -           -           -           -           -           -           -           ✓
       BigInsights simple console  -           -           -           -           -           -           ✓           -

       Legend on the slide: Open Source, Proprietary
       January 24, 2013 | Zettaset, Inc. | Proprietary
    8. What is the current state of Security?
       • Another great question.
       • Minimal work has been done in this field.
       • Currently not a huge community focus.
       • Everyone feels like it’s been addressed by adding Kerberos to the system.
       Don’t tell InfoSec people that Kerberos has fixed everything!
    9. Why Not Tell Them That?
       • You will give them an aneurysm.
       • Kerberos is “Brushed On” security, NOT “Baked In” security.
       • Kerberos does NOT address compliance issues around data (HIPAA, GLBA, PCI, etc.). Nothing around encryption, nothing around best practices.
    10. Hadoop: What’s Missing?
       • All Hadoop distros are constrained by the limitations of the Apache open source components
       • Not written to support hardened security, compliance, encryption, policy-enablement, and risk management
       • Not written with high availability, service management, and monitoring in mind
    11. Current State of Hadoop Security
       • Existing security for Apache-based Hadoop distributions does not meet enterprise requirements to support regulatory compliance mandates such as HIPAA and SOX.
       • Security breaches can have a negative impact, e.g., release of sensitive information, brand damage, loss of competitive advantage, litigation, etc.
       • The Hadoop security mechanism provides mutual authentication of users and services via SASL and Kerberos, but this has limitations.
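As a rough illustration of what "adding Kerberos" amounts to on the configuration side, the sketch below checks whether a `core-site.xml` switches on Kerberos authentication and service-level authorization. The property names `hadoop.security.authentication` and `hadoop.security.authorization` are standard Hadoop settings; the sample XML and the helper names `read_properties` and `kerberos_enabled` are assumptions made for this example, not anything shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content, inlined so the example is self-contained.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return the name/value pairs of a Hadoop-style configuration file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def kerberos_enabled(props):
    """True if authentication is 'kerberos' and authorization is switched on.

    Hadoop's defaults are 'simple' authentication and authorization 'false'.
    """
    auth = props.get("hadoop.security.authentication", "simple").lower()
    authz = props.get("hadoop.security.authorization", "false").lower()
    return auth == "kerberos" and authz == "true"


props = read_properties(SAMPLE_CORE_SITE)
print("Kerberos authentication enabled:", kerberos_enabled(props))
```

A check like this only confirms that authentication is turned on. It says nothing about encryption, auditing, or compliance, which is the limitation these slides point to.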
    12. Enterprise-Class Hadoop Security
       Addresses the security gaps and vulnerabilities that exist in all Apache-based Hadoop distributions
       • Hardened to address access control, policy, compliance and risk management
       • Support for Lightweight Directory Access Protocol (LDAP) and Active Directory (AD), enabling Hadoop clusters to seamlessly integrate with existing security policies within the enterprise environment
       • Centralized configuration management, logging, and auditing, which maintains control of ingress and egress points in the cluster, and enables Hadoop clusters to meet compliance requirements for reporting and forensics
       • Role-based access control (RBAC), which significantly improves the user authentication process, and enables Kerberos to be run against all components of a big data ecosystem, not just Hadoop
    13. Defining Big Data Use Case
       • What is your use case?
       • What are you trying to accomplish?
       • What data are you going to be storing?
       • What are you going to do after you store it?
       This will define your Security Threat Model and how you protect your data.
    14. Big Data Production System (diagram)
       Data types: structured data, semi-structured data, unstructured data. Example inputs: log files, alerts, transactions, etc.
    15. Big Data Landscape (Version 2.0) (figure)
       Top-level categories: Infrastructure, Analytics, Applications, Data Sources, Cross Infrastructure / Analytics, Open Source Projects.
       © Matt Turck (@mattturck) and Shivon Zilis (@shivonz), Bloomberg Ventures
    16. What is a Threat Model?
       • Threat modeling is based on the notion that any system or organization has assets of value worth protecting and these assets have certain vulnerabilities.
       • Internal or external threats exploit these vulnerabilities in order to cause damage to the assets, and appropriate security countermeasures exist that mitigate the threats.
       • A threat model can help to assess the probability, the potential harm, the priority, etc., of attacks, and thus help to minimize or eradicate the threats.
    17. Approaches to threat modeling
       There are 3 general approaches to threat modeling:
       • Attacker-centric
         – Attacker-centric threat modeling starts with an attacker, and evaluates their goals and how they might achieve them. Attackers’ motivations are often considered, for example, "The NSA wants to read this email," or "Jon wants to copy this DVD and share it with his friends." This approach usually starts from either entry points or assets.
       • Software-centric
         – Software-centric threat modeling (also called system-centric, design-centric, or architecture-centric) starts from the design of the system, and attempts to step through a model of the system, looking for types of attacks against each element of the model. This approach is used in threat modeling in Microsoft’s Security Development Lifecycle.
       • Asset-centric
         – Asset-centric threat modeling involves starting from assets entrusted to a system, such as a collection of sensitive personal information.
    18. Bottom Line
       • Identify any threats to the confidentiality, availability and integrity of the data and the application based on the data access control matrix that your application should be enforcing.
       • Assign risk values and determine the risk responses.
       • Determine the countermeasures to implement based on your chosen risk responses.
       • Continually update the threat model based on the emerging security landscape.
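The steps above (identify threats, assign risk values, choose responses) can be sketched as a very small risk register. Everything in this example is hypothetical: the three threats, the 1 to 5 likelihood and impact scales, the likelihood × impact score, and the thresholds in `risk_response` are illustrative assumptions, not values taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    affects: str      # "confidentiality", "integrity", or "availability"
    likelihood: int   # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int       # 1 (minor) to 5 (severe); hypothetical scale

    @property
    def risk(self) -> int:
        # Simple likelihood x impact score, one common convention among several.
        return self.likelihood * self.impact


def risk_response(risk: int) -> str:
    """Map a risk score to a response; the thresholds are illustrative only."""
    if risk >= 15:
        return "mitigate"
    if risk >= 8:
        return "transfer or mitigate"
    return "accept and monitor"


# Hypothetical threats for a Hadoop cluster, one per security property.
threats = [
    Threat("Log tampering on an edge node", "integrity", 1, 3),
    Threat("Unencrypted data read from DataNode disks", "confidentiality", 3, 5),
    Threat("NameNode outage", "availability", 2, 4),
]

# Highest risk first, so countermeasures can be prioritized.
ranked = sorted(threats, key=lambda t: t.risk, reverse=True)
for t in ranked:
    print(f"{t.risk:>2}  {t.affects:<15}  {risk_response(t.risk):<20}  {t.name}")
```

Re-running the ranking whenever a threat is added or re-scored is one simple way to keep the model up to date, as the last bullet asks.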
    19. Summary
       • All existing Apache-based Hadoop distributions have functional limitations which constrain enterprise adoption
       • Zettaset Orchestrator is addressing the enterprise-level gaps in security, high availability, performance, and manageability that exist in all Apache-based Hadoop distributions
       • Orchestrator is a universal management and control software layer that can sit on top of any Hadoop distribution (distro-agnostic)
       • Orchestrator fills the Service Management gaps that exist in all Hadoop distributions and cluster deployments, and makes Hadoop ready for broader enterprise adoption
    20. Thank You!
