Security is the greatest challenge for the widespread adoption of Hadoop in enterprises.
This meetup will discuss how these challenges are being met by various solutions and products in the industry today. Industry security experts will share their varied experiences.
Big Data Cloud Meetup - Jan 24 2013 - Zettaset
1. Creating a Secure Hadoop Initiative
Securing the Big Data Ecosystem
This document contains confidential, proprietary and trade secret information and is subject to certain legal protection. You may not
review, copy, or distribute this information unless you are a designated recipient, and have prior written authorization from Zettaset, Inc.
2. About Me
• CTO Zettaset, Inc. – Big Data Hadoop Company
– Founded 2007
• Distributed Computing Guy
– Have been since college
• Security Guy
– Founder SPI Dynamics (sold to HP, 2007)
– Internet Security Systems, Prof. Services
– Security First Network Bank, Sec. Guru.
5. What is Big Data?
• Great Question
• It’s not a number; people define it differently.
• The majority define it as a scalability issue:
– “The inability to continue storing and processing data the way that you’ve been storing and processing data.”
6. Exponential Data Growth = Big Data
Estimated Global Data Volume:
2011: 1.8 Zettabytes
2015: 7.9 Zettabytes
The world's information
doubles every two years
Over the next 10 years:
The number of servers worldwide
will grow by 10x
Amount of information managed
by enterprise data centers will
grow by 50x
Number of “files” enterprise data
centers handle will grow by 75x
Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study
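As a rough check on the figures above, the growth rate and doubling time implied by the two volume estimates can be worked out directly. This is a back-of-the-envelope sketch; the function names are illustrative and not part of the original deck, and it assumes a constant annual growth rate between 2011 and 2015.

```python
import math


def annual_growth_rate(start_volume, end_volume, years):
    """Compound annual growth rate implied by two volume estimates."""
    return (end_volume / start_volume) ** (1.0 / years) - 1.0


def doubling_time_years(rate):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1.0 + rate)


# Figures from the slide: 1.8 ZB in 2011, 7.9 ZB in 2015.
rate = annual_growth_rate(1.8, 7.9, 2015 - 2011)
doubling = doubling_time_years(rate)

print(f"Implied annual growth: {rate:.1%}")
print(f"Implied doubling time: {doubling:.2f} years")
```

Under these assumptions the implied growth is roughly 45% per year, which gives a doubling time a little under two years. That is broadly consistent with the "doubles every two years" claim.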
8. What is the current state of Security?
• Another Great Question
• Minimal work has been done in this field
• Currently Not a Huge Community Focus.
• Everyone feels like it’s been addressed by
adding Kerberos to the system
Don’t tell InfoSec people that Kerberos has fixed
everything!
9. Why Not Tell Them That?
• You will give them an aneurysm.
• Kerberos is “Brushed On” Security NOT
“Baked In” security.
• Kerberos does NOT address compliance
issues around data (HIPAA, GLBA, PCI, etc.)
Nothing around encryption, nothing around
best practices.
13. Defining Big Data Use Case
• What is your use case?
• What are you trying to accomplish?
• What data are you going to be storing?
• What are you going to do after you store it?
This will define your Security Threat Model and how you
protect your data.
14. Big Data Production System
[Diagram: data sources (log files, alerts, transactions, etc.) and data types (structured data, semi-structured data, unstructured data)]
16. What is a Threat Model?
• Threat modeling is based on the notion that any
system or organization has assets of value worth
protecting and these assets have certain
vulnerabilities.
• Internal or external threats exploit these
vulnerabilities in order to cause damage to the
assets, and appropriate security
countermeasures exist that mitigate the threats.
• A threat model can help to assess the
probability, the potential harm, the priority, etc. of
attacks, and thus help to minimize or eradicate
the threats.
17. Approaches to threat modeling
• 3 general approaches to threat modeling:
• Attacker-centric
– Attacker-centric threat modeling starts with an attacker, and evaluates their
goals, and how they might achieve them. Attacker's motivations are often
considered, for example, "The NSA wants to read this email," or "Jon wants to
copy this DVD and share it with his friends." This approach usually starts from
either entry points or assets.
• Software-centric
– Software-centric threat modeling (also called 'system-centric,' 'design-centric,' or
'architecture-centric') starts from the design of the system, and attempts to step
through a model of the system, looking for types of attacks against each element
of the model. This approach is used in threat modeling in Microsoft's Security
Development Lifecycle.
• Asset-centric
– Asset-centric threat modeling involves starting from assets entrusted to a
system, such as a collection of sensitive personal information.
18. Bottom Line
• Identify any threats to the confidentiality, availability and
integrity of the data and the application based on the
data access control matrix that your application should
be enforcing
• Assign risk values and determine the risk responses
• Determine the countermeasures to implement based on
your chosen risk responses
• Continually update the threat model based on the
emerging security landscape.
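The "assign risk values and determine the risk responses" step can be sketched as a small risk register. This is a minimal illustration only: the 1 to 5 scales, the likelihood-times-impact score, the thresholds, the response names, and the example threats are all assumptions made for the sketch, not values taken from the deck.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    property_at_risk: str  # "confidentiality", "integrity", or "availability"
    likelihood: int        # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int            # assumed scale: 1 (minor) to 5 (severe)

    @property
    def risk(self) -> int:
        # Simple likelihood x impact score; other scoring schemes exist.
        return self.likelihood * self.impact


def risk_response(threat, mitigate_at=12, accept_below=5):
    """Map a risk score to a response. Thresholds are illustrative."""
    if threat.risk >= mitigate_at:
        return "mitigate"
    if threat.risk < accept_below:
        return "accept"
    return "monitor"


# Hypothetical threats for a Hadoop-style cluster.
threats = [
    Threat("Unencrypted data at rest read from disk", "confidentiality", 3, 5),
    Threat("Job submitted under a spoofed user", "integrity", 2, 4),
    Threat("Failure of a single coordinating node", "availability", 1, 4),
]

# Print the register with the highest risk first.
for t in sorted(threats, key=lambda t: t.risk, reverse=True):
    print(f"{t.risk:>2}  {risk_response(t):<8}  {t.property_at_risk:<15}  {t.name}")
```

Re-running the scoring whenever a threat is added or re-rated is one simple way to keep the model up to date, as the last bullet recommends.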
“Data capacity on average in enterprises is growing at 40% to 60% year over year due to a number of factors, including an explosion in unstructured data.” - Computerworld, 2010
“80 percent of data is unstructured.” - IBM, 2010
IBM BigInsights includes proprietary applications as part of its EE distribution only. These include Jaql, Jaqlserver, Workflow, BigSheets, and LanguageWare
Don’t tell them it’s secure just because it’s behind a firewall either. It didn’t work in 1995 and it doesn’t always work now.
Before you can talk about securing big data, you have to understand your data use case