Big Data Security
Top 5 Security Risks and Best Practices
Jitendra Chauhan
Head R&D, iViZ Security
jitendra.chauhan@gmail.com
Agenda
• Key Insights of Big Data Architecture
• Top 5 Big Data Security Risks
• Top 5 Best Practices
Key Insights of Big Data Architecture
Distributed Architecture (Hadoop as example)
• Data Partition, Replication and Distribution
• Auto-tiering
• Move the Code
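The Python sketch below is a simplified, single-process illustration of partitioning, replication and "move the code". It assumes 128 MB blocks, a replication factor of 3 and a hypothetical hash-based placement rule. Real HDFS placement is rack-aware and works differently. The sketch also shows why storage security gets harder: every block lives on several nodes, and each of those nodes has to be protected.

```python
import hashlib

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # chunk size quoted on the slides
REPLICATION = 3        # assumed replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering the file; the last block may be shorter."""
    return [(off, min(block_size, file_size - off)) for off in range(0, file_size, block_size)]


def place_replicas(block_id: str, nodes: list[str], replication: int = REPLICATION) -> list[str]:
    """Pick `replication` distinct nodes for a block (hypothetical hash rule, not rack-aware)."""
    start = int(hashlib.sha256(block_id.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(min(replication, len(nodes)))]


def schedule_task(block_id: str, placement: dict[str, list[str]], idle_nodes: set[str]) -> tuple[str, bool]:
    """Prefer an idle node that already stores the block, i.e. move the code to the data."""
    if not idle_nodes:
        raise ValueError("no idle nodes available")
    for node in placement[block_id]:
        if node in idle_nodes:
            return node, True  # data-local: only the code travels
    return sorted(idle_nodes)[0], False  # remote: the block must cross the network


nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
blocks = split_into_blocks(300 * MB)  # a 300 MB file -> 3 blocks
placement = {f"blk_{i}": place_replicas(f"blk_{i}", nodes) for i in range(len(blocks))}
print(blocks)
print(placement)
```

Node names and the placement rule are illustrative only. The point is that a task scheduled on `dn*` holding a replica reads its block locally, so the computation code travels over the network instead of the data.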
Real Time, Streaming and Continuous Computation
• Integration Patterns
• Real time
• Variety of Input Sources
• Ad hoc Queries
Parallel & Powerful Programming Framework
Example:
• 16 TB Data
• 128 MB Chunks
• 82000 Maps
Java vs SQL / PL/SQL
Frameworks:
• MapReduce
• Storm Topology (Spouts & Bolts)
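Some simple arithmetic shows how the number of map tasks follows from input size and chunk size. Suppose the framework runs one map task per 128 MB split. That is a simplifying assumption, because real counts also depend on split settings and on how the input is laid out in files. Under it, 16 TB of input gives 16 × 2^40 / 2^27 = 131,072 map tasks. The Python sketch computes that figure. It also walks through a toy, in-memory version of the MapReduce map, shuffle and reduce phases using word count. It is not Hadoop's Java API.

```python
from collections import defaultdict
from itertools import chain

MB = 1024 ** 2
TB = 1024 ** 4


def num_map_tasks(total_bytes: int, split_bytes: int) -> int:
    """One map task per input split; a partial last split still needs a task."""
    return -(-total_bytes // split_bytes)  # ceiling division


def mapper(line: str) -> list[tuple[str, int]]:
    """Word-count mapper: emit (word, 1) for each word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Group values by key, as the framework does between the map and reduce phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Word-count reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())


print(num_map_tasks(16 * TB, 128 * MB))  # 131072
print(word_count(["Big data needs big clusters", "data moves slowly"]))
```

In a real cluster, each mapper call would run on a different node against its local split. The shuffle is where intermediate data crosses the network.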
Big Data Architecture
No Single Silver Bullet
• Hadoop is already unsuitable for many Big Data problems
• Real-time analytics
o Cloudscale, Storm
• Graph computation
o Giraph and Pregel (examples of graph computation include shortest paths and degrees of separation)
• Low latency queries
o Dremel
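Pregel and Giraph use a vertex-centric, superstep-based model rather than MapReduce. The Python sketch is a single-process imitation of that model for "degrees of separation" (unweighted shortest paths). It is not Giraph's API. The graph and names are hypothetical.

```python
INF = float("inf")


def degrees_of_separation(graph: dict[str, list[str]], source: str) -> dict[str, int]:
    """Single-source hop counts computed in Pregel-style supersteps.

    In each superstep, every vertex that received messages keeps the smallest
    distance offered; if that improves its value, it sends value + 1 to its
    neighbors. A vertex that sends nothing has effectively voted to halt, and
    the computation ends when no messages remain in flight.
    """
    value: dict[str, float] = {v: INF for v in graph}
    messages: dict[str, list[int]] = {source: [0]}  # superstep 0
    while messages:
        outbox: dict[str, list[int]] = {}
        for vertex, inbox in messages.items():
            candidate = min(inbox)
            if candidate < value.get(vertex, INF):
                value[vertex] = candidate
                for neighbor in graph.get(vertex, []):
                    outbox.setdefault(neighbor, []).append(candidate + 1)
        messages = outbox
    return {v: int(d) for v, d in value.items() if d != INF}


social = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "eve": [],
}
print(degrees_of_separation(social, "alice"))  # {'alice': 0, 'bob': 1, 'carol': 2, 'dave': 3}
```

Unreachable vertices (here `eve`) never receive a message and are omitted from the result.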
Top 5 Security Risks
Insecure Computation
• Untrusted computation program with access to sensitive info (e.g., health data)
• Risks:
o Information Leak
o Data Corruption
o DoS
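One way to limit what an untrusted computation program can leak from sensitive records such as health data is to hand it a restricted view instead of raw records. In that view, direct identifiers are dropped and join keys are pseudonymized with a keyed hash. The Python sketch below assumes hypothetical field names and a hard-coded demo key. A real deployment would fetch the key from a key store. This addresses information leakage only. Data corruption and DoS need separate controls, such as read-only access and resource limits.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical field classification for a health-data record.
DROP_FIELDS = {"name", "ssn"}          # never shown to untrusted code
PSEUDONYMIZE_FIELDS = {"patient_id"}   # replaced by a keyed hash


def restricted_view(record: dict, allowed: set[str], key: bytes) -> dict:
    """Project a record onto the allowed fields, dropping or pseudonymizing sensitive ones."""
    view = {}
    for field in sorted(allowed):
        if field in DROP_FIELDS or field not in record:
            continue
        value = record[field]
        if field in PSEUDONYMIZE_FIELDS:
            value = hmac.new(key, str(value).encode(), hashlib.sha256).hexdigest()[:16]
        view[field] = value
    return view


def run_untrusted(job, records: list[dict], allowed: set[str], key: bytes):
    """Run a job that is not trusted with raw records; it only sees restricted views."""
    return job([restricted_view(r, allowed, key) for r in records])


def count_by_diagnosis(rows: list[dict]) -> Counter:
    """Stand-in for third-party analytics code."""
    return Counter(r["diagnosis"] for r in rows)


records = [
    {"patient_id": 1, "name": "A", "ssn": "111", "diagnosis": "flu"},
    {"patient_id": 2, "name": "B", "ssn": "222", "diagnosis": "flu"},
    {"patient_id": 3, "name": "C", "ssn": "333", "diagnosis": "asthma"},
]
demo_key = b"demo-key-load-from-a-key-store"  # hypothetical; never hard-code real keys
print(run_untrusted(count_by_diagnosis, records, {"patient_id", "name", "diagnosis"}, demo_key))
```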
Input Validation and Filtering
• Input Validation
o What kind of data is untrusted?
o What are the untrusted data sources?
• Data Filtering
o Filter rogue or malicious data
• Challenges
o GBs or TBs of continuous data
o Signature-based data filtering has limitations: how do you filter the behavioral aspects of data?
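The questions above can be partly answered in code. The Python sketch applies a source allowlist, schema and type checks, and a toy signature blocklist to each record. It then adds a crude behavioral check: too many events from one source within a time window are rejected. Source names, field names and patterns are hypothetical. A real pipeline would run such checks inside its ingestion layer at far higher volume.

```python
import time
from collections import defaultdict, deque

TRUSTED_SOURCES = {"sensor-gateway", "billing-db"}              # hypothetical allowlist
REQUIRED_FIELDS = {"source": str, "user_id": str, "amount": (int, float)}
BLOCKED_PATTERNS = ("<script", "' or 1=1")                      # toy signatures, lower-case


class InputFilter:
    """Validate records, apply signature checks, then a simple per-source rate check."""

    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window = window_seconds
        self.recent: dict[str, deque] = defaultdict(deque)  # source -> recent timestamps

    def is_valid(self, record: dict) -> bool:
        # Input validation: schema/type checks and an allowlist of trusted sources.
        for field, expected in REQUIRED_FIELDS.items():
            if not isinstance(record.get(field), expected):
                return False
        if record["source"] not in TRUSTED_SOURCES:
            return False
        # Signature-based filtering: reject values containing known-bad patterns.
        return not any(p in str(v).lower() for v in record.values() for p in BLOCKED_PATTERNS)

    def within_rate(self, source: str, now: float) -> bool:
        # Crude behavioral check: too many events from one source in the window.
        window = self.recent[source]
        while window and now - window[0] > self.window:
            window.popleft()
        if len(window) >= self.max_events:
            return False
        window.append(now)
        return True

    def filter(self, stream, clock=time.monotonic):
        # Lazily filter a (potentially unbounded) stream of records.
        for record in stream:
            if self.is_valid(record) and self.within_rate(record["source"], clock()):
                yield record


good = {"source": "sensor-gateway", "user_id": "u1", "amount": 5}
incoming = [
    good,
    {**good, "source": "unknown"},                      # untrusted source
    {"source": "sensor-gateway", "user_id": "u2"},      # missing field
    {**good, "user_id": "<script>alert(1)</script>"},   # matches a signature
    {**good, "user_id": "u3"},
    {**good, "user_id": "u4"},                          # exceeds the rate limit
]
accepted = list(InputFilter(max_events=2, window_seconds=10.0).filter(incoming, clock=lambda: 0.0))
print([r["user_id"] for r in accepted])
```

The injectable `clock` parameter keeps the rate check testable. In production it would default to `time.monotonic`.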
Granular Access Controls
• Designed for performance, with almost no security in mind
• Security in Big Data is still an area of ongoing research
• Table-, row- or cell-level access control is missing
• Ad hoc queries pose additional challenges
• Access control is disabled by default
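Cell-level access control can be sketched by attaching visibility labels to each cell and filtering every read through one function. Routing reads this way means ad hoc queries cannot bypass the check. The Python sketch is loosely modeled on the cell visibility labels of stores such as Apache Accumulo. It is simplified so that all of a cell's labels are required, whereas Accumulo's visibility expressions also allow OR. The data and label names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    visibility: frozenset[str] = frozenset()  # labels a reader must hold; empty = public


def can_read(cell: Cell, authorizations: set[str]) -> bool:
    """A reader sees a cell only if they hold every label attached to it."""
    return cell.visibility <= authorizations


def scan(cells: list[Cell], authorizations: set[str]) -> list[Cell]:
    """Single enforcement point: every query, including ad hoc ones, goes through here."""
    return [cell for cell in cells if can_read(cell, authorizations)]


table = [
    Cell("p1", "zip", "560001"),
    Cell("p1", "name", "Jane", frozenset({"pii"})),
    Cell("p1", "diagnosis", "flu", frozenset({"pii", "medical"})),
]
print([c.column for c in scan(table, {"pii"})])  # ['zip', 'name']
```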
Insecure Data Storage
• With data spread across many nodes, authentication, authorization and encryption are challenging
• Auto-tiering moves cold data to a less secure medium
o What if cold data is sensitive?
• Encryption of real-time data can impact performance
• Secure communication among nodes, middleware and end users is disabled by default
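One answer to "what if cold data is sensitive?" is a tiering policy that weighs sensitivity as well as temperature. The Python sketch uses hypothetical tiers, costs and a made-up sensitivity scale from 0 (public) to 3. It picks the cheapest tier that is secure enough and, for any sensitive data, encrypted at rest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_gb: float
    security_level: int       # hypothetical scale: higher means stronger controls
    encrypted_at_rest: bool


TIERS = [
    Tier("ssd-hot", 0.20, 3, True),
    Tier("hdd-warm", 0.05, 2, True),
    Tier("archive-cold", 0.01, 1, False),
]


def choose_cold_tier(sensitivity: int, tiers: list[Tier] = TIERS) -> Tier:
    """Cheapest tier whose controls meet the data's sensitivity; sensitive data must be encrypted."""
    eligible = [
        t for t in tiers
        if t.security_level >= sensitivity and (sensitivity == 0 or t.encrypted_at_rest)
    ]
    if not eligible:
        raise ValueError("no tier satisfies the data's sensitivity requirement")
    return min(eligible, key=lambda t: t.cost_per_gb)


print(choose_cold_tier(0).name)  # archive-cold
print(choose_cold_tier(1).name)  # hdd-warm
```

Raising an error when no tier qualifies is deliberate. Silently falling back to the cheapest tier would recreate the exact risk described above.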
Privacy Concerns in Data Mining and Analytics
• Monetization of Big Data generally involves data mining and analytics
• Sharing of results involves multiple challenges
o Invasion of privacy
o Invasive marketing
o Unintentional disclosure of information
• Examples
o AOL's release of anonymized search logs: users could easily be identified
o Netflix faced a similar problem
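The AOL and Netflix cases show that removing names is not enough. Combinations of the remaining attributes (quasi-identifiers) can still single people out. k-anonymity is one way to measure this: it is the size of the smallest group of rows that share the same quasi-identifier values. The Python sketch below computes it on a hypothetical log and shows how generalizing a field raises it. k-anonymity is a fairly weak model and is known to perform poorly on high-dimensional data such as search or rating histories, so treat it as a first check only.

```python
from collections import Counter


def k_anonymity(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Return k: the size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def generalize_zip(row: dict, keep_digits: int = 3) -> dict:
    """Coarsen a ZIP code, e.g. '10001' -> '100**', so more rows share the same value."""
    zip_code = row["zip"]
    return {**row, "zip": zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)}


search_log = [  # hypothetical "anonymized" log: no names, but quasi-identifiers remain
    {"zip": "10001", "age_band": "30-39", "query": "flu symptoms"},
    {"zip": "10002", "age_band": "30-39", "query": "cheap flights"},
    {"zip": "10001", "age_band": "30-39", "query": "tax forms"},
    {"zip": "10003", "age_band": "30-39", "query": "running shoes"},
]
qi = ["zip", "age_band"]
print(k_anonymity(search_log, qi))                               # 1
print(k_anonymity([generalize_zip(r) for r in search_log], qi))  # 4
```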
Top 5 Best Practices
• Secure your Computation Code
o Implement access control, code signing and dynamic analysis of computational code
o Have a strategy to protect data in case of untrusted code
• Implement Comprehensive Input Validation and Filtering
o Implement validation and filtering of input data from internal or external sources
o Evaluate the input validation and filtering of your Big Data solution
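Access control and code signing can be combined into a gate that runs before a job is scheduled. In the Python sketch, an HMAC with a shared secret stands in for real code signing. Real code signing would normally use asymmetric signatures, so that cluster nodes hold only a public verification key. The submitter names are hypothetical, and dynamic analysis of the code is not shown.

```python
import hashlib
import hmac

AUTHORIZED_SUBMITTERS = {"etl-team", "analytics-team"}  # hypothetical allowlist


def sign_artifact(artifact: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the job artifact (e.g. the bytes of a job JAR)."""
    return hmac.new(key, artifact, hashlib.sha256).hexdigest()


def verify_artifact(artifact: bytes, tag: str, key: bytes) -> bool:
    """Constant-time comparison so the check does not leak how many characters matched."""
    return hmac.compare_digest(sign_artifact(artifact, key), tag)


def admit_job(artifact: bytes, tag: str, key: bytes, submitter: str) -> None:
    """Gate a job before it is scheduled: access control first, then integrity."""
    if submitter not in AUTHORIZED_SUBMITTERS:
        raise PermissionError(f"{submitter!r} may not submit computation code")
    if not verify_artifact(artifact, tag, key):
        raise PermissionError("artifact signature check failed; refusing to run it")


key = b"demo-shared-secret"  # hypothetical; keep real keys in a key store
job = b"...compiled job bytes..."
tag = sign_artifact(job, key)
admit_job(job, tag, key, "etl-team")  # passes silently
print(verify_artifact(job + b"tampered", tag, key))  # False
```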
Top 5 Best Practices
• Implement Granular Access Control
o Review the role and privilege matrix
o Review permissions to execute ad hoc queries
o Enable access control
• Secure your Data Storage and Computation
o Segregate sensitive data
o Enable data encryption for sensitive data
o Audit administrative access on data nodes
o API security
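Part of reviewing the role and privilege matrix can be automated by comparing what each role actually holds against an approved baseline. Grants of sensitive privileges, such as permission to run ad hoc queries, are flagged at higher severity. The role names, privilege strings and severity rule in the Python sketch are hypothetical. In practice, the matrix would be exported from whatever authorization system the cluster uses.

```python
ROLE_PRIVILEGES = {  # hypothetical matrix as currently granted
    "analyst": {"read:sales", "adhoc_query"},
    "etl_service": {"read:raw", "write:warehouse"},
    "intern": {"read:sales", "adhoc_query", "read:pii"},
}
BASELINE = {  # hypothetical approved matrix
    "analyst": {"read:sales", "adhoc_query"},
    "etl_service": {"read:raw", "write:warehouse"},
    "intern": {"read:sales"},
}
SENSITIVE = {"read:pii", "adhoc_query"}


def review(matrix: dict[str, set[str]], baseline: dict[str, set[str]],
           sensitive: set[str]) -> list[tuple[str, str, str]]:
    """List (role, privilege, severity) for every grant not in the approved baseline."""
    findings = []
    for role, privileges in sorted(matrix.items()):
        for privilege in sorted(privileges - baseline.get(role, set())):
            severity = "HIGH" if privilege in sensitive else "MEDIUM"
            findings.append((role, privilege, severity))
    return findings


for finding in review(ROLE_PRIVILEGES, BASELINE, SENSITIVE):
    print(finding)
```

Roles missing from the baseline are treated as having no approved privileges, so every grant they hold is reported.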
Top 5 Best Practices
• Review and Implement Privacy Preserving Data Mining and Analytics
o Analytics data should not disclose sensitive information
o Get your Big Data environment audited
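Keeping analytics output from disclosing sensitive information can be approached in several ways. One well-studied technique is differential privacy, offered here as an example rather than as something the slides prescribe. For a counting query, adding or removing one person changes the count by at most 1. Adding Laplace noise with scale 1/ε to the released count therefore gives ε-differential privacy for that single query. The Python sketch implements only this basic Laplace mechanism. It does not track privacy budgets across repeated queries.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity 1, noise scale 1/epsilon)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # The difference of two independent Exponential(rate = 1/scale) draws is Laplace(0, scale).
    return true_count + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


rng = random.Random(7)
print(round(noisy_count(1042, epsilon=0.5, rng=rng), 1))  # noisy; varies with the seed
```

Smaller ε means more noise and stronger privacy. The right value is a policy decision, not a technical default.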
Thank You
jitendra.chauhan@ivizsecurity.com
http://www.ivizsecurity.com/blog/
Big Data Architecture
Key Insights
• Distributed Architecture & Auto-tiering
• Real Time, Streaming and Continuous Computation
• Ad hoc Queries
• Parallel and Powerful Computation Language
• Move the Code, Not the Data
• Non-Relational Data
• Variety of Input Sources
Top 5 Security Risks
• Insecure Computation
• End Point Input Validation and Filtering
• Granular Access Control
• Insecure Data Storage and Communication
• Privacy Preserving Data Mining and Analytics

Editor's Notes

  • #5: Data is partitioned, distributed and replicated among multiple data nodes (thousands of data nodes). Auto-tiering moves the hottest data to high-performance drives and the coldest data to low-performance, less secure drives.