Hadoop Security @ Comcast
Dushyanth Vaddi
Ray Harrison
About Comcast & Big Data
“Comcast brings together the best in media and technology. We
drive innovation to create the world's best entertainment and
online experiences.”
• 28 million customer relationships across 5 lines of business
• Video
• High Speed Internet
• Telephony
• Home Security & Automation
• Mobile
• $80B in Revenue (2016)
• Some Big Data use cases
• Ginormous numbers of customer devices
• Cable modems + everything behind the modems such as iPhones, tablets, IoT devices
• Set-top boxes
• Clickstream (video and internet)
• Network equipment & backbone
• Security
• Content delivery
About our Environment
Evolution of Hadoop in the Enterprise
All images: Creative Commons
Evolution of Hadoop in the Enterprise
After your first
security audit…
The Cost of Security & Data Breaches
• Average total cost of a data breach: $4 million
• Average cost per stolen record: $158.00
• Caused mostly by: Hackers & criminal insiders
Impacts:
• Direct monetary loss
• Loss of existing customers
• Loss of potential customers
• Servicing each stolen record
• Stock price degradation
• Brand & reputation degradation
• Lawsuits
Big Examples:
• Netflix
• AOL
• Target
• Sony
The Cost of Security & Data Breaches
Big Data = Big, Very Expensive Problem
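To make the arithmetic behind the per-record figure concrete, here is a minimal sketch. It assumes the cost scales linearly with the number of stolen records, which is a simplification and not a claim from the slides; the function name and the 1,000,000-record example are hypothetical.

```python
# Average cost per stolen record, taken from the slide above (USD).
AVG_COST_PER_RECORD_USD = 158


def estimated_breach_cost(records, cost_per_record=AVG_COST_PER_RECORD_USD):
    """Rough breach cost, assuming cost grows linearly with record count."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return records * cost_per_record


# Hypothetical example: one million stolen records.
print(f"${estimated_breach_cost(1_000_000):,}")  # prints $158,000,000
```

Real breach costs depend on many other factors (industry, detection time, legal exposure), so treat the result as an order-of-magnitude estimate only.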
Hadoop Security Challenges
• Hadoop security model maturity
– Relatively recent focus
– Third-party vendor add-ons complicate rather than complement
• Complex
– Kerberos is complex, may not mix well with existing corporate standards
– Moving parts in authentication
• Kerberos RPC authentication for applications and Hadoop Services
• HTTP SPNEGO authentication for web consoles
• The use of delegation tokens, block tokens, and job tokens
– Multiple network encryption components
– Easy to make mistakes
• Staff experience levels needed to implement
• Existing corporate security policies
• Corporate politics
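The delegation tokens mentioned above follow a common pattern: after an initial Kerberos login, the service hands out a token whose "password" is an HMAC over the token identifier, computed with a secret only the service knows. The sketch below illustrates that idea only. The key, the JSON identifier layout, the choice of SHA-256, and the function names are all assumptions for illustration, not Hadoop's actual wire format or API.

```python
import hashlib
import hmac
import json
import time

# Hypothetical master key; in a real cluster only the issuing service holds it.
MASTER_KEY = b"demo-master-key-not-for-production"


def issue_token(owner, renewer, lifetime_s, now=None):
    """Return (identifier, password) for a new token."""
    now = time.time() if now is None else now
    identifier = json.dumps(
        {"owner": owner, "renewer": renewer, "expires": now + lifetime_s},
        sort_keys=True,
    ).encode("utf-8")
    # The password is an HMAC of the identifier under the master key.
    password = hmac.new(MASTER_KEY, identifier, hashlib.sha256).digest()
    return identifier, password


def verify_token(identifier, password, now=None):
    """Accept the token only if the HMAC matches and it has not expired."""
    now = time.time() if now is None else now
    expected = hmac.new(MASTER_KEY, identifier, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, password):
        return False
    return json.loads(identifier)["expires"] > now
```

The point of the design is that a job can authenticate with the token instead of carrying Kerberos credentials, and the service can validate it without contacting the KDC.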
Upcoming Hadoop Changes
• https://issues.apache.org/jira/browse/HADOOP-9331 - Crypto components for encryption of data at rest
• https://issues.apache.org/jira/browse/HADOOP-9392 and https://issues.apache.org/jira/browse/HADOOP-9466 - Token-Based Authentication & Unified Authorization Framework
• Continued evolution of Apache Knox, Ranger and other Hadoop-related security projects
Comcast: How we did it
Hadoop Security @ Comcast
Agenda
• The Comcast Hadoop security journey
• Challenges in a multi-tenant environment
– Users
– Applications
– Technical
• Security framework design and guidelines
• Working with stakeholders
• Building your DevOps plan
• Your communication plan
• Implementation
• Next steps: Apache Ranger & HDFS Data Encryption
Hadoop security journey
Research & Design
Planning
Collaboration
Initiation
Project initiation
• WHAT: Educate the stakeholders on Hadoop security
• WHY: Customer data, security, regulatory requirements
• WHERE: System, Data, Cluster Gateway
• WHEN: Define implementation timeline
• WHO: Corporate security, Linux, Hadoop DevOps, Development teams
• HOW: Kerberos, Ranger, Knox
Collaboration
• All the stakeholders agree to the implementation timeline
• Focus on the end goal and user interaction with the cluster
• Identify costs and timelines related to implementing security
• Conduct brainstorming sessions with all the stakeholders
Security Framework Design
• Research: Available security options within the organization
• Plan: Which tools to use & how to implement the security
• Design: Design the end-to-end solution with minimal disruption for the end user
• Build: Build the security model, test & verify
• Implementation: Finalize the security model
Hadoop Security Framework guidelines
• Use: AES 256-bit encryption and a password management tool.
• Prioritize: Open source tools for server management.
• Enable: SSO for users & applications.
• Avoid: Selecting the easiest solution because it is the easiest.
DevOps Team Plan
• Timeline: Keep track of the schedule & milestones
• Support: Monitor the ticket queue and diagnose issues
• Documentation: Document all the changes for implementing security with 3rd party applications & user code
• Guide: Guide developers through the code changes for Kerberos
• Training: Train the Hadoop Admin team on Kerberos
• Follow-up: Follow up on issues even if they seem to have been solved, so the problem doesn't come back
• Collaboration: Plan & collaborate with the development teams on the deployment schedules
User Community plan
• Analyze: Document & analyze all the jobs and applications that require code changes
• Business: Communicate the impact of security to the business & plan the implementation
• Contact: Keep in constant communication with the business on the changes coming due to security
• Deliver: Coordinate with the DevOps teams to implement the code changes
Business Challenges
• Ownership of testing applications for Kerberos changes.
• Security is seen as slowing the business deliverables.
• Reluctance to understand the technical details of security.
• Communicating the impact and difficulty of the security implementation.
• Cultural change from no security to totally secured.
Development Team Challenges
• Time devoted to testing.
• Competing priorities & collaboration with DevOps.
• Quick-fix mentality.
• Time allocated to learn.
• Lack of knowledge to debug.
DevOps Challenges
• Bringing the off-shore team up to speed.
• Testing 3rd party applications for Windows & Mac.
• Communication not reaching all stakeholders.
• Testing different versions of the 3rd party applications.
Difference between Hadoop & RDBMS Security
Hadoop:
• Evolving
• Complex to implement
• Too many moving parts
• Code changes
• 3rd party application compatibility
RDBMS:
• Matured
• Refined & well understood
Apache Ranger
• Ranger is for implementing data authorization policies.
• Kerberos is the basis and must be in place before implementing Ranger.
• Work with data architects & the business to define the policies.
• Ranger offers a centralized security framework to manage fine-grained access control across Hadoop components.
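As a rough illustration of what a fine-grained authorization policy boils down to, the sketch below models a policy as a resource path, a set of groups, and a set of permitted access types, with default deny. This is a simplified mental model only: the `Policy` class, `is_allowed` function, group names, and paths are hypothetical and are not Ranger's API or policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: who may do what under a path."""
    path_prefix: str
    groups: frozenset
    accesses: frozenset


def _covers(prefix, path):
    # A policy covers the path itself and anything beneath it.
    base = prefix.rstrip("/")
    return path == base or path.startswith(base + "/")


def is_allowed(policies, user_groups, path, access):
    """Default deny: allow only if some policy grants the access."""
    groups = set(user_groups)
    return any(
        _covers(p.path_prefix, path)
        and access in p.accesses
        and p.groups & groups
        for p in policies
    )


# Hypothetical policies for illustration.
POLICIES = [
    Policy("/data/clickstream", frozenset({"analysts"}), frozenset({"read"})),
    Policy("/data/clickstream", frozenset({"etl"}), frozenset({"read", "write"})),
]
```

Defining the real policies is the hard part, which is why the slide stresses working with data architects and the business first.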
Data Encryption
• Data encryption options:
– Volume encryption
– Application level encryption
– HDFS data at rest encryption
• Data at rest encryption.
• Encryption adds overhead to the cluster.
• Consider which datasets need to be encrypted.
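Because encryption adds overhead, one way to decide which datasets to encrypt is to tag each dataset by sensitivity and encrypt only those carrying a sensitive tag. The sketch below shows that selection step under stated assumptions: the tag names, the catalog layout, and the paths are hypothetical, and the actual encryption would be configured separately in the cluster.

```python
# Hypothetical sensitivity tags that should trigger encryption.
SENSITIVE_TAGS = {"pii", "payment", "credentials"}


def datasets_to_encrypt(catalog):
    """Return the dataset paths that carry at least one sensitive tag."""
    return sorted(
        path for path, tags in catalog.items() if SENSITIVE_TAGS & set(tags)
    )


# Hypothetical catalog mapping dataset path -> tags.
CATALOG = {
    "/data/customers": ["pii"],
    "/data/network_metrics": ["telemetry"],
    "/data/billing": ["payment", "pii"],
}

print(datasets_to_encrypt(CATALOG))  # prints ['/data/billing', '/data/customers']
```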
QUESTIONS
Dushyanth Vaddi
Dushyanth_vaddi@comcast.com
Ray Harrison
Ray_harrison@comcast.com

Implementing Security on a Large Multi-Tenant Cluster the Right Way
