Issues Securing Big Data
Mike Pluta, Sr Technical Architect | April 23, 2015
The enclosed materials are highly sensitive, proprietary and confidential. Please use every effort to safeguard the confidentiality
of these materials. Please do not copy, distribute, use, share or otherwise provide access to these materials to any person inside
or outside DST Systems, Inc. without prior written approval.
This proprietary, confidential presentation is for general informational purposes only and does not constitute an agreement.
By making this presentation available to you, we are not granting any express or implied rights or licenses under any intellectual
property right.
If we permit your printing, copying or transmitting of content in this presentation, it is under a non-exclusive, non-transferable,
limited license, and you must include or refer to the copyright notice contained in this document. You may not create derivative
works of this presentation or its content without our prior written permission. Any reference in this presentation to another
entity or its products or services is provided for convenience only and does not constitute an offer to sell, or the solicitation of
an offer to buy, any products or services offered by such entity, nor does such reference constitute our endorsement, referral,
or recommendation.
Our trademarks and service marks and those of third parties used in this presentation are the property of their respective owners.
© 2015 DST Systems, Inc. All rights reserved.
Disclaimer
• DST has established internal rules around the use of
Big Data
• Data flowing into our data lake is partitioned by what we
call Data Domains
• Each DST business unit is in essence at least one
Data Domain
• Data Domains serve as the primary method of
organizing our permission-ing
Big (or not) Data Security
• By default, one Business Unit is not granted access
to another’s data
• Agreements between business units are made to
access data for a specific purpose
• Internal Data Scientists are given cross-Business Unit
access to data
• Management mandate: secure any data to which access
has not been explicitly granted
What This Means
• These rules result in a very complex matrix of permissions
• Example below
• Data Domain ‘Business Unit A’ may be accessed by Business Unit A and Business
Unit D. Business Units B and C may not access this Data Domain
Complexity
Data Domain        BU A   BU B   BU C   BU D
Business Unit A     X                    X
Business Unit B            X             X
Business Unit C            X      X      X
Third Party Data                  X      X
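The matrix can also be written down as a default-deny lookup. The sketch below is only an illustration: the function name can_access and the single-letter BU labels are made up here, and the grants are taken from the access rules stated for each Data Domain later in the deck.

```shell
#!/bin/sh
# Hypothetical helper: encodes the permission matrix as a lookup.
# Domain names (bua, bub, buc, tpd) follow the directory names used later;
# the BU labels A-D are shorthand introduced for this sketch.
can_access() {
    domain=$1   # data domain: bua, bub, buc or tpd
    bu=$2       # business unit: A, B, C or D
    case "$domain:$bu" in
        bua:A|bua:D)       return 0 ;;
        bub:B|bub:D)       return 0 ;;
        buc:B|buc:C|buc:D) return 0 ;;
        tpd:C|tpd:D)       return 0 ;;
        *)                 return 1 ;;  # anything not listed is denied
    esac
}

# Print who may read the 'Business Unit A' domain.
for bu in A B C D; do
    if can_access bua "$bu"; then
        echo "BU $bu: may access bua"
    else
        echo "BU $bu: denied on bua"
    fi
done
```

Keeping the deny case as the fallthrough mirrors the rule that one Business Unit is not granted access to another's data unless an agreement says so.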
• Let’s deal with just text data on a file system in a Linux server
• Logical approach is to arrange directories to track with the Data Domains
• For permission-ing, create a group and directory for each Data Domain
• Assign the group ownership as appropriate
• Set umask to 007 – new files to have u:rw-, g:rw-, o:--- permissions
Scenario
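A quick way to see what umask 007 does is to create a file and a directory under it and read the modes back. This is a minimal sketch that assumes GNU stat and uses a throwaway scratch directory; none of the names below come from the original setup.

```shell
#!/bin/sh
# Sketch: show what umask 007 does to newly created files and directories.
# demo_dir is a throwaway scratch directory (an assumption for illustration).
demo_dir=$(mktemp -d)
(
    # Subshell so the umask change does not leak into the calling shell.
    umask 007
    touch "$demo_dir/new_file"
    mkdir "$demo_dir/new_dir"
)
file_mode=$(stat -c '%a' "$demo_dir/new_file")   # GNU stat; 660 is rw-rw----
dir_mode=$(stat -c '%a' "$demo_dir/new_dir")     # 770 is rwxrwx---
echo "file: $file_mode  dir: $dir_mode"
rm -rf "$demo_dir"
```

Files come out as 660 and directories as 770, so "other" gets nothing in either case.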
sudo useradd buaadm
sudo passwd -d buaadm
sudo useradd bubadm
sudo passwd -d bubadm
sudo useradd bucadm
sudo passwd -d bucadm
sudo useradd budadm
sudo passwd -d budadm
sudo useradd tpdadm
sudo passwd -d tpdadm
Details – Setup Users and Groups
sudo groupadd buag
sudo usermod -G buag buaadm
sudo groupadd bubg
sudo usermod -G bubg bubadm
sudo groupadd bucg
sudo usermod -G bucg bucadm
sudo groupadd budg
sudo usermod -G budg budadm
sudo groupadd tpdg
sudo usermod -G tpdg tpdadm
sudo usermod -a -G buag,bubg,bucg,budg,tpdg dt206031
umask 007
cd $HOME
mkdir data
cd data
mkdir bua
mkdir bub
mkdir buc
mkdir tpd
cd $HOME/data/bua
touch bua_file_1
touch bua_file_2
touch bua_file_3
touch bua_file_4
touch bua_file_5
sudo chown buaadm:buag *
Details – Setup Files
cd $HOME/data/bub
touch bub_file_1
touch bub_file_2
touch bub_file_3
touch bub_file_4
touch bub_file_5
sudo chown bubadm:bubg *
cd $HOME/data/buc
touch buc_file_1
touch buc_file_2
touch buc_file_3
touch buc_file_4
touch buc_file_5
sudo chown bucadm:bucg *
cd $HOME/data/tpd
touch tpd_file_1
touch tpd_file_2
touch tpd_file_3
touch tpd_file_4
touch tpd_file_5
sudo chown tpdadm:tpdg *
cd $HOME/data
sudo chown buaadm:buag bua
sudo chown bubadm:bubg bub
sudo chown bucadm:bucg buc
sudo chown tpdadm:tpdg tpd
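The same layout can be built with a loop instead of repeating each command. This is a sketch only: DATA_ROOT is a hypothetical variable that defaults to a scratch directory so the loop can run without root, and the chown step is printed rather than executed because it needs sudo.

```shell
#!/bin/sh
# Sketch: build the data-domain directories and sample files with a loop.
# DATA_ROOT is a hypothetical variable; it defaults to a scratch location.
DATA_ROOT=${DATA_ROOT:-$(mktemp -d)/data}
(
    umask 007
    for domain in bua bub buc tpd; do
        mkdir -p "$DATA_ROOT/$domain"
        for n in 1 2 3 4 5; do
            touch "$DATA_ROOT/$domain/${domain}_file_$n"
        done
        # Ownership changes need root, so they are only printed here.
        echo "sudo chown -R ${domain}adm:${domain}g $DATA_ROOT/$domain"
    done
)
file_count=$(find "$DATA_ROOT" -type f | wc -l)
echo "created $file_count files under $DATA_ROOT"
```

chown -R on each directory covers both the files and the directory itself, which the step-by-step version does in two passes.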
What It Looks Like
• The directory for the Data Domain ‘Business Unit A’ can be accessed by
members of the ‘buag’ group
• How can we grant additional access to the ‘budg’ group, but still restrict
other groups?
Complexity Redux
Data Domain        BU A   BU B   BU C   BU D
Business Unit A     X                    X
Business Unit B            X             X
Business Unit C            X      X      X
Third Party Data                  X      X
• POSIX Access Control Lists (ACLs) are the answer to our dilemma
• Not always enabled by default. ACL support is enabled at the filesystem level
• mount with the remount and acl options can enable it
• mount -o remount -o acl /dev/sda5 /home
• To make it permanent, add the acl option to the filesystem's entry in /etc/fstab (see your system administrator)
The Secret Sauce
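Before remounting anything, it can help to check whether a filesystem already accepts ACLs. The sketch below is one way to probe that; probe_acl is a hypothetical helper name, and it assumes the setfacl tool from the acl package is the one being tested against.

```shell
#!/bin/sh
# Sketch: probe whether the filesystem holding a directory accepts POSIX ACLs.
# probe_acl is a hypothetical helper name, not a standard tool.
probe_acl() {
    target_dir=$1
    if ! command -v setfacl >/dev/null 2>&1; then
        echo "missing-tools"      # setfacl is not installed
        return
    fi
    probe_file=$(mktemp "$target_dir/acl_probe.XXXXXX" 2>/dev/null)
    if [ -z "$probe_file" ]; then
        echo "unknown"            # could not create a probe file there
        return
    fi
    # Try to add a named-user entry for the current user; setfacl fails
    # when the filesystem does not support ACLs.
    if setfacl -m "u:$(id -u):rw-" "$probe_file" 2>/dev/null; then
        echo "supported"
    else
        echo "unsupported"
    fi
    rm -f "$probe_file"
}

acl_status=$(probe_acl "${TMPDIR:-/tmp}")
echo "ACL support: $acl_status"
```

Point probe_acl at the directory that will hold the Data Domains (for example $HOME/data) rather than the temp directory to test the filesystem that matters.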
• setfacl is used to set the ACL for a file or directory
• getfacl is used to query and list the ACL of a file or directory
• Our specific need:
• In addition to rwx permissions for the group ‘buag’, add rwx permissions for
the group ‘budg’ to the directory ‘bua’
• In addition to rwx permissions for the group ‘bubg’, add rwx permissions for
the group ‘budg’ to the directory ‘bub’
• In addition to rwx permissions for the group ‘bucg’, add rwx permissions for
the groups ‘bubg’ and ‘budg’ to the directory ‘buc’
• In addition to rwx permissions for the group ‘tpdg’, add rwx permissions for the
groups ‘bucg’ and ‘budg’ to the directory ‘tpd’
The Tools
• In addition to rwx permissions for the group ‘buag’, add rwx permissions
for the group ‘budg’ to the directory and contents of ‘bua’
• setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bua
• In addition to rwx permissions for the group ‘bubg’, add rwx permissions
for the group ‘budg’ to the directory and contents of ‘bub’
• setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bub
• In addition to rwx permissions for the group ‘bucg’, add rwx permissions
for the groups ‘bubg’ and ‘budg’ to the directory and contents of ‘buc’
• setfacl -R --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
• In addition to rwx permissions for the group ‘tpdg’, add rwx permissions
for the groups ‘bucg’ and ‘budg’ to the directory and contents of ‘tpd’
• setfacl -R --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
The Commands
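The four ACL specs differ only in which extra groups they name, so they can be generated from a list. The helper below, acl_spec, is a hypothetical name introduced for this sketch; it only builds the string and prints the setfacl commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: build the --set spec from a list of extra groups,
# so each domain's grants are stated in one place.
acl_spec() {
    spec="u::rwx,g::rwx,o::-"       # owner and owning group rwx, other none
    for grp in "$@"; do
        spec="$spec,g:$grp:rwx"     # one named-group entry per extra group
    done
    echo "$spec"
}

bua_spec=$(acl_spec budg)
buc_spec=$(acl_spec bubg budg)
echo "setfacl -R --set $bua_spec bua"
echo "setfacl -R --set $buc_spec buc"
```

After applying a spec, getfacl bua lists the resulting entries so the grant can be checked against the matrix.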
Results
• Hadoop HDFS supports POSIX-style ACLs (added in Hadoop 2.4)
• Make sure to turn it on first
hdfs-site.xml
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>
• Restart the NameNode
• Set an ACL
hdfs dfs -setfacl -m user::rwx,group::rwx,other::---,group:budg:rwx /bua
• See the ACLs
hdfs dfs -getfacl /bua
How To Hadoop It
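ACL specs written for setfacl often use the shorthand u, g, o and d, and a single "-" for no permissions. The Hadoop documentation spells entries out in full (user, group, other, default) with three-character permissions. The sketch below converts one form to the other, on the assumption that HDFS may reject the shorthand; hdfs_acl_spec is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: expand setfacl-style abbreviations (u, g, o, m, d)
# into the full entry names used in the Hadoop documentation.
hdfs_acl_spec() {
    echo "$1" | tr ',' '\n' | sed \
        -e 's/^d:/default:/' \
        -e 's/^\(default:\)\{0,1\}u:/\1user:/' \
        -e 's/^\(default:\)\{0,1\}g:/\1group:/' \
        -e 's/^\(default:\)\{0,1\}o:/\1other:/' \
        -e 's/^\(default:\)\{0,1\}m:/\1mask:/' \
        -e 's/:-$/:---/' \
      | paste -sd ',' -
}

# Print the command rather than run it, since it needs a Hadoop client.
echo "hdfs dfs -setfacl -m $(hdfs_acl_spec 'u::rwx,g::rwx,o::-,g:budg:rwx') /bua"
```

The helper works entry by entry, so named users and groups pass through unchanged apart from the expanded type prefix.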
• Use a Default ACL for Automatic Application to New Children
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bua
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bub
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
• And in Hadoop…
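hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:budg:rwx bua
hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:budg:rwx bub
hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:bubg:rwx,default:group:budg:rwx buc
hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:bucg:rwx,default:group:budg:rwx tpd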
Other Goodies
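Whether a default ACL really reaches new children is easy to check on a scratch directory. The sketch below does that on a local Linux filesystem; it substitutes the caller's own primary group (by numeric ID) for budg so it can run without root, and it reports "skipped" or "unsupported" instead of failing when the ACL tools or filesystem support are missing.

```shell
#!/bin/sh
# Sketch: check that a default ACL on a directory is inherited by a new child.
# The caller's primary group stands in for budg (an assumption for this demo).
inherit_status="skipped"
if command -v setfacl >/dev/null 2>&1 && command -v getfacl >/dev/null 2>&1; then
    parent=$(mktemp -d)
    gid=$(id -g)
    if setfacl -d --set "u::rwx,g::rwx,o::-,g:$gid:rwx" "$parent" 2>/dev/null; then
        touch "$parent/child"
        # -n prints numeric IDs, so the entry can be matched without a name lookup.
        if getfacl -n "$parent/child" 2>/dev/null | grep -q "^group:$gid:"; then
            inherit_status="inherited"
        else
            inherit_status="not-inherited"
        fi
    else
        inherit_status="unsupported"
    fi
    rm -rf "$parent"
fi
echo "default ACL: $inherit_status"
```

On a new regular file the inherited entry is limited by the ACL mask, so getfacl typically shows it with a reduced effective permission set.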
Results With Default ACLs
• Don’t forget about the sticky bit
• Makes it so that only root, the directory owner, or a file's owner can delete or rename that file
sudo chmod +t bua
• Use the setgid bit so that new files in a directory get the same group
owner as the directory.
• Very handy when paired with default ACLs
sudo chmod g+s bua
Last Extra Bits
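Both bits show up in the directory's mode, which makes them easy to verify. This is a small sketch on a throwaway directory standing in for bua; it assumes GNU stat and that the caller belongs to the directory's group, which is the normal case for a directory the caller just created.

```shell
#!/bin/sh
# Sketch: apply the sticky and setgid bits to a scratch directory and read
# the resulting mode back. shared_dir is a throwaway stand-in for bua.
shared_dir=$(mktemp -d)
chmod 770 "$shared_dir"     # rwxrwx---, matching the umask 007 layout
chmod +t  "$shared_dir"     # sticky bit
chmod g+s "$shared_dir"     # setgid bit
shared_mode=$(stat -c '%a' "$shared_dir")   # GNU stat, octal with special bits
ls -ld "$shared_dir"        # mode string shows 's' for group and 'T' for other
echo "octal mode: $shared_mode"
rmdir "$shared_dir"
```

The leading digit of the octal mode carries the special bits: 1 for sticky, 2 for setgid, so a directory with both reads 3770.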