Issues Securing Big Data
Mike Pluta, Sr Technical Architect | April 23, 2015

Issues Securing (Big) Data


Talk given at Kansas City Data Engineering at Scale Meetup, April 23, 2015.

Published in: Technology

  1. Issues Securing Big Data
     Mike Pluta, Sr Technical Architect | April 23, 2015
  2. Disclaimer
     The enclosed materials are highly sensitive, proprietary and confidential. Please use every effort to safeguard the confidentiality of these materials. Please do not copy, distribute, use, share or otherwise provide access to these materials to any person inside or outside DST Systems, Inc. without prior written approval. This proprietary, confidential presentation is for general informational purposes only and does not constitute an agreement. By making this presentation available to you, we are not granting any express or implied rights or licenses under any intellectual property right. If we permit your printing, copying or transmitting of content in this presentation, it is under a non-exclusive, non-transferable, limited license, and you must include or refer to the copyright notice contained in this document. You may not create derivative works of this presentation or its content without our prior written permission. Any reference in this presentation to another entity or its products or services is provided for convenience only and does not constitute an offer to sell, or the solicitation of an offer to buy, any products or services offered by such entity, nor does such reference constitute our endorsement, referral, or recommendation. Our trademarks and service marks and those of third parties used in this presentation are the property of their respective owners. © 2015 DST Systems, Inc. All rights reserved.
  3. Big (or not) Data Security
     • DST has established internal rules around the use of Big Data
     • Data flowing into our data lake is partitioned by what we call Data Domains
     • Each DST business unit is, in essence, at least one Data Domain
     • Data Domains serve as the primary method of organizing our permissioning
  4. What This Means
     • By default, one Business Unit is not granted access to another’s data
     • Agreements between business units are made to access data for a specific purpose
     • Internal Data Scientists are given cross-Business Unit access to data
     • Management mandates securing any data to which access has not been explicitly granted
  5. Complexity
     • These rules result in a very complex matrix of permissions
     • Example below
     • Data Domain ‘Business Unit A’ may be accessed by Business Unit A and Business Unit D. Business Units B and C may not access this Data Domain

     Data Domain         BU A  BU B  BU C  BU D
     Business Unit A      X     -     -     X
     Business Unit B      -     X     -     X
     Business Unit C      -     X     X     X
     Third Party Data     -     -     X     X
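The matrix above can also be written down as data and queried. The following is a minimal sketch, not anything shown in the talk: the function names `allowed_units` and `can_access` are hypothetical, the domain keys reuse the directory names from the later slides, and the Third Party Data row (BU C and BU D) is taken from the ACL grants listed on slides 12 and 13.

```shell
#!/bin/sh
# Hypothetical helper: prints the business units allowed into a Data Domain.
# Domain keys (bua, bub, buc, tpd) reuse the directory names from the slides.
allowed_units() {
    case "$1" in
        bua) echo "A D" ;;
        bub) echo "B D" ;;
        buc) echo "B C D" ;;
        tpd) echo "C D" ;;      # assumption: derived from the tpd ACL grants
        *)   echo "" ;;         # unknown domain: nobody is granted access
    esac
}

# Hypothetical helper: succeeds only when the unit is in the domain's list.
can_access() {
    domain=$1
    unit=$2
    for allowed in $(allowed_units "$domain"); do
        [ "$allowed" = "$unit" ] && return 0
    done
    return 1
}

can_access bua D && echo "BU D may access Business Unit A data"
can_access bua B || echo "BU B may not access Business Unit A data"
```

Denying by default for unknown domains mirrors the rule that a business unit gets no access unless it has been explicitly granted.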
  6. Scenario
     • Let’s deal with just text data on a file system on a Linux server
     • The logical approach is to arrange directories to track with the Data Domains
     • For permissioning, create a group and a directory for each Data Domain
     • Assign the group ownership as appropriate
     • Set umask to 007 so that new files get u:rw-, g:rw-, o:--- permissions
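To see what the 007 umask does in practice, here is a small sketch that can be run without root in a throwaway directory. The names `workdir`, `sample_file` and `sample_dir` are made up for the illustration, and `stat -c` assumes GNU coreutils (BSD and macOS `stat` take different flags).

```shell
#!/bin/sh
# Work in a throwaway directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

umask 007            # strip every permission bit for "other"
touch sample_file    # files start from 666, and 666 masked by 007 is 660
mkdir sample_dir     # directories start from 777, and 777 masked by 007 is 770

# Read the resulting modes back as octal numbers.
file_mode=$(stat -c '%a' sample_file)
dir_mode=$(stat -c '%a' sample_dir)
echo "file: $file_mode  dir: $dir_mode"
```

On a typical Linux system this should report 660 for the file and 770 for the directory, which is the u:rw-, g:rw-, o:--- result the slide describes.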
  7. Details – Setup Users and Groups
       # One admin account per business unit, plus one for third-party data.
       # passwd -d clears the account's password.
       sudo useradd buaadm
       sudo passwd -d buaadm
       sudo useradd bubadm
       sudo passwd -d bubadm
       sudo useradd bucadm
       sudo passwd -d bucadm
       sudo useradd budadm
       sudo passwd -d budadm
       sudo useradd tpdadm
       sudo passwd -d tpdadm

       # One group per account; usermod -G sets the supplementary group list.
       sudo groupadd buag
       sudo usermod -G buag buaadm
       sudo groupadd bubg
       sudo usermod -G bubg bubadm
       sudo groupadd bucg
       sudo usermod -G bucg bucadm
       sudo groupadd budg
       sudo usermod -G budg budadm
       sudo groupadd tpdg
       sudo usermod -G tpdg tpdadm

       # -a appends, so dt206031 keeps its existing groups and joins all five.
       sudo usermod -a -G buag,bubg,bucg,budg,tpdg dt206031
  8. Details – Setup Files
       umask 007
       cd $HOME
       mkdir data
       cd data
       mkdir bua
       mkdir bub
       mkdir buc
       mkdir tpd

       cd $HOME/data/bua
       touch bua_file_1 bua_file_2 bua_file_3 bua_file_4 bua_file_5
       sudo chown buaadm:buag *

       cd $HOME/data/bub
       touch bub_file_1 bub_file_2 bub_file_3 bub_file_4 bub_file_5
       sudo chown bubadm:bubg *

       cd $HOME/data/buc
       touch buc_file_1 buc_file_2 buc_file_3 buc_file_4 buc_file_5
       sudo chown bucadm:bucg *

       cd $HOME/data/tpd
       touch tpd_file_1 tpd_file_2 tpd_file_3 tpd_file_4 tpd_file_5
       sudo chown tpdadm:tpdg *

       cd $HOME/data
       sudo chown buaadm:buag bua
       sudo chown bubadm:bubg bub
       sudo chown bucadm:bucg buc
       sudo chown tpdadm:tpdg tpd
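The same layout can be produced with a loop. This is only a sketch of an alternative, not what was presented: `DATA_ROOT` is a hypothetical variable (the slide uses $HOME/data) that defaults to a temporary directory here, and the ownership step is left commented out because it needs root and the accounts from the previous slide.

```shell
#!/bin/sh
# DATA_ROOT is a hypothetical variable; the slide works in $HOME/data.
DATA_ROOT=${DATA_ROOT:-$(mktemp -d)/data}

umask 007
mkdir -p "$DATA_ROOT"

for domain in bua bub buc tpd; do
    mkdir -p "$DATA_ROOT/$domain"
    for n in 1 2 3 4 5; do
        touch "$DATA_ROOT/$domain/${domain}_file_$n"
    done
    # Ownership needs root and the accounts created earlier, so it stays
    # commented out in this sketch:
    # sudo chown -R "${domain}adm:${domain}g" "$DATA_ROOT/$domain"
done

# Count what was created as a quick sanity check.
file_count=$(find "$DATA_ROOT" -type f | wc -l)
echo "created $file_count files under $DATA_ROOT"
```

Deriving the account and group names from the directory name works only because the slides follow the `<domain>adm` and `<domain>g` naming pattern consistently.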
  9. What It Looks Like
  10. Complexity Redux
     • The directory for the Data Domain ‘Business Unit A’ can be accessed by members of the ‘buag’ group
     • How can we grant additional access to the ‘budg’ group, but still restrict other groups?

     Data Domain         BU A  BU B  BU C  BU D
     Business Unit A      X     -     -     X
     Business Unit B      -     X     -     X
     Business Unit C      -     X     X     X
     Third Party Data     -     -     X     X
  11. The Secret Sauce
     • POSIX Access Control Lists (ACLs) are the answer to our dilemma
     • Not always enabled by default; they may need to be enabled at the filesystem level
     • Remounting with the acl option enables them:
         mount -o remount,acl /dev/sda5 /home
     • See your system administrator to enable them permanently
  12. The Tools
     • setfacl is used to set the ACL for a file or directory
     • getfacl is used to query and list the ACL of a file or directory
     • Our specific need:
         • In addition to rwx permissions for the group ‘buag’, add rwx permissions for the group ‘budg’ to the directory ‘bua’
         • In addition to rwx permissions for the group ‘bubg’, add rwx permissions for the group ‘budg’ to the directory ‘bub’
         • In addition to rwx permissions for the group ‘bucg’, add rwx permissions for the groups ‘bubg’ and ‘budg’ to the directory ‘buc’
         • In addition to rwx permissions for the group ‘tpdg’, add rwx permissions for the groups ‘bucg’ and ‘budg’ to the directory ‘tpd’
  13. The Commands
     • In addition to rwx permissions for the group ‘buag’, add rwx permissions for the group ‘budg’ to the directory and contents of ‘bua’
         setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bua
     • In addition to rwx permissions for the group ‘bubg’, add rwx permissions for the group ‘budg’ to the directory and contents of ‘bub’
         setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bub
     • In addition to rwx permissions for the group ‘bucg’, add rwx permissions for the groups ‘bubg’ and ‘budg’ to the directory and contents of ‘buc’
         setfacl -R --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
     • In addition to rwx permissions for the group ‘tpdg’, add rwx permissions for the groups ‘bucg’ and ‘budg’ to the directory and contents of ‘tpd’
         setfacl -R --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
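Since the four commands differ only in the directory and the list of extra groups, the ACL specification can be generated rather than typed. The sketch below is a dry run that prints the commands instead of running them; `acl_spec` and `print_acl_command` are hypothetical helper names, not part of the acl tools.

```shell
#!/bin/sh
# Hypothetical helper: builds the --set specification used on the slide.
# Owner and owning group get rwx, "other" gets nothing, and every extra
# group passed as an argument gets rwx.
acl_spec() {
    spec="u::rwx,g::rwx,o::-"
    for group in "$@"; do
        spec="$spec,g:$group:rwx"
    done
    echo "$spec"
}

# Hypothetical helper: prints the setfacl command for one directory.
# First argument is the directory, the rest are the extra groups.
print_acl_command() {
    dir=$1
    shift
    echo "setfacl -R --set $(acl_spec "$@") $dir"
}

# Dry run for the four Data Domain directories.
print_acl_command bua budg
print_acl_command bub budg
print_acl_command buc bubg budg
print_acl_command tpd bucg budg
```

Printing first and executing later makes it easy to review the grants against the permission matrix before anything on disk changes.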
  14. Results
  15. How To Hadoop It
     • Hadoop HDFS v2.6 supports POSIX ACLs
     • Make sure to turn it on first, in hdfs-site.xml:
         <property>
           <name>dfs.namenode.acls.enabled</name>
           <value>true</value>
         </property>
     • Restart the NameNode
     • Set an ACL
         hdfs dfs -setfacl -m user::rwx,group::rwx,other::---,group:budg:rwx /bua
     • See the ACLs
         hdfs dfs -getfacl /bua
  16. Other Goodies
     • Use a default ACL so that it is applied automatically to new children
         sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bua
         sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bub
         sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
         sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
     • And in Hadoop…
         hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:budg:rwx bua
         hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:budg:rwx bub
         hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:bubg:rwx,default:group:budg:rwx buc
         hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:bucg:rwx,default:group:budg:rwx tpd
  17. Results With Default ACLs
  18. Last Extra Bits
     • Don’t forget about the sticky bit
     • Makes it so that only root, the directory owner, or the file’s owner can delete files
         sudo chmod +t bua
     • Use the setgid bit so that new files in a directory get the same group owner as the directory
     • Very handy when paired with default ACLs
         sudo chmod g+s bua
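The effect of the two bits can be checked on a scratch directory without root. This is a minimal sketch under a few assumptions: `workdir` and `shared` are made-up names, `stat -c` is the GNU coreutils form, and the directory's group is one the current user belongs to (which is what lets `chmod g+s` succeed without sudo).

```shell
#!/bin/sh
# Work in a throwaway directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

umask 007
mkdir shared          # mode 770 with this umask
chmod +t shared       # sticky bit, octal 1000
chmod g+s shared      # setgid bit, octal 2000

# Read the mode back; the leading digit carries the special bits.
shared_mode=$(stat -c '%a' shared)
echo "mode: $shared_mode"
ls -ld shared         # group execute shows as 's', sticky as a trailing 'T'

# With setgid set, a new file takes the directory's group.
touch shared/new_file
dir_group=$(stat -c '%G' shared)
file_group=$(stat -c '%G' shared/new_file)
echo "dir group: $dir_group  file group: $file_group"
```

The sticky bit appears as an uppercase `T` because "other" has no execute permission on the directory under the 007 umask.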