SlideShare a Scribd company logo

Hadoop security

B
Biju Nair

Securing Hadoop

1 of 25
Download to read offline
Secure Hadoop Application
Ecosystem
Boston Application Security
Conference
Oct 3 2015
Google Trends – Big Data Big Data Job Trends 2
3
Hadoop EcosystemFlumeSqoop
ZooKeeper
HBase
Hive
Pig
MapReduce
Spark
YARN – Resource Manager
HDFS – Distributed File System
Kafka
Storm
4
Why
• Hadoop is a storage/processing infrastructure
– Whether Big Data is hype or not
• Fits well for lot of use cases
• Inherent distributed storage/processing
– Provides scalability at a relatively low cost
• There is lot of backing
– IBM, Microsoft, Amazon, Google, Intel …
• Various distributions and companies
5
Hadoop Distributed File System
FileA
FileB
FileC
H1:blk0, H2:blk1
H3:blk0,H1:blk1
H2:blk0;H3:blk1
HDFS Directory
Master Host (NN)
DISK
Local File System File
FileA0
FileB1
Inode-x
Inode-y
Local FS Directory
Host 1
FileA1
FileC0
Inode-a
Inode-n
Local FS Directory
Host 2
FileB0
FileC1
Inode-r
Inode-c
Local FS Directory
Host 3
In-x
In-y
In-a
In-n
In-r
In-c
DISK
DISK
DISK
Files created
are of size
equal to the
HDFS blksize
6

Recommended

2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_securityAdam Muise
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureUwe Printz
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataRommel Garcia
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Abhiraj Butala
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revJason Shih
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview Hortonworks
 

More Related Content

What's hot

Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Shravan (Sean) Pabba
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: OverviewCloudera, Inc.
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowDataWorks Summit
 
Hadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosHadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosSarvesh Meena
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemDataWorks Summit
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environmentsDataWorks Summit
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyDataWorks Summit
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldUwe Printz
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessCloudera, Inc.
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateSteve Loughran
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesBolke de Bruin
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyAnurag Shrivastava
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop EcosystemDataWorks Summit
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop securitybigdatagurus_meetup
 
Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...DataWorks Summit
 

What's hot (20)

Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Hadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosHadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using Kerberos
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystem
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environments
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happy
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happy
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 
Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...
 

Viewers also liked

NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaBiju Nair
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsBiju Nair
 
Chef patterns
Chef patternsChef patterns
Chef patternsBiju Nair
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk ManagementBiju Nair
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User ReferenceBiju Nair
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload managementBiju Nair
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceBiju Nair
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developersBiju Nair
 
Managing Websphere Application Server certificates
Managing Websphere Application Server certificatesManaging Websphere Application Server certificates
Managing Websphere Application Server certificatesPiyush Chordia
 
It was just Open Source - TEDx Novara
It was just Open Source - TEDx NovaraIt was just Open Source - TEDx Novara
It was just Open Source - TEDx NovaraFabio Mora
 
Hadoop Security Now and Future
Hadoop Security Now and FutureHadoop Security Now and Future
Hadoop Security Now and Futuretcloudcomputing-tw
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar DatabaseBiju Nair
 
Big Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBig Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBlue Coat
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 
WebSphere Message Broker Training | IBM WebSphere Message Broker Online Training
WebSphere Message Broker Training | IBM WebSphere Message Broker Online TrainingWebSphere Message Broker Training | IBM WebSphere Message Broker Online Training
WebSphere Message Broker Training | IBM WebSphere Message Broker Online Trainingecorptraining2
 
GSM/UMTS network architecture tutorial (Indonesia)
GSM/UMTS network architecture tutorial (Indonesia)GSM/UMTS network architecture tutorial (Indonesia)
GSM/UMTS network architecture tutorial (Indonesia)ejlp12
 

Viewers also liked (20)

NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezza
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentals
 
Chef patterns
Chef patternsChef patterns
Chef patterns
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk Management
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload management
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve Performace
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developers
 
Managing Websphere Application Server certificates
Managing Websphere Application Server certificatesManaging Websphere Application Server certificates
Managing Websphere Application Server certificates
 
It was just Open Source - TEDx Novara
It was just Open Source - TEDx NovaraIt was just Open Source - TEDx Novara
It was just Open Source - TEDx Novara
 
THE BIG PRINT - Showing the Action
THE BIG PRINT - Showing the ActionTHE BIG PRINT - Showing the Action
THE BIG PRINT - Showing the Action
 
Hadoop Security Now and Future
Hadoop Security Now and FutureHadoop Security Now and Future
Hadoop Security Now and Future
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar Database
 
Big Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBig Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat Protection
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
WebSphere Message Broker Training | IBM WebSphere Message Broker Online Training
WebSphere Message Broker Training | IBM WebSphere Message Broker Online TrainingWebSphere Message Broker Training | IBM WebSphere Message Broker Online Training
WebSphere Message Broker Training | IBM WebSphere Message Broker Online Training
 
GSM/UMTS network architecture tutorial (Indonesia)
GSM/UMTS network architecture tutorial (Indonesia)GSM/UMTS network architecture tutorial (Indonesia)
GSM/UMTS network architecture tutorial (Indonesia)
 

Similar to Hadoop security

Similar to Hadoop security (20)

Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jha
 
Hadoop and HDFS
Hadoop and HDFSHadoop and HDFS
Hadoop and HDFS
 
Hadoop
HadoopHadoop
Hadoop
 
Unit-3.pptx
Unit-3.pptxUnit-3.pptx
Unit-3.pptx
 
Borthakur hadoop univ-research
Borthakur hadoop univ-researchBorthakur hadoop univ-research
Borthakur hadoop univ-research
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Giraffa - November 2014
Giraffa - November 2014Giraffa - November 2014
Giraffa - November 2014
 
HDFS tiered storage
HDFS tiered storageHDFS tiered storage
HDFS tiered storage
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapa
 
HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed Filesystem
 
Hadoop File System.pptx
Hadoop File System.pptxHadoop File System.pptx
Hadoop File System.pptx
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
Hdfs
HdfsHdfs
Hdfs
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 

More from Biju Nair

Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleBiju Nair
 
HBase Internals And Operations
HBase Internals And OperationsHBase Internals And Operations
HBase Internals And OperationsBiju Nair
 
Apache Kafka Reference
Apache Kafka ReferenceApache Kafka Reference
Apache Kafka ReferenceBiju Nair
 
Serving queries at low latency using HBase
Serving queries at low latency using HBaseServing queries at low latency using HBase
Serving queries at low latency using HBaseBiju Nair
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalBiju Nair
 
Cursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixCursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixBiju Nair
 

More from Biju Nair (6)

Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scale
 
HBase Internals And Operations
HBase Internals And OperationsHBase Internals And Operations
HBase Internals And Operations
 
Apache Kafka Reference
Apache Kafka ReferenceApache Kafka Reference
Apache Kafka Reference
 
Serving queries at low latency using HBase
Serving queries at low latency using HBaseServing queries at low latency using HBase
Serving queries at low latency using HBase
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-final
 
Cursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixCursor Implementation in Apache Phoenix
Cursor Implementation in Apache Phoenix
 

Recently uploaded

Campotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company ProfileCampotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company ProfileCampotelPhilippines
 
National Institute of Standards and Technology (NIST) Cybersecurity Framework...
National Institute of Standards and Technology (NIST) Cybersecurity Framework...National Institute of Standards and Technology (NIST) Cybersecurity Framework...
National Institute of Standards and Technology (NIST) Cybersecurity Framework...MichaelBenis1
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...UiPathCommunity
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Product School
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...ISPMAIndia
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolProduct School
 
Artificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human JusticeArtificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human JusticeJosh Gellers
 
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31shyamraj55
 
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...Neo4j
 
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes", Volodymyr TsapFwdays
 
How AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxHow AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxInfosec
 
Empowering Net-Zero: Digital Insights and Funding Opportunities for Industria...
Empowering Net-Zero: Digital Insights and Funding Opportunities for Industria...Empowering Net-Zero: Digital Insights and Funding Opportunities for Industria...
Empowering Net-Zero: Digital Insights and Funding Opportunities for Industria...IES VE
 
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)Jay Zhao
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, TripadvisorProduct School
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Product School
 
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfIntroducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfSafe Software
 
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...DianaGray10
 
Centralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-ManagerCentralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-ManagerSaiLinnThu2
 
Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVARobert McDermott
 

Recently uploaded (20)

Campotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company ProfileCampotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company Profile
 
National Institute of Standards and Technology (NIST) Cybersecurity Framework...
National Institute of Standards and Technology (NIST) Cybersecurity Framework...National Institute of Standards and Technology (NIST) Cybersecurity Framework...
National Institute of Standards and Technology (NIST) Cybersecurity Framework...
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
Artificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human JusticeArtificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human Justice
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
 
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31
 
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
 
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
 
How AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxHow AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptx
 
Empowering Net-Zero: Digital Insights and Funding Opportunities for Industria...
Empowering Net-Zero: Digital Insights and Funding Opportunities for Industria...Empowering Net-Zero: Digital Insights and Funding Opportunities for Industria...
Empowering Net-Zero: Digital Insights and Funding Opportunities for Industria...
 
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...
 
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfIntroducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
 
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
 
Centralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-ManagerCentralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-Manager
 
Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVA
 

Hadoop security

  • 1. Secure Hadoop Application Ecosystem Boston Application Security Conference Oct 3 2015
  • 2. Google Trends – Big Data Big Data Job Trends 2
  • 3. 3
  • 4. Hadoop EcosystemFlumeSqoop ZooKeeper HBase Hive Pig MapReduce Spark YARN – Resource Manager HDFS – Distributed File System Kafka Storm 4
  • 5. Why • Hadoop is a storage/processing infrastructure – Whether Big Data is hype or not • Fits well for lot of use cases • Inherent distributed storage/processing – Provides scalability at a relatively low cost • There is lot of backing – IBM, Microsoft, Amazon, Google, Intel … • Various distributions and companies 5
  • 6. Hadoop Distributed File System FileA FileB FileC H1:blk0, H2:blk1 H3:blk0,H1:blk1 H2:blk0;H3:blk1 HDFS Directory Master Host (NN) DISK Local File System File FileA0 FileB1 Inode-x Inode-y Local FS Directory Host 1 FileA1 FileC0 Inode-a Inode-n Local FS Directory Host 2 FileB0 FileC1 Inode-r Inode-c Local FS Directory Host 3 In-x In-y In-a In-n In-r In-c DISK DISK DISK Files created are of size equal to the HDFS blksize 6
  • 7. HDFS - Write Flow Client Namespace MetaData Blockmap (Fsimage Edit files) Name Node Data Node Data Node Data Node 1 2 3 4 5 6 6 77 8 1. Client requests to open a file to write through fs.create() call. This will overwrite existing file. 2. Name node responds with a lease to the file path 3. Client writes to local and when data reaches block size, requests Name Node for write 4. Name Node responds with a new blockid and the destination data nodes for write and replication 5. Client sends the first data node the data and the checksum generated on the data to be written 6. First data node writes the data and checksum and in parallel pipelines the replications to other DN 7. Each data node where the data is replicated responds back with success /failure to the first DN 8. First data node in turn informs to the Name node that the write request for the block is complete which in turn will update its block map Note: There can be only one write at a time on a file 7
  • 8. HDFS - Read Flow Client Namespace MetaData Blockmap (Fsimage Edit files) Name Node Data Node Data Node Data Node 1 2 3 4 5 6 1. Client requests to open a file to read through fs.open() call 2. Name node responds with a lease to the file path 3. Client requests for read the data in the file 4. Name Node responds with block ids in sequence and the corresponding data nodes 5. Client reaches out directly to the DNs for each block of data in the file 6. When DNs sends back data along with check sum, client performs a checksum verification by generating a checksum 7. If the checksum verification fails client reaches out to other DNs where the re is a replication 7 8
  • 9. Authorization • POSIX model for file and directory permissions – Associated with an owner and a group – Permission for owner, group and others – r for read, w for append to files – r for listing files, w for delete/create files in dirs – x to access child directories – Sticky bit on dirs prevents deletions by others 9
  • 10. Kerberos 10 TGS AS KDB KDC 1 Create Principal User 2 - kinit 3 – Receive TGT 4 – Request Service Ticket Service 5 – Receive Service Ticket For service principals Keytabs are used
  • 11. Secure HDFS Cluster - Authentication Master Namenode Slave Datanode Slave Datanode Slave Datanode KDC Keytab Keytab Keytab Keytab 11
  • 12. Secure HDFS - Client Authentication Namenode Slave Datanode Slave Datanode Slave Datanode KDC HDFS Client KRB Token 1 Deleg Token 2 3 Block Tokens Deleg Token Key Key Key Key 4 12
  • 13. Authentication Configuration • Set up Kerberos infrastructure – It may be already available through AD • Define service principals • Create Keytabs for service principals – E.g. HDFS, YARN • Copy keytabs to the master and slave nodes • Update site.xml files • Restart the services 13
  • 14. HDFS Data Encryption HDFS Client Key Mgmt Server Key Trusty Namenode Datenode 1 - EZ 2 – EZ Key 2 - Create EZ EDEK 3 EDEK 4 – R/W 5 14
  • 16. Controlling Resource Usage • Schedulers – Fair – Capacity • Queues defined to use percentage of resource – Hierarchy with in queues • Users and groups attached to groups – Administer – Submit 16
  • 17. YARN Queue 17 Root 100% Sec 70% sadmin, suser Adhoc 30% Aadmin, auser
  • 18. Hadoop Cluster - Secure Perimeter Master Slave Slave Slave IPS/IDS/Firewall IPS/IDS/Firewall Clients DMZ/Separate Network 18
  • 19. HDFS Services & Ports HDFS Service Port Name Node 8020 Name Node UI 50070 Secondary Name Node UI 50090 Data Node 50020 Data Node UI 50075 Journal Node 8480, 8485 HttpFS 14000, 14001 19
  • 20. Principle of Least Priviledge • hdfs-site xml – dfs.permissions.superusergroup – dfs.cluster.administrators • core-site.xml – Hadoop.security.authorization to true • hadoop-policy.xml – security.client.protocol.acl – security.client.datanode.protocol.acl – security.get.user.mappings.protocol.acl 20
  • 21. Application Code Change Configuration conf = new Configuration(); conf.set("fs.defaultFS", "hdfs://NN:PORT/user/hbase"); conf.set("hadoop.security.authentication", "Kerberos"); UserGroupInformation.setConfiguration(conf); UserGroupInformation.loginUserFromKeytab("ubuntu/hostname@REALM", ”ubuntu.keytab"); FileSystem fs = FileSystem.get(conf); 21 Configuration conf = new Configuration(); conf.set("fs.defaultFS", "hdfs://NN:PORT/user/hbase"); conf.set("hadoop.security.authentication", "Kerberos"); FileSystem fs = FileSystem.get(conf); Unsecure Hadoop Secure Hadoop
  • 22. Key Takeaways • New infrastructure will be part of enterprises – May not be as big as the hype • Adherence to application security principles – Complexity and maturity may be a roadblock • Constant follow-up on latest developments 22
  • 23. References & Acknowledgements • Hadoop Security – https://issues.apache.org/jira/browse/HADOOP-4487 – Hadoop Project – Securing Hadoop Page • HDFS Encryption – https://issues.apache.org/jira/browse/HDFS-6134 – Hadoop Project Transparent Encryption Page – http://www.slideshare.net/Hadoop_Summit/transparent-encryption-in-hdfs • Hadoop service level authorization • YARN – Fair Scheduler – Capacity Scheduler • Hadoop Security Book 23