Deploying Enterprise-grade Security for Hadoop - Cloudera, Inc.
Deploying enterprise-grade security for Hadoop, or: six security problems with Apache Hive. In this talk we will discuss the security problems with Hive and then secure Hive with Apache Sentry. Additional topics will include Hadoop security and Role-Based Access Control (RBAC).
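The RBAC model mentioned above maps users to roles and roles to privileges, so access checks never reference individual users directly. A minimal sketch of the idea (toy data model with hypothetical names, not the Sentry API):

```python
# Toy RBAC model: users -> roles -> privileges (illustration only).
roles = {
    "analyst": {("SELECT", "sales.orders")},
    "etl":     {("SELECT", "sales.orders"), ("INSERT", "sales.orders")},
}
user_roles = {"alice": {"analyst"}, "bob": {"etl"}}

def allowed(user: str, action: str, obj: str) -> bool:
    """A user may perform an action if any of their roles grants it."""
    return any((action, obj) in roles[r] for r in user_roles.get(user, ()))

print(allowed("alice", "SELECT", "sales.orders"))  # True
print(allowed("alice", "INSERT", "sales.orders"))  # False
```

Granting a privilege to the `etl` role immediately affects every user holding that role, which is what makes RBAC manageable at enterprise scale.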
Carlos García - Pentesting Active Directory Forests [rooted2019] - RootedCON
The document discusses penetration testing of Active Directory forests and trusts. It begins with an introduction to forests, domains, and trust types. It then covers authentication protocols like NTLM and Kerberos across trusts. Next, it discusses techniques for enumerating trusts and mapping the trust relationships. The document outlines common attacks when domain admin privileges are available, such as using Golden Tickets and SID history exploitation. For situations without domain admin, it recommends reconnaissance of trusts and objects to map a path to privileged accounts.
Secure Redis Cluster At Box: Vova Galchenko, Ravitej Sistla - Redis Labs
This document presents an overview of how Box uses Redis clusters and caches data. It discusses security requirements for data stores at Box, including authentication and encryption. It then introduces Secure Redis Proxy, a solution developed by Box to add authentication and encryption to Redis clusters while minimizing performance and operational impacts. Secure Redis Proxy classifies Redis commands and determines whether to authenticate, encrypt, or pass commands through without modification based on the type of command. It also supports password rotation for credentials.
The document discusses security in Hadoop clusters. It introduces authentication using Kerberos and authorization using access control lists (ACLs). Kerberos provides mutual authentication between services and clients in Hadoop. ACLs control access at the service and file level. The document outlines how to configure Kerberos with Hadoop, including setting principals and keytabs for services. It also discusses integrating Kerberos with an Active Directory domain.
DNSSEC - Domain Name System Security Extensions - Peter R. Egli
This document introduces DNS Security Extensions (DNSSEC) which aims to secure DNS queries and information by adding digital signatures to DNS response records. It discusses security problems with the current DNS system like cache poisoning and spoofing attacks. DNSSEC uses cryptographic keys and signatures to authenticate DNS responses and establish a chain of trust. While DNSSEC adds security, its deployment has been gradual due to complexity and the need for widespread implementation to provide full benefits.
This document provides an introduction to DNSSEC (Domain Name System Security Extensions) in 3 parts:
1. It explains the purpose of DNSSEC is to address vulnerabilities in the DNS like cache poisoning and lack of data integrity by cryptographically signing DNS records.
2. It discusses some of the operational implications of DNSSEC like increased response sizes requiring EDNS0, using multiple keys (KSK and ZSK), and developing a DNSSEC Policy and Practice Statement.
3. It provides resources for further learning including open source DNSSEC software, mailing lists, and examples of deployed DNSSEC at the root zone and in some top-level domains.
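The KSK/ZSK split and the chain of trust outlined above can be illustrated with a toy model. This is a deliberate simplification: real DNSSEC uses public-key signatures carried in RRSIG records, while the sketch below substitutes keyed hashes (HMAC) as a stand-in for signing, and all names and keys are made up:

```python
import hashlib
import hmac

def toy_sign(key: bytes, data: bytes) -> bytes:
    """Stand-in for an RRSIG; real DNSSEC uses public-key signatures."""
    return hmac.new(key, data, hashlib.sha256).digest()

# Two-tier key scheme: the ZSK signs ordinary record sets,
# the KSK signs only the DNSKEY record set.
zsk = b"toy-zone-signing-key"
ksk = b"toy-key-signing-key"

a_rrset = b"www.example.com. A 192.0.2.1"
rrsig_a = toy_sign(zsk, a_rrset)            # ZSK covers the data records

dnskey_rrset = zsk + ksk                    # the zone's published keys
rrsig_dnskey = toy_sign(ksk, dnskey_rrset)  # KSK covers the DNSKEY set

# The parent zone publishes a DS record: a digest of the child's KSK.
ds_record = hashlib.sha256(ksk).hexdigest()

# A validating resolver walks the chain: DS -> KSK -> ZSK -> data.
assert hashlib.sha256(ksk).hexdigest() == ds_record
assert hmac.compare_digest(toy_sign(ksk, dnskey_rrset), rrsig_dnskey)
assert hmac.compare_digest(toy_sign(zsk, a_rrset), rrsig_a)
print("chain of trust validated")
```

The split matters operationally: the ZSK can be rolled frequently without involving the parent zone, while rolling the KSK requires publishing a new DS record upstream.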
This document provides an overview of DNS security and DNSSEC. It begins with explanations of what DNS is, how it works, and how DNS responses can be corrupted. It then discusses the problems that occur when DNS goes bad, such as being directed to the wrong site or downloading malware. The document introduces DNSSEC as a solution and explains why it was created and why it is important, particularly for government agencies. It addresses why more organizations don't use DNSSEC and the challenges of deploying and maintaining it. Finally, it describes options for implementing DNSSEC, including the GSA DNSSEC Cloud Signing Service, which handles the complexities for .gov domains.
Encrypted DNS - DNS over TLS / DNS over HTTPS - Alex Mayrhofer
Encryption is coming to mainstream DNS. This briefing discusses the history, protocols and architecture of encrypted DNS, specifically DNS over TLS and DNS over HTTPS. It also describes the impact of DoT and DoH on various operational models.
This briefing was given during DNSheads Vienna #5 at the nic.at office in Vienna on Jan 30 2018.
Protect your private data with ORC column encryption - Owen O'Malley
Fine-grained data protection at a column level in data lake environments has become a mandatory requirement to demonstrate compliance with multiple local and international regulations across many industries today. ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads that provides optimized streaming reads with integrated support for finding required rows quickly.
Owen O’Malley dives into the progress the Apache community made for adding fine-grained column-level encryption natively into ORC format, which also provides capabilities to mask or redact data on write while protecting sensitive column metadata such as statistics to avoid information leakage. The column encryption capabilities will be fully compatible with Hadoop Key Management Server (KMS) and use the KMS to manage master keys, providing the additional flexibility to use and manage keys per column centrally.
This document provides an overview of DNSSEC (Domain Name System Security Extensions). It discusses cryptography concepts used in DNSSEC like public-key cryptography, hashing algorithms, and digital signatures. It explains how DNSSEC uses these concepts to provide data integrity and authentication for DNS responses through the use of new resource records like RRSIG, DNSKEY, DS, and NSEC. It also discusses how DNSSEC establishes trust chains to validate signatures up to the root zone, and how "denial of existence" responses are signed using NSEC records.
DNS is critical network infrastructure, and securing it against attacks like DDoS, NXDOMAIN floods, hijacking, and malware/APT activity is vital to protecting any business.
The document discusses DNS over HTTPS (DoH), DNS over TLS (DoT), and Encrypted Server Name Indication (ESNI). It explains that DoH and DoT encrypt DNS queries and responses for increased privacy and security compared to traditional DNS. ESNI is discussed as a complementary measure that further improves privacy by encrypting the server name in the TLS handshake. Concerns about the potential use of these protocols by malware for command and control are also raised.
Signing DNSSEC answers on the fly at the edge: challenges and solutions - APNIC
Signing DNSSEC answers on the fly at the edge: challenges and solutions, by Jono Bergquist.
A presentation given at the APNIC 40 APOPS 2 session on Tue, 8 Sep 2015.
This document provides an overview of key concepts in DNSSEC including public/private keys, message digests or hashes, and digital signatures. It explains that public/private key pairs are used, where the private key is kept secret and the public key can be freely distributed. It also describes how one-way hashing functions work to generate fixed-length hashes from variable-length data, and how digital signatures are created by encrypting a message hash with a private key. These three concepts of public/private keys, hashes, and digital signatures form the basis of cryptographic techniques used in DNSSEC.
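The three concepts above fit together as "a digital signature is the message hash, transformed with the private key." A deliberately tiny textbook RSA example makes this concrete (the parameters are far too small to be secure; real DNSSEC keys are 2048+ bit RSA or elliptic-curve keys):

```python
import hashlib

# Toy RSA key pair (textbook-sized, utterly insecure; illustration only).
p, q = 61, 53
n = p * q   # public modulus (3233)
e = 17      # public exponent
d = 2753    # private exponent: (e * d) % lcm(p-1, q-1) == 1

def digest(msg: bytes) -> int:
    """One-way hash: a fixed-length digest from variable-length data."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg: bytes) -> int:
    """Digital signature: the message hash, raised to the private exponent."""
    return pow(digest(msg), d, n)

def verify(msg: bytes, sig: int) -> bool:
    """Anyone holding the public key (n, e) can check the signature."""
    return pow(sig, e, n) == digest(msg)

record = b"www.example.com. IN A 192.0.2.1"
sig = sign(record)
print(verify(record, sig))  # True
```

The private key never leaves the signer, yet any party with the freely distributed public key can confirm both who signed the record and that it has not been altered.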
Deploying New DNSSEC Algorithms (IEPG@IETF93 - July 2015) - Dan York
In this talk to the IEPG session at IETF 93 in Prague on 19 July 2015, I outlined some of the challenges associated with deploying new crypto algorithms within DNSSEC and what we potentially need to do to address these challenges.
This document provides an introduction to a DNSSEC training course hosted by RIPE NCC. It explains that DNSSEC protects against DNS spoofing and data corruption by using digital signatures to authenticate DNS data and establish its integrity. The course aims to raise awareness of DNSSEC and provide guidance on deployment. It outlines DNSSEC mechanisms like using new resource records and signing zones to authenticate communication between servers and establish authenticity of DNS data.
This presentation gives an overview of the Domain Name System (DNS) and what goes into making the DNS secure. This deck also answers the question what is ICANN's role in Domain Name System Security (DNSSEC) deployment?
This document discusses IPv6 threats to government networks. It provides an overview of IPv6 including its large address space and advantages over IPv4. It notes that while the US government is required to transition to IPv6, progress has been slow. Specific IPv6 threats are examined such as NDP spoofing, SLAAC attacks, and Teredo tunneling. It is concluded that most organizations are not fully prepared to detect and mitigate IPv6 threats due to limitations in tools, analyst expertise, and threat intelligence focusing primarily on IPv4.
The document is a slide deck for a DNSSEC tutorial presented at the USENIX LISA conference in 2013. It provides an overview of DNSSEC, including how it uses public key cryptography and digital signatures to authenticate DNS data and establish a chain of trust. It also covers topics like configuring DNSSEC in BIND, using the dig tool to perform queries, and prospects for new applications of DNSSEC. The presentation was given by Shumon Huque, an IT director at the University of Pennsylvania.
This document provides an overview and introduction to Velociraptor, an open source forensic tool. It summarizes who developed Velociraptor, how it works, and how to install and use it. The document guides users through collecting artifacts from endpoints, hunting across networks, and monitoring endpoints for events using Velociraptor's query language and artifact system. It encourages customizing the tool's abilities and contributing feedback to its ongoing development.
This document discusses the DANE protocol, which combines DNSSEC and TLS to provide both encryption and strong integrity protection for secure communication. It explains that while TLS provides encryption, DNSSEC provides integrity protection by allowing certificates to be stored and signed within DNS. This prevents man-in-the-middle attacks and ensures browsers receive the correct certificates. The document provides resources on DANE and urges developers, DNS providers and network operators to support it to improve security.
This document discusses DNS cache poisoning vulnerabilities, including:
- Explanations of how cache poisoning works by entering non-authoritative records into a resolver's cache.
- A timeline of vulnerabilities discovered from 1993-2008 related to implementation issues that allowed cache poisoning.
- Countermeasures like DNSSEC that add authentication and integrity to DNS to prevent cache poisoning attacks.
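The "non-authoritative record" problem from the first bullet can be sketched with a toy resolver (hypothetical names; real resolvers also match transaction IDs and source ports, and increasingly validate DNSSEC signatures). A cache that accepts every record in a response can be poisoned; a bailiwick check rejects records outside the zone being queried:

```python
# Toy resolver cache keyed by name (illustration only).
cache = {}

def accept_response(queried_zone, records, bailiwick_check):
    for name, addr in records:
        if bailiwick_check and not name.endswith(queried_zone):
            continue  # discard out-of-zone ("non-authoritative") records
        cache[name] = addr

# A malicious response to a query for victim.example also carries a
# bogus record for bank.example, a zone the responder has no authority over.
evil = [("www.victim.example", "192.0.2.1"),
        ("www.bank.example", "198.51.100.66")]   # the poison

accept_response("victim.example", evil, bailiwick_check=False)
print("www.bank.example" in cache)   # True: cache poisoned

cache.clear()
accept_response("victim.example", evil, bailiwick_check=True)
print("www.bank.example" in cache)   # False: poison rejected
```

Bailiwick checking closed the simplest variants of this attack in the 1990s; DNSSEC goes further by letting the resolver verify a signature on each record set rather than trusting where the response came from.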
Extracting Forensic Information From Zeus Derivatives - Source Conference
The document discusses extracting forensic information from Zeus and its derivatives. It outlines goals like determining what data was stolen, where it was sent, and who the attackers were. It then describes how to achieve these goals by extracting information like command and control addresses, stolen data, and configuration files from variants like Zeus 2.0.8.9, IceIX, Citadel, Gameover, and KINS through analyzing their encryption routines, configuration retrieval methods, and automated analysis.
The DNSSEC key signing key (or KSK) of the DNS root zone will be changed in the summer of 2017. Between July and October, all DNSSEC-validating resolvers need to get the new key material.
In this webinar we explain the KSK roll, how DNS resolvers will load the new KSK using the RFC 5011 protocol, and how a DNS administrator can verify that the new KSK is present in a resolver's configuration.
Install and understand DNSSEC on a Linux server running BIND 9 in a chroot jail.
By Utah Networxs
This document provides an introduction to DNSSEC and DANE based security for TLS. It discusses how DANE uses DNSSEC-signed TLSA records to bind TLS certificates to domain names, solving problems with the traditional PKIX trust model. The document outlines how DANE works, how to create TLSA records, and how DANE can secure protocols like HTTPS, SMTP, and XMPP that currently rely on PKIX certificates. It also introduces the Bloodhound browser that includes DANE support to validate TLS connections using DNSSEC and DANE.
A comprehensive overview of the security concepts in the open-source Hadoop stack as of mid-2015, with a look back at the "old days" and an outlook on future developments.
Thousands of unsecured Hadoop clusters have been targets of attacks where criminals have deleted databases and files. According to reports, over 5,000 Hadoop installations were accessible on port 50070 without authentication, allowing attackers to destroy data nodes and snapshots containing terabytes of data within seconds. A study found nearly 4,500 servers with the Hadoop Distributed File System exposed over 5 petabytes of data. Many of these unsecured systems have likely already been compromised by attackers destroying data.
Big problems with big data – Hadoop interfaces security - SecuRing
Did the "cloud computing" and "big data" buzzwords bring new challenges for security testers?
Apart from the complexity of Hadoop installations and the number of interfaces, standard techniques can be applied to test for web application vulnerabilities, SSL security, and encryption at rest. We tested popular Hadoop environments and found a few critical vulnerabilities, which certainly cast a shadow on big data security.
Cloudera's open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), targets enterprise-class deployments of that technology. Cloudera says that more than 50% of its engineering output is donated upstream to the various Apache-licensed open source projects.
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service - Amazon Web Services
Intel is contributing to a common security framework for Apache Hadoop, in the form of Project Rhino, which enables Hadoop to run workloads without compromising performance or security. Join this session to learn how your enterprise can take advantage of the security capabilities in the Intel Data Platform running on AWS to analyze data while ensuring technical safeguards that help you remain in compliance.
Zeronights 2015 - Big problems with big data - Hadoop interfaces security - Jakub Kałużny
Did the "cloud computing" and "big data" buzzwords bring new challenges for security testers?
Apart from the complexity of Hadoop installations and the number of interfaces, standard techniques can be applied to test for web application vulnerabilities, SSL security, and encryption at rest. We tested popular Hadoop environments and found a few critical vulnerabilities, which certainly cast a shadow on big data security.
The document provides an overview of the key security challenges in big data (Apache Hadoop) systems, and showcases the solutions used by the Hortonworks distribution to address them.
How to Protect Big Data in a Containerized Environment - BlueData, Inc.
Every enterprise spends significant resources to protect its data. This is especially true in the case of big data, since some of this data may include sensitive or confidential customer and financial information. Common methods for protecting data include permissions and access controls as well as the encryption of data at rest and in flight.
The Hadoop community has recently rolled out Transparent Data Encryption (TDE) support in HDFS. Transparent Data Encryption refers to the process whereby data is transparently encrypted by the big data application writing the data; it is not decrypted again until it is accessed by another application. The data is encrypted during its entire lifespan—in transit and at rest—except when it is being specifically accessed by a processing application.
TDE is an excellent approach for protecting data stored in data lakes built on the latest versions of HDFS. However, it does have its challenges and limitations. Systems that want to use TDE require tight integration with enterprise-wide Kerberos Key Distribution Center (KDC) services and Key Management Systems (KMS). This integration isn’t easy to set up or maintain. These issues can be even more challenging in a virtualized or containerized environment where one Kerberos realm may be used to secure the big data compute cluster and a different Kerberos realm may be used to secure the HDFS filesystem accessed by this cluster.
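The "transparent" behavior described above is typically built on envelope encryption: each file gets its own data encryption key (DEK), and the KMS holds only master keys that wrap the DEKs, so clients never see a master key. A toy sketch of the write and read paths (Python, with an XOR keystream as an insecure stand-in for the AES cipher HDFS actually uses):

```python
import os
import hashlib

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for AES: XOR with a key-derived stream (NOT secure)."""
    stream = hashlib.sha256(key).digest() * (len(data) // 32 + 1)
    return bytes(a ^ b for a, b in zip(data, stream))

# The KMS holds one master key per encryption zone; clients never see it.
kms_master_keys = {"finance-zone-key": os.urandom(32)}

def kms_generate_edek(master_key_name):
    """Create a fresh DEK and return it wrapped by the zone's master key."""
    dek = os.urandom(32)
    edek = xor_cipher(kms_master_keys[master_key_name], dek)
    return dek, edek

def kms_decrypt_edek(master_key_name, edek):
    """Unwrap an EDEK for an authorized, authenticated client."""
    return xor_cipher(kms_master_keys[master_key_name], edek)

# Write path: the client encrypts with the DEK; the filesystem stores
# only ciphertext plus the encrypted DEK (EDEK).
dek, edek = kms_generate_edek("finance-zone-key")
stored = {"data": xor_cipher(dek, b"salary,100000"), "edek": edek}

# Read path: the client asks the KMS to unwrap the EDEK, then decrypts
# locally; the data stayed encrypted at rest and in transit throughout.
dek2 = kms_decrypt_edek("finance-zone-key", stored["edek"])
print(xor_cipher(dek2, stored["data"]).decode())  # salary,100000
```

This split is why the Kerberos/KMS integration mentioned above is unavoidable: the KMS must authenticate each unwrap request, and in a cross-realm setup the compute cluster's credentials must be trusted by the realm protecting the keys.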
BlueData has developed significant expertise in configuring, managing, and optimizing access to TDE-protected HDFS. This session at the Strata Data Conference in March 2018 (by Thomas Phelan, co-founder and chief architect at BlueData) offers a detailed overview of how transparent data encryption works with HDFS, with a particular focus on containerized environments.
You’ll learn how HDFS TDE is configured and maintained in an environment where many big data frameworks run simultaneously (e.g., in a hybrid cloud architecture using Docker containers). Moreover, you’ll learn how KDC credentials can be managed in a Kerberos cross-realm environment to provide data scientists and analysts with the greatest flexibility in accessing data while maintaining complete enterprise-grade data security.
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63763
Hadoop security has improved with additions such as HDFS ACLs, Hive column-level ACLs, HBase cell-level ACLs, and Knox for perimeter security. Data encryption has also been enhanced, with support for encrypting data in transit using SSL and data at rest through file encryption or the upcoming native HDFS encryption. Authentication is provided by Kerberos/AD with token-based authorization, and auditing tracks who accessed what data.
AWS Summit 2014 Melbourne - Breakout 2
Intel is contributing to a common security framework for Apache Hadoop, in the form of Project Rhino, which enables Hadoop to run workloads without compromising performance or security. Join this session to learn how your enterprise can take advantage of the security capabilities in the Intel Data Platform running on AWS to analyze data while ensuring technical safeguards that help you remain in compliance.
Presenter: Peter Kerney, Senior Solution Architect, Intel
VMworld 2013
Chris Greer, FedEx
Richard McDougall, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
The fundamentals and best practices of securing your Hadoop cluster are top of mind today. In this session, we will examine and explain the components, tools, and frameworks used in Hadoop for authentication, authorization, audit, and encryption of data and processes. See how the latest innovations can let you securely connect more data to more users within your organization.
AWS Summit Sydney 2014 | Secure Hadoop as a Service - Session Sponsored by IntelAmazon Web Services
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Amazon Web Services
(Presented by Intel) This is the best of times and the worst of times for cloud services developers. At no other time in history has open access to data, open interfaces to data analytics, and open licensing of source code come together with scalable, cost-effective, cloud infrastructures. This is the good news.
The bad news is that enterprises are being left behind. Stymied by concerns of data protection and data governance, enterprises need proof that the services and solutions built on a cloud infrastructure comply with policies and practices they’ve come to learn (not necessarily love). At its heart is the root of trust issue – how far down can I trust the cloud service, its infrastructure software, and the data that it analyzes? And how do I know my keys are safe? Join this session to learn how Intel has been enabling trusted analytics with cloud services secured top to bottom – from Apache Hadoop to Java, Xen, and Linux – without compromising security.
An overview of securing Hadoop. Content primarily by Balaji Ganesan, one of the leaders of the Apache Argus project. Presented on Sept 4, 2014 at the Toronto Hadoop User Group by Adam Muise.
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Hortonworks
This document provides an overview of new security features in Hortonworks Data Platform (HDP) 2.1, including the Knox gateway for securing Hadoop REST APIs, extended access control lists (ACLs) in HDFS, and Apache Hive authorization using ATZ-NG. Knox provides a single access point and central security for REST APIs. Extended HDFS ACLs allow assigning different permissions to users and groups. Hive ATZ-NG implements SQL-style authorization with grants and revokes, integrating policies with the table lifecycle.
batbern43 Self Service on a Big Data PlatformBATbern
Kafka has been used for several years at Swisscom to stream data from various sources to sinks such as Hadoop. Providing Kafka and Hadoop as a Service to multiple teams in a large company presents governance, security and multi-tenancy challenges. In this talk we will present how we have built our self-service Swisscom Big Data Platform which enables teams to use Kafka, Hadoop and Kubernetes internally. We will explain how we have tackled these challenges by describing our governance model, our identity & ACLs management, and our self-service capabilities. We will also present how we leverage Kubernetes and how it simplifies our operations.
With the growth of traffic and the need to analyze large volumes of data, big data has become one of the most popular areas in IT, and many companies are now working on it, deploying clusters of the Hadoop project, currently the most popular platform for big data processing. This talk presents, in accessible form, the principles of securing Hadoop and demonstrates various attack vectors against a cluster.
[CONFidence 2016] Jakub Kałużny, Mateusz Olejarka - Big problems with big dat...PROIDEA
Did the "cloud computing" and "big data" buzzwords bring new challenges for security testers? In this presentation I would like to show that penetration testing of a Hadoop installation does not really differ much from that of any other application. Apart from the complexity of the installation and the number of interfaces, standard techniques can be applied to test for web application vulnerabilities, SSL security, encryption at rest, bugs in obsolete libraries, and the least-privilege principle. We tested popular Hadoop environments and found a few critical vulnerabilities, which certainly cast a shadow on big data security. Rather than stopping at CVE hunting, we would like to show you our approach to testing big data installations and a few ideas on how to keep them secure.
2. 2
# whoami
Global Security SME Lead @hortonworks
Senior Solutions Engineer @hortonworks
Book Author – Virtualizing Hadoop
Co-organizer of Atlanta Hadoop User Group
Regular Speaker at Big Data Conferences
4. 4
DATA – More Volume and More Types
Increasing data variety and complexity: data volumes have grown from gigabytes through terabytes and petabytes to exabytes, moving beyond traditional ERP, CRM, and web data to big data sources such as:
• User-generated content: mobile web, SMS/MMS, sentiment, external demographics, HD video, speech to text
• Product/service logs: social networks, business data feeds, user click streams, web logs
• Marketing data: offer history, dynamic pricing, A/B testing, affiliate networks, search marketing, behavioral targeting, dynamic funnels
• Customer data: payment records, support contacts, customer touches, purchase detail, purchase records, segmentation, offer details
5. 5
Big Data Ecosystem
The big data platform brings together:
• Data repositories plus analysis & visualization, supporting use cases such as risk modeling, fraud detection, compliance (AML, KYC), Bank 3.0, information security, single view of customer, trading applications, and market data management
• YARN as the data operating system, running Script, SQL, NoSQL, Stream, Search, In-Memory, and other engines on HDFS (Hadoop Distributed File System)
• Cross-cutting Security, Operations, and Governance & Integration services
• Traditional sources: EDW, OLAP datamarts, column databases, CRM, RDBMS, plus lending, markets, trades, compliance, credit card, cash & equity, finance & GL, and risk data
• Emerging and non-traditional sources: server logs, call center records, emails, Word documents, location data, sensor data, customer sentiment, research reports
6. 6
Compliance Adherences
• HIPAA – Health Insurance Portability and Accountability Act of 1996
• HITECH – The Health Information Technology for Economic and Clinical Health Act
• PCI DSS – Payment Card Industry Data Security Standard
• SOX – The Sarbanes-Oxley Act of 2002
• ISO – International Organization for Standardization
• COBIT – Control Objectives for Information and Related Technology
• Corporate security policies
12. 12
Why Knox?
Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate
Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”
Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility
Enhanced Security
• Protect network details
• Partial SSL for non-SSL services
• WebApp vulnerability filter
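As a sketch of the "single access point" idea, the snippet below contrasts a WebHDFS call addressed directly to a NameNode with the same call routed through a Knox gateway URL. The host names, ports, and the `default` topology name are illustrative assumptions, not values from this deck:

```python
# Sketch: the same WebHDFS LISTSTATUS operation, addressed directly
# versus through a Knox gateway. Host names are illustrative.

def webhdfs_direct(namenode_host: str, path: str, op: str = "LISTSTATUS") -> str:
    # Direct access: the client must be able to reach the NameNode
    # inside the cluster network.
    return f"http://{namenode_host}:50070/webhdfs/v1{path}?op={op}"

def webhdfs_via_knox(knox_host: str, topology: str, path: str, op: str = "LISTSTATUS") -> str:
    # Via Knox: one SSL endpoint in the DMZ fronts all REST APIs;
    # the cluster's internal topology stays hidden behind the gateway.
    return f"https://{knox_host}:8443/gateway/{topology}/webhdfs/v1{path}?op={op}"

print(webhdfs_direct("nn1.internal", "/data"))
print(webhdfs_via_knox("knox.example.com", "default", "/data"))
```

The client sees one hostname and one SSL certificate regardless of how many services, or clusters, sit behind the gateway.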
13. 13
Knox Deployment with Hadoop Cluster
Knox runs in the web tier of a DMZ, behind a load balancer (LB). External users reach the Hadoop REST APIs through the LB and Knox, while Hadoop CLIs in the application tier can still connect to the cluster directly. Behind the DMZ, switches connect the master nodes (NameNode, Secondary NameNode) in rack 1 with the slave nodes (DataNodes) in racks 2 through N.
14. 14
What does Perimeter Security really mean?
• Today, a firewall is required at the perimeter; users connect to the gateway from outside it.
• The Knox Gateway controls all Hadoop REST API access through the firewall.
• The firewall only allows connections through specific ports, and only from the Knox host.
• The Hadoop cluster itself is mostly unaffected: Knox forwards REST calls to the backing Hadoop services, such as the WebHDFS, Hive, and HBase hosts.
20. 20
Kerberos Primer
The participants are the client (with its Kerberos ticket cache), the KDC, the NameNode (NN), and the DataNodes (DN):
1. kinit – log in and get a Ticket Granting Ticket (TGT) from the KDC
2. The client stores the TGT in its ticket cache
3. The client gets a NameNode Service Ticket (NN-ST) from the KDC
4. The client stores the NN-ST in its ticket cache
5. Read/write a file given the NN-ST and file name; the NameNode returns block locations, block IDs, and Block Access Tokens if access is permitted
6. Read/write a block from a DataNode given the Block Access Token and block ID
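The six steps above can be sketched as a toy simulation. Every class, method, and string here is an illustrative stand-in with no real cryptography; it only mirrors the shape of the exchange, not the Kerberos protocol itself:

```python
# Toy sketch of the Kerberos flow above: the KDC issues a TGT, then a
# service ticket, which the client caches and presents to the NameNode.
# All objects are illustrative stand-ins, not real Kerberos.

class ToyKDC:
    def __init__(self, users):
        self.users = users  # user -> password

    def kinit(self, user, password):
        if self.users.get(user) != password:
            raise PermissionError("authentication failed")
        return f"TGT({user})"                      # step 1: issue TGT

    def get_service_ticket(self, tgt, service):
        user = tgt[4:-1]                           # unwrap the toy TGT
        return f"ST({user}->{service})"            # step 3: issue NN-ST

class ToyNameNode:
    def open_file(self, service_ticket, path):
        # Step 5: with a valid NN service ticket, return block info
        # plus a Block Access Token for use against the DataNodes.
        assert service_ticket.startswith("ST(") and "->nn" in service_ticket
        return {"path": path, "blocks": [0, 1], "block_access_token": "BAT"}

kdc = ToyKDC({"alice": "s3cret"})
cache = {}                                                   # ticket cache
cache["tgt"] = kdc.kinit("alice", "s3cret")                  # steps 1-2
cache["nn_st"] = kdc.get_service_ticket(cache["tgt"], "nn")  # steps 3-4
info = ToyNameNode().open_file(cache["nn_st"], "/data/file.txt")
print(info["block_access_token"])                            # prints BAT
```

The key property mirrored here is that the client never sends its password to a service; it only ever presents tickets obtained from the KDC.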
23. 23
Sample Simplified Workflow – HDFS
1. An admin sets policies for HDFS files/folders in the Policy Manager
2. Users access HDFS: a data scientist runs a MapReduce job, end users access HDFS data through an application, and IT users access HDFS through the CLI
3. The NameNode uses the plugin for authorization
4. Audit logs are pushed to the audit database
5. The NameNode provides resource access to the user/client
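As an illustration of step 3, a plugin-style authorization check might look like the minimal sketch below. The policy store and matching logic are illustrative stand-ins, not Ranger's actual implementation:

```python
# Minimal sketch of plugin-style authorization: the NameNode asks a
# plugin whether a user may perform an access type on a resource.
# Policies and matching logic are illustrative, not Ranger's own.

POLICIES = [
    # (path prefix, user, allowed access types)
    ("/data/finance", "alice", {"read", "write"}),
    ("/data/finance", "bob", {"read"}),
]

def is_access_allowed(user: str, path: str, access: str) -> bool:
    for prefix, policy_user, accesses in POLICIES:
        if path.startswith(prefix) and user == policy_user and access in accesses:
            return True
    return False  # default deny: no matching policy means no access

print(is_access_allowed("alice", "/data/finance/q1.csv", "write"))  # True
print(is_access_allowed("bob", "/data/finance/q1.csv", "write"))    # False
```

The default-deny fall-through is the important design choice: access is granted only when an explicit policy matches.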
24. 24
Ranger Stacks
• Apache Ranger v0.5 supports a stack model to enable easier onboarding of new components, without requiring code changes in Apache Ranger.

Ranger-side changes – define the service type:
• Create a JSON file with the following details: resources, access types, and the config needed to connect
• Load the JSON into Ranger

Secured-component-side changes – develop a Ranger authorization plugin:
• Include the plugin library in the secured component
• During initialization of the service, init the RangerBasePlugin and RangerDefaultAuditHandler classes
• To authorize access to a resource, call isAccessAllowed() with a RangerAccessRequest
• To support resource lookup, implement RangerBaseService.lookupResource() and RangerBaseService.validateConfig()

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207
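A service-type definition of the kind described above might look like the following sketch. The field names follow Ranger's servicedef JSON in spirit only; the exact schema should be checked against the wiki page above, and the `mydb` service with its resources is purely illustrative:

```json
{
  "name": "mydb",
  "resources": [
    { "name": "database", "type": "string", "mandatory": true },
    { "name": "table",    "type": "string", "mandatory": false }
  ],
  "accessTypes": [
    { "name": "select" },
    { "name": "update" },
    { "name": "create" }
  ],
  "configs": [
    { "name": "jdbc.url", "type": "string",   "mandatory": true },
    { "name": "username", "type": "string",   "mandatory": true },
    { "name": "password", "type": "password", "mandatory": true }
  ]
}
```

Once loaded into Ranger, a definition like this lets admins author policies for the new component in the standard Policy Manager UI without any Ranger code changes.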
26. 26
Data Protection
Hadoop allows you to apply data protection policy at two different layers across the Hadoop stack:
• Storage – encrypt data on disk:
  – Volume level: LUKS (Linux), BitLocker (Windows)
  – Native in Hadoop: HDFS encryption
  – Partners: Voltage, Protegrity, DataGuise, Vormetric
  – OS-level encryption
• Transmission – encrypt data as it moves:
  – Native in Hadoop: SSL & SASL
  – AES-256 for SSL, and data transfer protocol (DTP) with SASL
28. 28
HDFS Encryption – How it works
An encryption zone (EZ) is an HDFS directory whose attributes carry an EZ key ID and version (HDFS-6134). Each encrypted file in the zone carries an encrypted data encryption key (EDEK) and an initialization vector (IV) as attributes. Both the HDFS client and the NameNode talk to the Key Management System (KMS, HADOOP-10433) through the KeyProvider API (HADOOP-10141): the NameNode hands the client the file's EDEK, the KMS decrypts the EDEK into a DEK using the zone's EZ key, and the client reads/writes the file through a crypto stream keyed with the DEK. The KMS holds the EZ keys; the NameNode only ever sees EDEKs.

Acronym – Description:
• EZ – Encryption Zone (an HDFS directory)
• EZK – Encryption Zone Key; master key associated with all files in an EZ
• DEK – Data Encryption Key; unique key associated with each file. The EZ key is used to generate the DEK
• EDEK – Encrypted DEK; the NameNode only has access to the encrypted DEK
• IV – Initialization Vector
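The envelope-encryption idea above can be sketched with a toy cipher. XOR stands in for real AES, and all key names and helpers are illustrative; the only point being demonstrated is that the NameNode stores EDEKs while the KMS alone holds the EZ key that unwraps them:

```python
import os

# Toy envelope encryption, mirroring HDFS TDE: a per-file DEK encrypts
# the data; the zone's EZ key encrypts (wraps) the DEK into an EDEK.
# XOR is a stand-in for AES; never use this for real data.

def xor(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

ez_key = os.urandom(16)        # held only by the KMS
dek = os.urandom(16)           # per-file data encryption key
edek = xor(dek, ez_key)        # what the NameNode stores as a file attribute

ciphertext = xor(b"secret records", dek)   # client writes via crypto stream

# Read path: the client fetches the EDEK from the NameNode, asks the
# KMS to unwrap it with the EZ key, then decrypts with the DEK.
unwrapped_dek = xor(edek, ez_key)
plaintext = xor(ciphertext, unwrapped_dek)
print(plaintext.decode())      # prints: secret records
```

Because decryption requires both the file attribute (EDEK) and the KMS-held EZ key, compromising the NameNode metadata alone does not expose file contents.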
29. 29
HDFS Encryption – Common Commands
As HDFS admin:
• Run the KMS server:
  ./kms.sh run
• Create an encryption key:
  hadoop key create key1 -size 128
  (key size can be 128, 192, or 256; 256 requires the unlimited-strength JCE policy files)
• List all encryption keys:
  hadoop key list -metadata
• As an admin (hdfs user), create an encryption zone:
  hdfs crypto -createZone -keyName key1 -path /secure1
  (point it at an existing, empty directory)
• List all encryption zones:
  hdfs crypto -listZones

As HDFS end-user (run as a user not in the HDFS admin role), read/write to HDFS unchanged:
  hdfs dfs -copyFromLocal /tmp/vinay.txt /secure1
  hdfs dfs -cat /securehive/sal.txt
30. 30
Encrypting Data In-Motion
Protocol – Communication Point – Encryption Mechanism:
• REST – WebHDFS (client to cluster), client to Knox – REST over SSL; Knox Gateway SSL; SPNEGO provides a mechanism for extending Kerberos to web applications through the standard HTTP protocol
• HTTP – NameNode/JobTracker UI, MapReduce shuffle – HTTPS; encrypted MapReduce shuffle (MAPREDUCE-4117)
• RPC – Hadoop client (client to cluster, intra-cluster) – SASL; the Hadoop RPC system implements SASL, which provides different QoP levels, including encryption
• JDBC/ODBC – HiveServer2 – SSL
• TCP/IP – data transfer (client to cluster, intra-cluster) – encrypted Data Transfer Protocol, available in Hadoop by adding SASL support to the DataTransferProtocol
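As a minimal sketch of turning on the RPC and block-transfer protections above, the fragments below use standard Hadoop 2.x property names (values and placement should still be validated against your distribution's documentation):

```xml
<!-- core-site.xml: protect Hadoop RPC with the SASL "privacy" QoP,
     i.e. authentication plus integrity plus encryption -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml: encrypt the block DataTransferProtocol between
     clients and DataNodes and between DataNodes -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
```

SSL for the web UIs, shuffle, and HiveServer2 is configured separately through each service's own SSL settings.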