NOTE: Slides contains gifs which may appear as dark pics.
You got your cluster installed and configured. You celebrate, until the party is ruined by your company's Security officer stamping a big "Deny" on your Hadoop cluster. And oops!! You cannot place any data onto the cluster until you can demonstrate it is secure. In this session you will learn the tips and tricks to fully secure your cluster for data at rest, data in motion and all the apps including Spark. Your Security officer can then join your Hadoop revelry (unless you don't authorize him to, with your newly acquired admin rights)
Data in Hadoop is getting bigger every day, consumers of the data are growing, organizations are now looking at making their Hadoop cluster compliant to federal regulations and commercial demands. Apache Ranger simplifies the management of security policies across all components in Hadoop. Ranger provides granular access controls to data.
The deck describes what security tools are available in Hadoop and their purpose then it moves on to discuss in detail Apache Ranger.
As Hadoop becomes a critical part of Enterprise data infrastructure, securing Hadoop has become critically important. Enterprises want assurance that all their data is protected and that only authorized users have access to the relevant bits of information. In this session we will cover all aspects of Hadoop security including authentication, authorization, audit and data protection. We will also provide demonstration and detailed instructions for implementing comprehensive Hadoop security.
Abstract:
As organizations start to roll out or migrate data driven applications to Apache Hadoop, there are times when they have conflicting needs to leverage their full co-mingled data sets in Hadoop
while providing isolation of sections of such co-mingled data to a specific customer. Serving multiple customers in this manner is a typical multi-tenant usecase and one that can be challenging in Apache Hadoop.
This presentation walks through a number of patterns that can be leveraged for providing isolation of tenants based on the composability of Apache Knox for:
* Authentication/Federation Providers
* KnoxSSO
* Identity Assertion
* Tenant specific topologies
With these patterns, Knox can provide an infrastructure for robust tenant isolation and access control for application UIs and REST APIs for your data landscape, when suitably coupled with a cluster that has carefully considered infrastructure including:
* Kerberos
* Tenant specific user accounts, OUs and Groups within LDAP
* Authorization Policy that is aware of the tenant specific groups,
Summary:
We will walk through some of the patterns that have been used to enable such a multi-tenant environment as well as the specific considerations for topology, access control and user accounts involved with creating such an environment.
Data in Hadoop is getting bigger every day, consumers of the data are growing, organizations are now looking at making their Hadoop cluster compliant to federal regulations and commercial demands. Apache Ranger simplifies the management of security policies across all components in Hadoop. Ranger provides granular access controls to data.
The deck describes what security tools are available in Hadoop and their purpose then it moves on to discuss in detail Apache Ranger.
As Hadoop becomes a critical part of Enterprise data infrastructure, securing Hadoop has become critically important. Enterprises want assurance that all their data is protected and that only authorized users have access to the relevant bits of information. In this session we will cover all aspects of Hadoop security including authentication, authorization, audit and data protection. We will also provide demonstration and detailed instructions for implementing comprehensive Hadoop security.
Abstract:
As organizations start to roll out or migrate data driven applications to Apache Hadoop, there are times when they have conflicting needs to leverage their full co-mingled data sets in Hadoop
while providing isolation of sections of such co-mingled data to a specific customer. Serving multiple customers in this manner is a typical multi-tenant usecase and one that can be challenging in Apache Hadoop.
This presentation walks through a number of patterns that can be leveraged for providing isolation of tenants based on the composability of Apache Knox for:
* Authentication/Federation Providers
* KnoxSSO
* Identity Assertion
* Tenant specific topologies
With these patterns, Knox can provide an infrastructure for robust tenant isolation and access control for application UIs and REST APIs for your data landscape, when suitably coupled with a cluster that has carefully considered infrastructure including:
* Kerberos
* Tenant specific user accounts, OUs and Groups within LDAP
* Authorization Policy that is aware of the tenant specific groups,
Summary:
We will walk through some of the patterns that have been used to enable such a multi-tenant environment as well as the specific considerations for topology, access control and user accounts involved with creating such an environment.
Ranger’s pluggable architecture allows resource access policy administration and enforcement for standard and custom services from a “single pane of glass”. Apache Ranger has a rich Authorization Model, which provides the mechanism to author Policy in a Ranger Admin Server and serves as policy decision and audit point in authorizing user’s resource access within various components of Hadoop ecosystem.
This session will provide a deep dive into Ranger framework and a cook-book for extending Ranger to do authorization / auditing on resource access to external applications, including technical details of Rest APIs, Ranger policy engine and enriching authorization requests, with a demo of a sample application.We will then demonstrate a real-world example of how Ranger has simplified security enforcement for Hadoop-native MPP SQL engine like Apache HAWQ (incubating),which previously used its built-in Postgres-like authorization mechanisms. The integration design includes a Ranger Plugin Service that allows transparent authorization API calls between C-based Apache HAWQ and Java-based Apache Ranger.
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Kevin Minder
Securing Hadoop's REST APIs with Apache Knox Gateway
Presented at Hadoop Summit on June 6th, 2014
Describes the overall roles the Apache Knox Gateway plays in Hadoop security and briefly covers its primary features.
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks
Apache Hive is the defacto standard for SQL queries over petabytes of data in Hadoop. It is a comprehensive and compliant engine that offers the broadest range of SQL semantics for Hadoop, providing a powerful set of tools for analysts and developers to access Hadoop data. The session will cover the latest advancements in Hive and provide practical tips for maximizing Hive Performance.
Audience: Developers, Architects and System Engineers from the Hortonworks Technology Partner community.
Recording: https://hortonworks.webex.com/hortonworks/lsr.php?RCID=7c8f800cbbef256680db14c78b871f97
A comprehensive overview of the security concepts in the open source Hadoop stack in mid 2015 with a look back into the "old days" and an outlook into future developments.
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxDataWorks Summit
Today enterprises are increasingly leveraging hybrid cloud data lakes while taking advantage of the elastic resources and services available in the public cloud. However, such gains come with risks and challenges in the areas of security and privacy. In this talk, we will cover how an enterprise can use Apache Knox as a secure point of entry into a Multi-cluster hybrid cloud data lakes. We will outline how enterprises can securely test out new big data applications or concepts in the public cloud while protecting their production clusters on-premises. We will show how enterprises can leverage their existing on-premises Active Directory infrastructure for authenticating users trying to access their services in the cloud. Further, we will cover how you can leverage Apache Knox Authorization to thwart an unauthorized access to a multi-cloud and multi-cluster data lake and bring to bear Multi Factor Authentication (MFA) on Apache Knox to block a hacker with stolen credentials. KIRAN MATTY, Senior Product Manager, Hortonworks and SANDEEP MORE, Sr. Software Engineer, Hortonworks
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...huguk
Olivier from Hortonworks will introduce Apache Argus a framework to enable, monitor and manage comprehensive data security across the Hadoop platform.
Data security within Hadoop has evolved to support multiple use cases for data access, while also providing a framework for central administration of security policies and monitoring of user access. With the advent of Apache YARN and Argus, the Hadoop platform can now support a true secure data lake architecture.
Nowadays a typical Hadoop deployment consists of core Hadoop components – HDFS and MapReduce – several other components such as HBase, HttpFS, Oozie, Pig, Hive, Sqoop, Flume, plus programmatic integration from external systems and applications. This effectively creates a complex and heterogenous distributed environment that runs across several machines and uses different protocols to communicate with each other; all of which is used concurrently by several users and applications. When a Hadoop deployment and its ecosystem is used to process sensitive data (such as financial records, payment transactions, healthcare records), several security requirements arise. These security requirements may be dictated by internal policies and/or government regulations. They may require strong authentication, selective authorization to access data/resources, and data confidentiality. This session covers in detail how different components in the Hadoop ecosystem and external applications can interact with each other in a secure manner providing authentication, authorization, and confidentiality when accessing services and transferring data to/from/between services. The session will cover topics like Kerberos authentication, Web UI authentication, File System permissions, delegation tokens, Access Control Lists, ProxyUser impersonation and network encryption.
Many enterprise are implementing Hadoop projects to manage and process large datasets. Big question is: how to configure Hadoop clusters to connect to enterprise directory containing 100k+ users and groups for access management. Several large enterprises have complex directory servers for managing users and groups. Many advanced features have been recently added to Hadoop user management in order to support various complex directory server structures.
In this session attendees will learn about: setting up Hadoop node with users from Active Directory for executing Hadoop jobs, setting up authentication for enterprise users, and setting up authorization for users and groups using Apache Ranger. Attendees will also learn about the common challenges faced in the enterprise environments while interacting with Active Directory including filtering out users to be brought into Hadoop from Active Directory, restricting access to a set of users from Active Directory, handling users from nested group structures, etc.
Speakers
Sailaja Polavarapu, staff Software Engineer, Hortonworks
Velmurugan Periasamy, Director - Engineering, Hortonworks
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifyHortonworks
Join this webinar to explore Hadoop security challenges and trends, learn how to simply the connection of your Hortonworks Data Platform to your existing Active Directory infrastructure and hear about real world examples of organizations that are achieving the following benefits:
- Secured Hortonworks environments thanks to Active Directory infrastructure for identity and authentication.
- Increased productivity and security via single sign-on for IT admins and Hadoop users.
- Least privilege and session monitoring for privileged access to Hortonworks clusters.
Webinar URL: http://hortonworks.com/webinar/simplify-and-secure-your-hadoop-environment-with-hortonworks-and-centrify/
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopHortonworks
Rich media is exploding all around us. From our personal usage to retailers monitoring store traffic for optimized associate placement, there is wide and growing application of rich media. Despite the pervasive usage, enterprises have had limited choice of generally available tools to analyze rich media. In this session we will look into leveraging deep learning algorithms for rich media analysis and provide practical hands on example of image recognition using Apache Hadoop and Spark.
Ranger’s pluggable architecture allows resource access policy administration and enforcement for standard and custom services from a “single pane of glass”. Apache Ranger has a rich Authorization Model, which provides the mechanism to author Policy in a Ranger Admin Server and serves as policy decision and audit point in authorizing user’s resource access within various components of Hadoop ecosystem.
This session will provide a deep dive into Ranger framework and a cook-book for extending Ranger to do authorization / auditing on resource access to external applications, including technical details of Rest APIs, Ranger policy engine and enriching authorization requests, with a demo of a sample application.We will then demonstrate a real-world example of how Ranger has simplified security enforcement for Hadoop-native MPP SQL engine like Apache HAWQ (incubating),which previously used its built-in Postgres-like authorization mechanisms. The integration design includes a Ranger Plugin Service that allows transparent authorization API calls between C-based Apache HAWQ and Java-based Apache Ranger.
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Kevin Minder
Securing Hadoop's REST APIs with Apache Knox Gateway
Presented at Hadoop Summit on June 6th, 2014
Describes the overall roles the Apache Knox Gateway plays in Hadoop security and briefly covers its primary features.
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks
Apache Hive is the defacto standard for SQL queries over petabytes of data in Hadoop. It is a comprehensive and compliant engine that offers the broadest range of SQL semantics for Hadoop, providing a powerful set of tools for analysts and developers to access Hadoop data. The session will cover the latest advancements in Hive and provide practical tips for maximizing Hive Performance.
Audience: Developers, Architects and System Engineers from the Hortonworks Technology Partner community.
Recording: https://hortonworks.webex.com/hortonworks/lsr.php?RCID=7c8f800cbbef256680db14c78b871f97
A comprehensive overview of the security concepts in the open source Hadoop stack in mid 2015 with a look back into the "old days" and an outlook into future developments.
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxDataWorks Summit
Today enterprises are increasingly leveraging hybrid cloud data lakes while taking advantage of the elastic resources and services available in the public cloud. However, such gains come with risks and challenges in the areas of security and privacy. In this talk, we will cover how an enterprise can use Apache Knox as a secure point of entry into a Multi-cluster hybrid cloud data lakes. We will outline how enterprises can securely test out new big data applications or concepts in the public cloud while protecting their production clusters on-premises. We will show how enterprises can leverage their existing on-premises Active Directory infrastructure for authenticating users trying to access their services in the cloud. Further, we will cover how you can leverage Apache Knox Authorization to thwart an unauthorized access to a multi-cloud and multi-cluster data lake and bring to bear Multi Factor Authentication (MFA) on Apache Knox to block a hacker with stolen credentials. KIRAN MATTY, Senior Product Manager, Hortonworks and SANDEEP MORE, Sr. Software Engineer, Hortonworks
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...huguk
Olivier from Hortonworks will introduce Apache Argus a framework to enable, monitor and manage comprehensive data security across the Hadoop platform.
Data security within Hadoop has evolved to support multiple use cases for data access, while also providing a framework for central administration of security policies and monitoring of user access. With the advent of Apache YARN and Argus, the Hadoop platform can now support a true secure data lake architecture.
Nowadays a typical Hadoop deployment consists of core Hadoop components – HDFS and MapReduce – several other components such as HBase, HttpFS, Oozie, Pig, Hive, Sqoop, Flume, plus programmatic integration from external systems and applications. This effectively creates a complex and heterogenous distributed environment that runs across several machines and uses different protocols to communicate with each other; all of which is used concurrently by several users and applications. When a Hadoop deployment and its ecosystem is used to process sensitive data (such as financial records, payment transactions, healthcare records), several security requirements arise. These security requirements may be dictated by internal policies and/or government regulations. They may require strong authentication, selective authorization to access data/resources, and data confidentiality. This session covers in detail how different components in the Hadoop ecosystem and external applications can interact with each other in a secure manner providing authentication, authorization, and confidentiality when accessing services and transferring data to/from/between services. The session will cover topics like Kerberos authentication, Web UI authentication, File System permissions, delegation tokens, Access Control Lists, ProxyUser impersonation and network encryption.
Many enterprise are implementing Hadoop projects to manage and process large datasets. Big question is: how to configure Hadoop clusters to connect to enterprise directory containing 100k+ users and groups for access management. Several large enterprises have complex directory servers for managing users and groups. Many advanced features have been recently added to Hadoop user management in order to support various complex directory server structures.
In this session attendees will learn about: setting up Hadoop node with users from Active Directory for executing Hadoop jobs, setting up authentication for enterprise users, and setting up authorization for users and groups using Apache Ranger. Attendees will also learn about the common challenges faced in the enterprise environments while interacting with Active Directory including filtering out users to be brought into Hadoop from Active Directory, restricting access to a set of users from Active Directory, handling users from nested group structures, etc.
Speakers
Sailaja Polavarapu, staff Software Engineer, Hortonworks
Velmurugan Periasamy, Director - Engineering, Hortonworks
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifyHortonworks
Join this webinar to explore Hadoop security challenges and trends, learn how to simply the connection of your Hortonworks Data Platform to your existing Active Directory infrastructure and hear about real world examples of organizations that are achieving the following benefits:
- Secured Hortonworks environments thanks to Active Directory infrastructure for identity and authentication.
- Increased productivity and security via single sign-on for IT admins and Hadoop users.
- Least privilege and session monitoring for privileged access to Hortonworks clusters.
Webinar URL: http://hortonworks.com/webinar/simplify-and-secure-your-hadoop-environment-with-hortonworks-and-centrify/
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopHortonworks
Rich media is exploding all around us. From our personal usage to retailers monitoring store traffic for optimized associate placement, there is wide and growing application of rich media. Despite the pervasive usage, enterprises have had limited choice of generally available tools to analyze rich media. In this session we will look into leveraging deep learning algorithms for rich media analysis and provide practical hands on example of image recognition using Apache Hadoop and Spark.
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Hortonworks
This presentation was included in a 30-minute webinar Balaji Ganesan, Hortonworks senior director for enterprise security strategy and Vinay Shukla, director of product management.
They discussed Hortonworks Data Platform 2.2’s features for delivering comprehensive security in HDP.
Balaji and Vinay discussed Apache Ranger and Apache Knox and how they are integrated in HDP 2.2 to provide fine grain authorization, auditing and API security that can be centrally administered.
Hadoop and Data Virtualization - A Case Study by VHAHortonworks
VHA (Voluntary Hospitals of America) is the largest member-owned health care company in the US delivering industry-leading supply chain management services and clinical improvement services to its members. At VHA, product, supplier, and member information is siloed across multiple sources. VHA sees value in consolidating the disparate data into a Data Lake, supported by the Hortonworks Data Platform, to enable the business users to discover the related data and provide services to their members. Because of their previous success with data virtualization, powered by Denodo, VHA decided to use data virtualization to enable their business users to discover data using the familiar SQL, and thus abstract their access directly to Hadoop.
During this webinar, you will learn:
- The role, use, and benefits of Hadoop in the Modern Data Architecture.
- How Hadoop and data virtualization simplified data management and enabled faster data discovery.
- What data virtualization is and how it can simplify big data projects.
- Lessons learned from and best practices for deploying data lake and data virtualization.
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
Apache NiFi, Storm and Kafka augment each other in modern enterprise architectures. NiFi provides a coding free solution to get many different formats and protocols in and out of Kafka and compliments Kafka with full audit trails and interactive command and control. Storm compliments NiFi with the capability to handle complex event processing.
Join us to learn how Apache NiFi, Storm and Kafka can augment each other for creating a new dataplane connecting multiple systems within your enterprise with ease, speed and increased productivity.
https://www.brighttalk.com/webcast/9573/224063
Enabling the Real Time Analytical EnterpriseHortonworks
Combining IOT, Customer Experience and Real-Time Enterprise Data within Hadoop. What if you could derive real-time insights using ALL of your data? Join us for this webinar and learn how companies are combining “new” real-time data sources (i.e. IOT, Social, Web Logs) with continuously updated enterprise data from SAP and other enterprise transactional systems, providing deep and up-to-the-second analytical insights. This presentation will include a demonstration of how this can be achieved quickly, easily and affordably by utilizing a joint solution from Attunity and Hortonworks.
You got your cluster installed and configured. You celebrate, until the party is ruined by your company's Security officer stamping a big "Deny" on your Hadoop cluster. And oops!! You cannot place any data onto the cluster until you can demonstrate it is secure In this session you would learn the tips and tricks to fully secure your cluster for data at rest, data in motion and all the apps including Spark. Your Security officer can then join your Hadoop revelry (unless you don't authorize him to, with your newly acquired admin rights)
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
Most enterprises with large data lakes today are flying blind when it comes to the extent to which they can understand how the data in their data lakes is organized, accessed, and utilized to create real business value. Couple this with the need to democratize data, enterprises often realize they have created a data swamp loaded with all kinds of data assets without any curation and without appropriate security controls hoping that developers and analysts can responsibly collaborate to generate insights. In this talk we will provide a broad overview of how organizations can use open source frameworks such as Apache Ranger and Apache Knox to secure their data lakes and Apache Atlas to effectively provide open metadata and governance services for Hadoop ecosystem. We will provide an overview of the new features that have been added in each of these Apache projects recently and how enterprises can leverage these new features to build a robust security and governance model for their data lakes.
Speaker
Owen O'Malley, Co-Founder & Technical Fellow, Hortonworks
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...DataWorks Summit
Today enterprises desire to move more and more of their data lakes to the cloud to help them execute faster, increase productivity, drive innovation while leveraging the scale and flexibility of the cloud. However, such gains come with risks and challenges in the areas of data security, privacy, and governance. In this talk we cover how enterprises can overcome governance and security obstacles to leverage these new advances that the cloud can provide to ease the management of their data lakes in the cloud. We will also show how the enterprise can have consistent governance and security controls in the cloud for their ephemeral analytic workloads in a multi-cluster cloud environment without sacrificing any of the data security and privacy/compliance needs that their business context demands. Additionally, we will outline some use cases and patterns as well as best practices to rationally manage such a multi-cluster data lake infrastructure in the cloud.
Using Apache Hadoop and related technologies as a data warehouse has been an area of interest since the early days of Hadoop. In recent years Hive has made great strides towards enabling data warehousing by expanding its SQL coverage, adding transactions, and enabling sub-second queries with LLAP. But data warehousing requires more than a full powered SQL engine. Security, governance, data movement, workload management, monitoring, and user tools are required as well. These functions are being addressed by other Apache projects such as Ranger, Atlas, Falcon, Ambari, and Zeppelin. This talk will examine how these projects can be assembled to build a data warehousing solution. It will also discuss features and performance work going on in Hive and the other projects that will enable more data warehousing use cases. These include use cases like data ingestion using merge, support for OLAP cubing queries via Hive’s integration with Druid, expanded SQL coverage, replication of data between data warehouses, advanced access control options, data discovery, and user tools to manage, monitor, and query the warehouse.
Speaker
Alan Gates, Co-founder, Hortonworks
Today enterprises desire to move more and more of their data lakes to the cloud to help them execute faster, increase productivity, drive innovation while leveraging the scale and flexibility of the cloud. However, such gains come with risks and challenges in the areas of data security, privacy, and governance. In this talk we cover how enterprises can overcome governance and security obstacles to leverage these new advances that the cloud can provide to ease the management of their data lakes in the cloud. We will also show how the enterprise can have consistent governance and security controls in the cloud for their ephemeral analytic workloads in a multi-cluster cloud environment without sacrificing any of the data security and privacy/compliance needs that their business context demands. Additionally, we will outline some use cases and patterns as well as best practices to rationally manage such a multi-cluster data lake infrastructure in the cloud.
Speaker:
Jeff Sposetti, Product Management, Hortonworks
This is the presentation from the "Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS" webinar on May 28, 2014. Rohit Bahkshi, a senior product manager at Hortonworks, and Vinod Vavilapalli, PMC for Apache Hadoop, discuss an overview of YARN in HDFS and new features in HDP 2.1. Those new features include: HDFS extended ACLs, HTTPs wire encryption, HDFS DataNode caching, resource manager high availability, application timeline server, and capacity scheduler pre-emption.
This talk discusses the current status of Hadoop security and some exciting new security features that are coming in the next release. First, we provide an overview of current Hadoop security features across the stack, covering Authentication, Authorization and Auditing. Hadoop takes a “defense in depth” approach, so we discuss security at multiple layers: RPC, file system, and data processing. We provide a deep dive into the use of tokens in the security implementation. The second and larger portion of the talk covers the new security features. We discuss the motivation, use cases and design for Authorization improvements in HDFS, Hive and HBase. For HDFS, we describe two styles of ACLs (access control lists) and the reasons for the choice we made. In the case of Hive we compare and contrast two approaches for Hive authrozation.. Further we also show how our approach lends itself to a particular initial implementation choice that has the limitation where the Hive Server owns the data, but where alternate more general implementation is also possible down the road. In the case of HBase, we describe cell level authorization is explained. The talk will be fairly detailed, targeting a technical audience, including Hadoop contributors.
An overview of securing Hadoop. Content primarily by Balaji Ganesan, one of the leaders of the Apache Argus project. Presented on Sept 4, 2014 at the Toronto Hadoop User Group by Adam Muise.
Keynote slides from Big Data Spain Nov 2016. Has some thoughts on how Hadoop ecosystem is growing and changing to support the enterprise, including Hive, Spark, NiFi, security and governance, streaming, and the cloud.
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
Beginning with HDP 2.1, Hortonworks Data Platform ships with Apache Falcon for Hadoop data governance. Himanshu Bari, Hortonworks senior product manager, and Venkatesh Seetharam, Hortonworks co-founder and committer to Apache Falcon, lead this 30-minute webinar, including:
+ Why you need Apache Falcon
+ Key new Falcon features
+ Demo: Defining data pipelines with replication; policies for retention and late data arrival; managing Falcon server with Ambari
Using Apache Hadoop and related technologies as a data warehouse has been an area of interest since the early days of Hadoop. In recent years Hive has made great strides towards enabling data warehousing by expanding its SQL coverage, adding transactions, and enabling sub-second queries with LLAP. But data warehousing requires more than a full powered SQL engine. Security, governance, data movement, workload management, monitoring, and user tools are required as well. These functions are being addressed by other Apache projects such as Ranger, Atlas, Falcon, Ambari, and Zeppelin. This talk will examine how these projects can be assembled to build a data warehousing solution. It will also discuss features and performance work going on in Hive and the other projects that will enable more data warehousing use cases. These include use cases like data ingestion using merge, support for OLAP cubing queries via Hive’s integration with Druid, expanded SQL coverage, replication of data between data warehouses, advanced access control options, data discovery, and user tools to manage, monitor, and query the warehouse.
Big data security challenges are bit different from traditional client-server applications and are distributed in nature, introducing unique security vulnerabilities. Cloud Security Alliance (CSA) has categorized the different security and privacy challenges into four different aspects of the big data ecosystem. These aspects are infrastructure security, data privacy, data management and, integrity and reactive security. Each of these aspects are further divided into following security challenges:
1. Infrastructure security
a. Secure distributed processing of data
b. Security best practices for non-relational data stores
2. Data privacy
a. Privacy-preserving analytics
b. Cryptographic technologies for big data
c. Granular access control
3. Data management
a. Secure data storage and transaction logs
b. Granular audits
c. Data provenance
4. Integrity and reactive security
a. Endpoint input validation/filtering
b. Real-time security/compliance monitoring
In this talk, we are going to refer above classification and identify existing security controls, best practices, and guidelines. We will also paint a big picture about how collective usage of all discussed security controls (Kerberos, TDE, LDAP, SSO, SSL/TLS, Apache Knox, Apache Ranger, Apache Atlas, Ambari Infra, etc.) can address fundamental security and privacy challenges that encompass the entire Hadoop ecosystem. We will also discuss briefly recent security incidents involving Hadoop systems.
Speakers
Krishna Pandey, Staff Software Engineer, Hortonworks
Kunal Rajguru, Premier Support Engineer, Hortonworks
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 To discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Epistemic Interaction - tuning interfaces to provide information for AI support
Curb your insecurity with HDP - Tips for a Secure Cluster
1. Curb
Your
Insecurity
with
HDP
Tips
for
a
Secure
Cluster
(with
Spark
too)
Ancil
McBarneA
Senior
Solu*ons
Engineer
–
Security
&
Governance
Future
of
Data
Meetup
–
New
York
June
2nd,
2016