This document outlines an agenda for a presentation on Hadoop security. The agenda includes an overview of Hadoop, concepts of Hadoop security, Kerberos, Hadoop security design, and how to implement Hadoop security. The presenter is introduced as the lead of big data platform development at Gruter Inc. and an Apache Tajo committer.
- Kerberos is used to authenticate Hadoop services and clients running on different nodes that communicate over an insecure network. It uses tickets for authentication.
- Key configuration changes are required to enable Kerberos authentication in Hadoop, including setting hadoop.security.authentication to kerberos and generating keytabs containing principal keys for the HDFS services.
- Services are associated with Kerberos principals via keytabs, which are then configured for use by the relevant Hadoop processes and services (see the sketch below).
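To make those moving parts concrete, here is a minimal client-side sketch of the equivalent settings, using Hadoop's Configuration and UserGroupInformation APIs; the principal name and keytab path are hypothetical examples, not values from the deck.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Programmatic equivalent of the core-site.xml properties
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("hadoop.security.authorization", "true");

        // Tell the Hadoop security layer to honor the Kerberos setting
        UserGroupInformation.setConfiguration(conf);

        // Log in using a principal's key stored in a keytab
        // (principal and path are hypothetical)
        UserGroupInformation.loginUserFromKeytab(
                "hdfs/node1.example.com@EXAMPLE.COM",
                "/etc/security/keytabs/hdfs.keytab");

        System.out.println("Logged in as: "
                + UserGroupInformation.getLoginUser().getUserName());
    }
}
```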
An overview of securing Hadoop. Content primarily by Balaji Ganesan, one of the leaders of the Apache Argus project. Presented on Sept 4, 2014 at the Toronto Hadoop User Group by Adam Muise.
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop, by Yafang Chang
In an enterprise on-premises data center we may have multiple secured Hadoop clusters for different purposes. These clusters might run different Hadoop distributions or versions, or even be located in different data centers. To fulfill business requirements, data synchronization between these clusters can be an important mechanism. However, the real-world story for secured multi-cluster setups is far more complicated than a distcp between two same-version, non-secured Hadoop clusters.
We would like to walk through our experience enabling live data synchronization across multiple Kerberos-enabled Hadoop clusters, including functionality verification, multi-cluster configuration, and the automated setup process. After that, we will share use cases for those Kerberos-federated Hadoop clusters and, finally, present our common practices for multi-cluster data synchronization (a toy sketch of the mechanism follows below).
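As a toy illustration of the mechanism involved (not the authors' actual pipeline), a single Kerberos identity trusted by both clusters can address each cluster by its HDFS URI and copy data between them; the hostnames, principal, and paths below are hypothetical.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class CrossClusterCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // One identity valid on both clusters, e.g. via cross-realm trust
        UserGroupInformation.loginUserFromKeytab(
                "etl@CORP.EXAMPLE.COM", "/etc/security/keytabs/etl.keytab");

        // Source and destination clusters, addressed by URI
        FileSystem src = FileSystem.get(
                URI.create("hdfs://nn-a.example.com:8020"), conf);
        FileSystem dst = FileSystem.get(
                URI.create("hdfs://nn-b.example.com:8020"), conf);

        // Copy one path; production-scale sync would use DistCp jobs instead
        FileUtil.copy(src, new Path("/data/events/2015-09-01"),
                      dst, new Path("/data/events/2015-09-01"),
                      false /* keep the source */, conf);
    }
}
```

In practice, establishing the cross-realm Kerberos trust (or shared KDC) that makes one identity valid on both clusters is exactly the hard part the talk addresses.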
Deploying enterprise grade security for Hadoop with Apache Sentry (incubating).
Apache Hive is deployed in the vast majority of Hadoop use cases despite the major practical flaws in its most secure operational mode (Kerberos + user impersonation).
In this talk we will discuss these flaws and how Apache Sentry addresses them. We will then enable Apache Sentry on an existing cluster. Additional topics will include Hadoop security and Role-Based Access Control (RBAC).
With the growth of traffic and the need for large-scale data analysis, big data has become one of the most popular areas in IT, and many companies are now working on it by deploying clusters of Hadoop, currently the most popular platform for big data processing. This talk presents Hadoop security, or more precisely its principles, in an accessible form, and demonstrates various attack vectors against a cluster.
Secure Search - Using Apache Sentry to Add Authentication and Authorization S..., by Lucidworks
This document discusses securing Apache Solr with Apache Sentry. It provides motivation for securing Solr, especially in multi-tenant Hadoop clusters. Sentry is chosen due to its established role in Hadoop security. Authentication is implemented using Kerberos and SPNego. Authorization is implemented at both the collection-level and document-level using role-based access control policies. Secure impersonation is also supported to allow applications to submit requests on behalf of users.
Improving HDFS Availability with Hadoop RPC Quality of Service, by Ming Ma
Heavy users monopolizing cluster resources is a frequent cause of slowdown for others. With only one namenode and thousands of datanodes, any poorly written application is a potential distributed denial-of-service attack on the namenode. In this talk, you will learn how to prevent slowdown from heavy users and poorly written applications by enabling IPC Quality of Service (QoS), a new feature in Hadoop 2.6+. On Twitter’s and eBay’s production clusters, we’ve seen response times of 500 milliseconds with QoS off drop to 10 milliseconds with QoS on during heavy usage. We’ll cover how IPC QoS works and share our experience on how to tune performance.
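For orientation, IPC QoS is switched on per server port through the ipc.<port>.* configuration family; a minimal sketch for the NameNode's default port 8020 follows. The two class names are real Hadoop implementations, but the exact tuning keys vary across Hadoop versions, so treat the details as assumptions to verify against your release.

```java
import org.apache.hadoop.conf.Configuration;

public class IpcQosConfig {
    public static Configuration fairCallQueueFor(int port) {
        Configuration conf = new Configuration();
        // Replace the default FIFO call queue with FairCallQueue on this port
        conf.set("ipc." + port + ".callqueue.impl",
                 "org.apache.hadoop.ipc.FairCallQueue");
        // DecayRpcScheduler deprioritizes callers with heavy recent traffic
        conf.set("ipc." + port + ".scheduler.impl",
                 "org.apache.hadoop.ipc.DecayRpcScheduler");
        return conf;
    }
}
```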
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014, by Amazon Web Services
Peek behind the scenes to learn about Amazon ElastiCache's design and architecture. See common design patterns of our Memcached and Redis offerings and how customers have used them for in-memory operations and achieved improved latency and throughput for applications. During this session, we review best practices, design patterns, and anti-patterns related to Amazon ElastiCache.
1) Hadoop is a framework for distributed processing of large datasets across clusters of computers using a simple programming model.
2) Virtualizing Hadoop enables rapid deployment, high availability, elastic scaling, and consolidation of big data workloads on a common infrastructure.
3) Serengeti is a tool that automates the deployment and management of Hadoop clusters on vSphere in under 30 minutes through simple commands.
Trend Micro uses Hadoop for processing large volumes of web data to quickly identify and block malicious URLs. They have expanded their Hadoop cluster significantly over time to support growing data and job volumes. They developed Hadooppet to automate deployment and management of their large, customized Hadoop distribution across hundreds of nodes. Profiling tools like Nagios, Ganglia and Splunk help monitor and troubleshoot cluster performance issues.
The document discusses 5 key things to know about administering MongoDB: 1) Understanding MongoDB's architecture and memory usage, 2) Protecting data through replication and deployment strategies, 3) Scaling writes and reads using sharding, 4) Monitoring MongoDB performance using tools like MMS, and 5) Backing up and restoring data with tools like mongodump/mongorestore. It provides examples and explanations of useful commands for tasks like replication, sharding, and monitoring MongoDB deployments.
Message Queuing on a Large Scale: IMVU's stateful real-time message queue for ..., by Jon Watte
These slides are the ones I presented at the 2011 Game Developers Conference.
Social game and entertainment company IMVU built a real-time lightweight networked messaging back-end suitable for chat and social gaming. Here's how we did it!
This document discusses MongoDB best practices for deploying MongoDB in AWS. It begins with terminology comparing MongoDB and relational databases. It then shows an example data model in SQL and how that same data would be modeled in MongoDB. The document discusses concepts like cursors, indexing, and sharding in MongoDB. It emphasizes the importance of sizing RAM and disk appropriately based on working set size and data access patterns. Finally, it covers replication in MongoDB and different replication set topologies that can be used in AWS for high availability and disaster recovery.
Historically, security hasn't been a high priority for Hadoop (a reflection of the type of data and organizations using it), but Hadoop is now being used by more traditional firms with heightened security requirements. MapR's Senior Principal Technologist, Keys Botzum, gives a talk on how you can build a secure cluster.
Combining Real-time and Batch Analytics with NoSQL, Storm and Hadoop - NoSQL ..., by Aerospike
From financial services to digital advertising, omni-channel marketing and retail, companies are pushing to grow revenue by personalizing the customer experience in real time based on knowing what customers care about, where they are, and what they are doing now. For growing numbers of these businesses, this means developing applications that combine the historical analysis provided by Hadoop with real-time analysis through Storm and within NoSQL databases themselves. This session will examine the design considerations and development approaches for successfully delivering interactive applications that incorporate real-time and batch analysis using a combination of Hadoop, Storm and NoSQL. Key topics will include:
- A review of the respective roles that Hadoop, Storm and NoSQL databases play.
- Considerations in choosing which technology to use in areas where their capabilities overlap.
- An overview of a typical solution architecture.
- Strategies for addressing the diverse data types required for providing a complete view of the customers.
- Approaches to managing large data types to ensure reliable real-time responses.
Throughout the discussion, concepts will be illustrated by use cases of businesses that have implemented real-time applications using Hadoop, Storm and NoSQL, which are in production today.
This presentation was given at the 2014 NoSQL Matters conference in Cologne, Germany.
An overview of the development of the Apache Hadoop software stack, including some of the barriers to participation, and how and why to overcome them. It closes with some open discussion points/ideas on how the existing process can be improved.
This document provides an overview of Hops, a next-generation distribution of Hadoop that uses a distributed database to store metadata externally from the NameNode. Key points:
- Hops stores HDFS metadata such as inodes and block locations in the NewSQL database NDB, allowing the NameNode to scale beyond its memory limits.
- HopsFS architecture moves metadata management to the database via a Data Abstraction Layer interface. This improves performance and makes the system more robust.
- HopsWorks is a frontend that enables true multi-tenancy, free text search across metadata, interactive analytics with Zeppelin/Flink/Spark and batch jobs.
- The distributed database provides a single source of truth for the file system metadata.
Gruter TECHDAY 2014 Realtime Processing in Telco, by Gruter
Big Telco, Bigger real-time demands: Real-time processing in Telco
- Presented by Jung-ryong Lee, engineering manager at SK Telecom, at Gruter TECHDAY 2014, Oct. 29, Seoul, Korea
Big data, just an introduction to Hadoop and Scripting Languages, by Corley S.r.l.
This document provides an introduction to Big Data and Apache Hadoop. It defines Big Data as large and complex datasets that are difficult to process using traditional database tools. It describes how Hadoop uses MapReduce and HDFS to provide scalable storage and parallel processing of Big Data. It provides examples of companies using Hadoop to analyze exabytes of data and common Hadoop use cases like log analysis. Finally, it summarizes some popular Hadoop ecosystem projects like Hive, Pig, and Zookeeper that provide SQL-like querying, data flows, and coordination.
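Since this summary centers on the MapReduce programming model, the canonical word-count job is worth having at hand; this is the well-known Hadoop example in Java, trimmed to its essentials, with input and output paths taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Map: emit (word, 1) for every token in the input split
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    public void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());
        ctx.write(word, ONE);
      }
    }
  }

  // Reduce: sum the counts emitted for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      ctx.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```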
The document describes OpenStack Trove, an OpenStack service that provides database as a service functionality. It discusses how Trove allows developers to provision and manage relational and non-relational databases in OpenStack clouds through self-service APIs. The document also provides an overview of how Trove works, how it is used in production environments today, and how users can get started with provisioning and managing databases using the Trove APIs and CLI tools.
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera, by Cloudera, Inc.
Attend this session and walk away armed with solutions to the most common customer problems. Learn proactive configuration tweaks and best practices to keep your cluster free of fetch failures, job tracker hangs, and the like.
With Hadoop-3.0.0-alpha2 being released in January 2017, it's time to have a closer look at the features and fixes of Hadoop 3.0.
We will have a look at Core Hadoop, HDFS and YARN, and answer the emerging question whether Hadoop 3.0 will be an architectural revolution like Hadoop 2 was with YARN & Co. or will it be more of an evolution adapting to new use cases like IoT, Machine Learning and Deep Learning (TensorFlow)?
"Wire Encryption In HDFS: Protect Your Data From Others, Not Yourself"
ApacheCon 2019, Las Vegas.
SPEAKERS: Chen Liang, Konstantin Shvachko. LinkedIn
Wire data encryption is a key component of the Hadoop Distributed File System (HDFS). HDFS can enforce different levels of data protection, allowing users to specify one based on their own needs. However, such enforcement comes in as an all-or-nothing feature. Namely, wire encryption is enforced either for all accesses or none. Since encryption bears a considerable performance cost, the all-or-nothing condition forces users to choose between 'faster but unencrypted' or 'encrypted but slower' for all clients. In our use case at LinkedIn, we would like to selectively expose fast unencrypted access to fully managed internal clients, which can be trusted, while only expose encrypted access to clients outside of the trusted circle with higher security risks. That way we minimize performance overhead for trusted internal clients while still securing data from potential outside threats. We re-evaluate the RPC encryption mechanism in HDFS. Our design extends HDFS NameNode to run on multiple ports. Depending on the configuration, connecting to different NameNode ports would end up with different levels of encryption protection. This protection then gets enforced for both NameNode RPC and the subsequent data transfers to/from DataNode. System administrators then need to set up a simple firewall rule to allow access to the unencrypted port only for internal clients and expose the encrypted port to the outside clients. This approach comes with minimum operational and performance overhead. The feature has been introduced to Apache Hadoop under HDFS-13541.
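For orientation, the pre-existing all-or-nothing knobs are hadoop.rpc.protection for RPC and dfs.data.transfer.protection for block transfers; the per-port behavior described above comes from the auxiliary-port work in HDFS-13541. The sketch below names the auxiliary-port key as we understand that work; verify it against your Hadoop version before relying on it.

```java
import org.apache.hadoop.conf.Configuration;

public class WireEncryptionConfig {
    public static Configuration selectiveEncryption() {
        Configuration conf = new Configuration();
        // Classic all-or-nothing RPC setting: "privacy" encrypts everything,
        // "authentication" only authenticates and leaves data unencrypted.
        conf.set("hadoop.rpc.protection", "privacy");
        // Same idea for the block-data path between clients and DataNodes.
        conf.set("dfs.data.transfer.protection", "privacy");
        // HDFS-13541-era feature: the NameNode also listens on extra ports,
        // so a firewall can expose the unencrypted port only internally.
        // Key name reflects our reading of that work (assumption).
        conf.set("dfs.namenode.rpc-address.auxiliary-ports", "8021");
        return conf;
    }
}
```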
Content caching is one of the most effective ways to dramatically improve the performance of a web site. In this webinar, we’ll deep-dive into NGINX’s caching abilities and investigate the architecture used, debugging techniques and advanced configuration. By the end of the webinar, you’ll be well equipped to configure NGINX to cache content exactly as you need.
View full webinar on demand at http://nginx.com/resources/webinars/content-caching-nginx/
This document discusses the Hadoop cluster configuration at InMobi. It includes details about the cluster hardware specifications with 450 nodes and 5PB of storage. It also describes the software stack including Hadoop, Falcon, Oozie, Kafka and monitoring tools like Nagios and Graphite. The document then outlines some common issues faced like tasks hogging CPU resources and solutions implemented like cgroups resource limits. It provides examples of NameNode HA failover challenges and approaches to address slow running jobs.
As presented at LinuxCon/CloudOpen 2015, Seattle, Washington, August 19th, 2015. Sagi Brody & Logan Best
This session will focus on real world deployments of DDoS mitigation strategies in every layer of the network. It will give an overview of methods to prevent these attacks and best practices on how to provide protection in complex cloud platforms. The session will also outline what we have found in our experience managing and running thousands of Linux and Unix managed service platforms and what specifically can be done to offer protection at every layer. The session will offer insight and examples from both a business and technical perspective.
The document provides information about HDFS (Hadoop Distributed File System) including its design goals of storing large amounts of data reliably through horizontal scalability. It discusses HDFS configuration files and commands for interacting with HDFS through the hadoop fs command. The document also summarizes HDFS limitations and provides examples of using HDFS programmatically in Java.
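In the same programmatic spirit, here is a minimal self-contained Java example that writes a file to HDFS and reads it back through the FileSystem API; the path is arbitrary.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path p = new Path("/tmp/hello.txt");
        // Write a small file (overwrite if it already exists)
        try (FSDataOutputStream out = fs.create(p, true)) {
            out.write("hello, hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
        // Read it back
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(p), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}
```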
Developers of image-processing applications built on OpenCV always want to tap GPU resources. However, in a heterogeneous computing environment, using hardware other than the CPU brings many difficulties, from the development environment to the background knowledge required.
The most commercially successful and widespread GPGPU solution is NVIDIA's CUDA technology, but there is also an open platform that makes GPGPU resources easy to use: the OpenCL standard.
Recently, support for and development of OpenCL has been conspicuous on both the hardware and software sides, and it continues to spread.
On the OpenCV side as well, with the official release of version 3.0, using OpenCL has become considerably easier.
This document provides an overview of Pig Latin for analyzing big data. It discusses what Pig Latin is, its architecture, programming with Pig Latin, and compares it to HiveQL and MapReduce. Pig Latin is a data flow language and compiler that generates MapReduce programs. It allows for easy programming, optimization opportunities, and extensibility. Programming in Pig Latin involves loading, working with (filtering, grouping, joining), and storing data. Additional topics covered include PiggyBank, Penny, Pig Mix, and uses of Pig Latin and HiveQL for structured vs unstructured data pre-processing.
It is no exaggeration to say that today's multi-core processor world is largely a heterogeneous computing environment.
Parallel computing offers dramatic speedups and lower power consumption, but it is tricky to use, and software development becomes even harder in heterogeneous environments composed of diverse architectures.
This presentation introduces programming languages for parallel processing in heterogeneous computing environments and shows examples of their use in image-processing libraries such as OpenCV.
Facing enterprise-specific challenges – utility programming in Hadoop, by fann wu
This document discusses managing large Hadoop clusters with automation tools such as SaltStack, Puppet, and Chef. It describes how to use SaltStack to remotely control and manage a Hadoop cluster, and how Puppet can deploy Hadoop on hundreds of servers within an hour through Hadooppet. The document also covers Hadoop security concepts like Kerberos and folder permissions. It gives examples of monitoring tools such as Ganglia, Nagios, and Splunk that can track cluster metrics and debug issues, and summarizes common processes like datanode decommissioning and tools like the HBase Canary tool. Lastly, it discusses testing Hadoop on AWS using EMR and techniques to reduce EMR costs.
Thousands of unsecured Hadoop clusters have been targets of attacks where criminals have deleted databases and files. According to reports, over 5,000 Hadoop installations were accessible on port 50070 without authentication, allowing attackers to destroy data nodes and snapshots containing terabytes of data within seconds. A study found nearly 4,500 servers with the Hadoop Distributed File System exposed over 5 petabytes of data. Many of these unsecured systems have likely already been compromised by attackers destroying data.
Big problems with big data – Hadoop interfaces security, by SecuRing
Did "cloud computing" and "big data" buzzwords bring new challenges for security testers?
Apart from the complexity of Hadoop installations and the number of interfaces, standard techniques can be applied to test for web application vulnerabilities, SSL security, and encryption at rest. We tested popular Hadoop environments and found a few critical vulnerabilities, which certainly cast a shadow on big data security.
Zeronights 2015 - Big problems with big data - Hadoop interfaces security, by Jakub Kałużny
Did "cloud computing" and "big data" buzzwords bring new challenges for security testers?
Apart from complexity of Hadoop installations and number of interfaces, standard techniques can be applied to test for: web application vulnerabilities, SSL security and encryption at rest. We tested popular Hadoop environments and found a few critical vulnerabilities, which for sure cast a shadow on big data security.
This document provides an overview of security topics related to Hadoop. It discusses what Hadoop is, common versions and distributions. It outlines some key security risks like default passwords, open ports, old versions with vulnerabilities. It also summarizes encryption options for data in motion and at rest, and security solutions like Knox and Ranger for centralized authorization policies.
A comprehensive overview of the security concepts in the open source Hadoop stack in mid 2015 with a look back into the "old days" and an outlook into future developments.
[CONFidence 2016] Jakub Kałużny, Mateusz Olejarka - Big problems with big dat..., by PROIDEA
Did "cloud computing" and "big data" buzzwords bring new challenges for security testers? In this presentation I would like to show that penetration testing of Hadoop installation does not really differ much from any other application. Apart from complexity of the installation and number of interfaces, standard techniques can be applied to test for: web application vulnerabilities, SSL security, encryption at rest, obsolete libraries bugs and least privilege principle. We tested popular Hadoop environments and found few critical vulnerabilities, which for sure cast a shadow on big data security. So as not to stop with CVE shooting, we would like to show you our approach of testing big data installations and few ideas of how to keep them secure.
The document discusses virtualizing Hadoop clusters on VMware vSphere. It describes how Hadoop enables parallel processing of large datasets across clusters using MapReduce. Virtualizing Hadoop provides benefits like simple operations, high availability, and elastic scaling. The document outlines challenges with using Hadoop and how virtualization addresses them. It provides examples of deploying Hadoop clusters on Serengeti and configuring different distributions. Performance results show little overhead from virtualization and benefits of local storage. Joint engineering with Hortonworks adds high availability to Hadoop master daemons using vSphere features.
Apache Ranger is a framework that can monitor and manage comprehensive data security across the entire Hadoop platform. It supports authorization, audit, and key management services for many Hadoop components, including HDFS, Yarn, HBase, Hive, Atlas, Kafka, Solr, Storm, Knox, KMS, Sqoop, and works with databases like Oracle, MySQL, PostgreSQL, and SQL Server. Ranger plugins need to be installed on specific component hosts, such as the NameNode for HDFS. The Ranger UI can then be used to create, edit, and manage data access policies across various components.
Nowadays a typical Hadoop deployment consists of core Hadoop components – HDFS and MapReduce – several other components such as HBase, HttpFS, Oozie, Pig, Hive, Sqoop, Flume, plus programmatic integration from external systems and applications. This effectively creates a complex and heterogenous distributed environment that runs across several machines and uses different protocols to communicate with each other; all of which is used concurrently by several users and applications. When a Hadoop deployment and its ecosystem is used to process sensitive data (such as financial records, payment transactions, healthcare records), several security requirements arise. These security requirements may be dictated by internal policies and/or government regulations. They may require strong authentication, selective authorization to access data/resources, and data confidentiality. This session covers in detail how different components in the Hadoop ecosystem and external applications can interact with each other in a secure manner providing authentication, authorization, and confidentiality when accessing services and transferring data to/from/between services. The session will cover topics like Kerberos authentication, Web UI authentication, File System permissions, delegation tokens, Access Control Lists, ProxyUser impersonation and network encryption.
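As one concrete illustration of the ProxyUser impersonation mentioned above, a trusted service authenticated by its own keytab can act on behalf of an end user; the principals and paths below are hypothetical, and the cluster must explicitly allow the proxy through the hadoop.proxyuser.* settings.

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // The trusted service logs in with its own credentials (hypothetical)
        UserGroupInformation service =
            UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "oozie/host.example.com@EXAMPLE.COM",
                "/etc/security/keytabs/oozie.keytab");

        // ...then impersonates end user "alice"; HDFS authorizes as alice,
        // provided hadoop.proxyuser.oozie.* allows it on the cluster side
        UserGroupInformation alice =
            UserGroupInformation.createProxyUser("alice", service);
        alice.doAs((PrivilegedExceptionAction<Void>) () -> {
            FileSystem fs = FileSystem.get(conf);
            for (FileStatus s : fs.listStatus(new Path("/user/alice"))) {
                System.out.println(s.getPath());
            }
            return null;
        });
    }
}
```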
The document discusses Hadoop security today and tomorrow. It describes the four pillars of Hadoop security as authentication, authorization, accountability, and data protection. It outlines the current security capabilities in Hadoop like Kerberos authentication and access controls, and future plans to improve security, such as encryption of data at rest and in motion. It also discusses the Apache Knox gateway for perimeter security and provides a demo of using Knox to submit a MapReduce job.
Hadoop security has improved with additions such as HDFS ACLs, Hive column-level ACLs, HBase cell-level ACLs, and Knox for perimeter security. Data encryption has also been enhanced, with support for encrypting data in transit using SSL and data at rest through file encryption or the upcoming native HDFS encryption. Authentication is provided by Kerberos/AD with token-based authorization, and auditing tracks who accessed what data.
The document discusses adding security features to Hadoop including authentication and authorization. Kerberos will be used for authentication and users will need to authenticate with Kerberos tickets to access HDFS and MapReduce. The APIs have been updated minimally to support the new security features and web UIs will also require authentication. Some remaining issues are lack of encryption of data in transit or at rest.
Similar to [2A5] 하둡 보안 어떻게 해야 할까 ("How should we handle Hadoop security?")
The document discusses various machine learning clustering algorithms like K-means clustering, DBSCAN, and EM clustering. It also discusses neural network architectures like LSTM, bi-LSTM, and convolutional neural networks. Finally, it presents results from evaluating different chatbot models on various metrics like validation score.
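For reference, the core K-means loop surveyed here is short enough to sketch end to end; below is a minimal one-dimensional version with hard-coded toy data (illustrative only, not from the document).

```java
import java.util.Arrays;

public class KMeans1D {
    public static void main(String[] args) {
        double[] xs = {1.0, 1.2, 0.8, 5.0, 5.2, 4.9}; // toy data
        double[] centers = {xs[0], xs[3]};             // naive initialization
        int k = centers.length;
        int[] assign = new int[xs.length];
        for (int iter = 0; iter < 100; iter++) {
            // Assignment step: attach each point to its nearest center
            for (int i = 0; i < xs.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(xs[i] - centers[c]) < Math.abs(xs[i] - centers[best])) {
                        best = c;
                    }
                }
                assign[i] = best;
            }
            // Update step: move each center to the mean of its points
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < xs.length; i++) {
                sum[assign[i]] += xs[i];
                count[assign[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (count[c] > 0) centers[c] = sum[c] / count[c];
            }
        }
        System.out.println("centers = " + Arrays.toString(centers)); // ~[1.0, 5.03]
    }
}
```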
The document discusses challenges with using reinforcement learning for robotics. While simulations allow fast training of agents, there is often a "reality gap" when transferring learning to real robots. Other approaches like imitation learning and self-supervised learning can be safer alternatives that don't require trial-and-error. To better apply reinforcement learning, robots may need model-based approaches that learn forward models of the world, as well as techniques like active localization that allow robots to gather targeted information through interactive perception. Closing the reality gap will require finding ways to better match simulations to reality or allow robots to learn from real-world experiences.
[243] Deep Learning to help student's Deep Learning, by NAVER D2
This document describes research on using deep learning to predict student performance in massive open online courses (MOOCs). It introduces GritNet, a model that takes raw student activity data as input and predicts outcomes like course graduation without feature engineering. GritNet outperforms baselines by more than 5% in predicting graduation. The document also describes how GritNet can be adapted in an unsupervised way to new courses using pseudo-labels, improving predictions in the first few weeks. Overall, GritNet is presented as the state-of-the-art for student prediction and can be transferred across courses without labels.
[234]Fast & Accurate Data Annotation Pipeline for AI applicationsNAVER D2
This document provides a summary of new datasets and papers related to computer vision tasks including object detection, image matting, person pose estimation, pedestrian detection, and person instance segmentation. A total of 8 papers and their associated datasets are listed with brief descriptions of the core contributions or techniques developed in each.
[226] NAVER ads deep click prediction: from modeling to serving, by NAVER D2
This document presents a formula for calculating the loss function J(θ) in machine learning models. The formula averages the negative log likelihood of the predicted probabilities being correct over all samples S, and includes a regularization term λ that penalizes predicted embeddings being dissimilar from actual embeddings. It also defines the cosine similarity term used in the regularization.
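Taken literally, that description corresponds to a loss of roughly the following shape; this is our reconstruction from the summary, not the slide deck's exact notation (here e denotes the predicted and actual embeddings).

```latex
J(\theta) = -\frac{1}{|S|} \sum_{(x,y) \in S} \log p_\theta(y \mid x)
  \;+\; \lambda \sum_{(x,y) \in S} \Bigl( 1 - \cos\bigl( e_\theta(x),\, e(y) \bigr) \Bigr),
\qquad
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
```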
[214] AI Serving Platform: a hard-fought effort to handle hundreds of millions of inferences per day, by NAVER D2
The document discusses running a TensorFlow Serving (TFS) container using Docker. It shows commands to:
1. Pull the TFS Docker image from a repository
2. Define a script to configure and run the TFS container, specifying the model path, name, and port mapping
3. Run the script to start the TFS container exposing port 13377
The document discusses linear algebra concepts including:
- Representing a system of linear equations as a matrix equation Ax = b where A is a coefficient matrix, x is a vector of unknowns, and b is a vector of constants.
- Solving for the vector x that satisfies the matrix equation using linear algebra techniques such as row reduction.
- Examples of matrix equations and their component vectors are shown (see the worked instance below).
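A small worked instance of the kind described, with our own numbers: two equations in two unknowns written as Ax = b and solved by row reduction.

```latex
\underbrace{\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}}_{A}
\underbrace{\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}}_{x}
=
\underbrace{\begin{pmatrix} 5 \\ 11 \end{pmatrix}}_{b}
\;\Longrightarrow\;
\left(\begin{array}{cc|c} 1 & 2 & 5 \\ 3 & 4 & 11 \end{array}\right)
\xrightarrow{R_2 - 3R_1}
\left(\begin{array}{cc|c} 1 & 2 & 5 \\ 0 & -2 & -4 \end{array}\right)
\;\Longrightarrow\;
x = \begin{pmatrix} 1 \\ 2 \end{pmatrix}
```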
This document describes the steps to convert a TensorFlow model to a TensorRT engine for inference. It includes steps to parse the model, optimize it, generate a runtime engine, serialize and deserialize the engine, as well as perform inference using the engine. It also provides code snippets for a PReLU plugin implementation in C++.
The document discusses machine reading comprehension (MRC) techniques for question answering (QA) systems, comparing search-based and natural language processing (NLP)-based approaches. It covers key milestones in the development of extractive QA models using NLP, from early sentence-level models to current state-of-the-art techniques like cross-attention, self-attention, and transfer learning. It notes the speed and scalability benefits of combining search and reading methods for QA.
HCL Notes and Domino license cost reduction in the world of DLAU, by panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefits it brings you. Above all, you surely want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove redundant or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics are covered:
- Reducing license costs by finding and fixing misconfigurations and redundant accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes and functional/test users
- Real-world examples and best practices you can apply immediately
Programming Foundation Models with DSPy - Meetup Slides, by Zilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an..., by Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application..., by Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Dandelion Hashtable: beyond billion requests per second on a commodity server, by Antonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Driving Business Innovation: Latest Generative AI Advancements & Success Story, by Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Taking AI to the Next Level in Manufacturing.pdf, by ssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-axelera-ai-uses-digital-compute-in-memory-to-deliver-fast-and-energy-efficient-computer-vision-a-presentation-from-axelera-ai/
Bram Verhoef, Head of Machine Learning at Axelera AI, presents the “How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-efficient Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
As artificial intelligence inference transitions from cloud environments to edge locations, computer vision applications achieve heightened responsiveness, reliability and privacy. This migration, however, introduces the challenge of operating within the stringent confines of resource constraints typical at the edge, including small form factors, low energy budgets and diminished memory and computational capacities. Axelera AI addresses these challenges through an innovative approach of performing digital computations within memory itself. This technique facilitates the realization of high-performance, energy-efficient and cost-effective computer vision capabilities at the thin and thick edge, extending the frontier of what is achievable with current technologies.
In this presentation, Verhoef unveils his company’s pioneering chip technology and demonstrates its capacity to deliver exceptional frames-per-second performance across a range of standard computer vision networks typical of applications in security, surveillance and the industrial sector. This shows that advanced computer vision can be accessible and efficient, even at the very edge of our technological ecosystem.
Generating privacy-protected synthetic data using Secludy and Milvus, by Zilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Introduction of Cybersecurity with OSS at Code Europe 2024, by Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready, for which client coverage is growing and where scaling and performance are life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqlDB. In this talk we will first analyze scaling approaches and then select the proper ones for our system.