The document provides an overview of new features in HDFS in Hadoop 2, including:
- A new appendable write pipeline that allows files to be reopened for append and provides primitives like hflush and hsync.
- Support for multiple namenode federation to improve scalability and isolate namespaces.
- Namenode high availability using techniques like ZooKeeper and a quorum journal manager to avoid single points of failure.
- A new file system snapshots feature that allows point-in-time recovery through copy-on-write snapshots without data copying.
The current major release, Hadoop 2.0 offers several significant HDFS improvements including new append-pipeline, federation, wire compatibility, NameNode HA, Snapshots, and performance improvements. We describe how to take advantages of these new features and their benefits. We cover some architectural improvements in detail such as HA, Federation and Snapshots. The second half of the talk describes the current features that are under development for the next HDFS release. This includes much needed data management features such as backup and Disaster Recovery. We add support for different classes of storage devices such as SSDs and open interfaces such as NFS; together these extend HDFS as a more general storage system. Hadoop has recently been extended to run first-class on Windows which expands its enterprise reach and allows integration with the rich tool-set available on Windows. As with every release we will continue improvements to performance, diagnosability and manageability of HDFS. To conclude, we discuss the reliability, the state of HDFS adoption, and some of the misconceptions and myths about HDFS.
You’ve successfully deployed Hadoop, but are you taking advantage of all of Hadoop’s features to operate a stable and effective cluster? In the first part of the talk, we will cover issues that have been seen over the last two years on hundreds of production clusters with detailed breakdown covering the number of occurrences, severity, and root cause. We will cover best practices and many new tools and features in Hadoop added over the last year to help system administrators monitor, diagnose and address such incidents.
The second part of our talk discusses new features for making daily operations easier. This includes features such as ACLs for simplified permission control, snapshots for data protection and more. We will also cover tuning configuration and features that improve cluster utilization, such as short-circuit reads and datanode caching.
Hadoop Operations - Best Practices from the FieldDataWorks Summit
This document discusses best practices for Hadoop operations based on analysis of support cases. Key learnings include using HDFS ACLs and snapshots to prevent accidental data deletion and improve recoverability. HDFS improvements like pausing block deletion and adding diagnostics help address incidents around namespace mismatches and upgrade failures. Proper configuration of hardware, JVM settings, and monitoring is also emphasized.
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014 Chris Almond
Hadoop has quickly evolved into the system of choice for storing and processing Big Data, and is now widely used to support mission-critical applications that operate within a ‘data lake’ style infrastructures. A critical requirement of such applications is the need for continuous operation even in the event of various system failures. This requirement has driven adoption of multi-data center Hadoop architectures, a.k.a geo-distributed or global Hadoop. In this session we will provide a brief introduction to WANdisco, then dig into how our Non-Stop Hadoop solution addresses real world use cases, and also a show live demonstration of Non-Stop namenode operation across two WAN connected hadoop clusters.
With Hadoop-3.0.0-alpha2 being released in January 2017, it's time to have a closer look at the features and fixes of Hadoop 3.0.
We will have a look at Core Hadoop, HDFS and YARN, and answer the emerging question whether Hadoop 3.0 will be an architectural revolution like Hadoop 2 was with YARN & Co. or will it be more of an evolution adapting to new use cases like IoT, Machine Learning and Deep Learning (TensorFlow)?
Architectural Overview of MapR's Apache Hadoop Distributionmcsrivas
Describes the thinking behind MapR's architecture. MapR"s Hadoop achieves better reliability on commodity hardware compared to anything on the planet, including custom, proprietary hardware from other vendors. Apache HDFS and Cassandra replication is also discussed, as are SAN and NAS storage systems like Netapp and EMC.
Hadoop and WANdisco: The Future of Big DataWANdisco Plc
View the webinar recording here... http://youtu.be/O1pgMMyoJg0
Who: WANdisco CEO, David Richards, and core creaters of Apache Hadoop, Dr. Konstantin Shvachko and Jagane Sundare.
What: WANdisco recently acquired AltoStor, a pioneering firm with deep expertise in the multi-billion dollar Big Data market.
New to the WANdisco team are the Hadoop core creaters, Dr. Konstantin Shvachko and Jagane Sundare. They will cover the the acquisition and reveal how WANdisco's active-active replication technology will change the game of Big Data for the enterprise in 2013.
Hadoop, a proven open source Big Data technolgoy, is the backbone of Yahoo, Facebook, Netflix, Amazon, Ebay and many of the world's largest databases.
When: Tuesday, December 11th at 10am PST (1pm EST).
Why: In this 30-minute webinar you’ll learn:
The staggering, cross-industry growth of Hadoop in the enterprise
How Hadoop's limitations, including HDFS's single-point of failure, are impacting the productivity of the enterprise
How WANdisco's active-active replication technology will alleviate these issues by adding high-availability to Hadoop, taking a fundamentally different approach to Big Data
View the webinar Q&A on the WANdisco blog here...http://blogs.wandisco.com/2012/12/14/answers-to-questions-from-the-webinar-of-dec-11-2012/
Selective Data Replication with Geographically Distributed HadoopDataWorks Summit
This document discusses selective data replication with geographically distributed Hadoop. It describes running Hadoop across multiple data centers as a single cluster. A coordination engine ensures consistent metadata replication and a global sequence of updates. Data is replicated asynchronously over the WAN for fast ingestion. Selective data replication allows restricting replication of some data to specific locations for regulations, temporary data, or ingest-only use cases. Heterogeneous storage zones with different performance profiles can also be used for selective placement. This architecture aims to provide a single unified file system view, strict consistency, continuous availability, and geographic scalability across data centers.
The current major release, Hadoop 2.0 offers several significant HDFS improvements including new append-pipeline, federation, wire compatibility, NameNode HA, Snapshots, and performance improvements. We describe how to take advantages of these new features and their benefits. We cover some architectural improvements in detail such as HA, Federation and Snapshots. The second half of the talk describes the current features that are under development for the next HDFS release. This includes much needed data management features such as backup and Disaster Recovery. We add support for different classes of storage devices such as SSDs and open interfaces such as NFS; together these extend HDFS as a more general storage system. Hadoop has recently been extended to run first-class on Windows which expands its enterprise reach and allows integration with the rich tool-set available on Windows. As with every release we will continue improvements to performance, diagnosability and manageability of HDFS. To conclude, we discuss the reliability, the state of HDFS adoption, and some of the misconceptions and myths about HDFS.
You’ve successfully deployed Hadoop, but are you taking advantage of all of Hadoop’s features to operate a stable and effective cluster? In the first part of the talk, we will cover issues that have been seen over the last two years on hundreds of production clusters with detailed breakdown covering the number of occurrences, severity, and root cause. We will cover best practices and many new tools and features in Hadoop added over the last year to help system administrators monitor, diagnose and address such incidents.
The second part of our talk discusses new features for making daily operations easier. This includes features such as ACLs for simplified permission control, snapshots for data protection and more. We will also cover tuning configuration and features that improve cluster utilization, such as short-circuit reads and datanode caching.
Hadoop Operations - Best Practices from the FieldDataWorks Summit
This document discusses best practices for Hadoop operations based on analysis of support cases. Key learnings include using HDFS ACLs and snapshots to prevent accidental data deletion and improve recoverability. HDFS improvements like pausing block deletion and adding diagnostics help address incidents around namespace mismatches and upgrade failures. Proper configuration of hardware, JVM settings, and monitoring is also emphasized.
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014 Chris Almond
Hadoop has quickly evolved into the system of choice for storing and processing Big Data, and is now widely used to support mission-critical applications that operate within a ‘data lake’ style infrastructures. A critical requirement of such applications is the need for continuous operation even in the event of various system failures. This requirement has driven adoption of multi-data center Hadoop architectures, a.k.a geo-distributed or global Hadoop. In this session we will provide a brief introduction to WANdisco, then dig into how our Non-Stop Hadoop solution addresses real world use cases, and also a show live demonstration of Non-Stop namenode operation across two WAN connected hadoop clusters.
With Hadoop-3.0.0-alpha2 being released in January 2017, it's time to have a closer look at the features and fixes of Hadoop 3.0.
We will have a look at Core Hadoop, HDFS and YARN, and answer the emerging question whether Hadoop 3.0 will be an architectural revolution like Hadoop 2 was with YARN & Co. or will it be more of an evolution adapting to new use cases like IoT, Machine Learning and Deep Learning (TensorFlow)?
Architectural Overview of MapR's Apache Hadoop Distributionmcsrivas
Describes the thinking behind MapR's architecture. MapR"s Hadoop achieves better reliability on commodity hardware compared to anything on the planet, including custom, proprietary hardware from other vendors. Apache HDFS and Cassandra replication is also discussed, as are SAN and NAS storage systems like Netapp and EMC.
Hadoop and WANdisco: The Future of Big DataWANdisco Plc
View the webinar recording here... http://youtu.be/O1pgMMyoJg0
Who: WANdisco CEO, David Richards, and core creaters of Apache Hadoop, Dr. Konstantin Shvachko and Jagane Sundare.
What: WANdisco recently acquired AltoStor, a pioneering firm with deep expertise in the multi-billion dollar Big Data market.
New to the WANdisco team are the Hadoop core creaters, Dr. Konstantin Shvachko and Jagane Sundare. They will cover the the acquisition and reveal how WANdisco's active-active replication technology will change the game of Big Data for the enterprise in 2013.
Hadoop, a proven open source Big Data technolgoy, is the backbone of Yahoo, Facebook, Netflix, Amazon, Ebay and many of the world's largest databases.
When: Tuesday, December 11th at 10am PST (1pm EST).
Why: In this 30-minute webinar you’ll learn:
The staggering, cross-industry growth of Hadoop in the enterprise
How Hadoop's limitations, including HDFS's single-point of failure, are impacting the productivity of the enterprise
How WANdisco's active-active replication technology will alleviate these issues by adding high-availability to Hadoop, taking a fundamentally different approach to Big Data
View the webinar Q&A on the WANdisco blog here...http://blogs.wandisco.com/2012/12/14/answers-to-questions-from-the-webinar-of-dec-11-2012/
Selective Data Replication with Geographically Distributed HadoopDataWorks Summit
This document discusses selective data replication with geographically distributed Hadoop. It describes running Hadoop across multiple data centers as a single cluster. A coordination engine ensures consistent metadata replication and a global sequence of updates. Data is replicated asynchronously over the WAN for fast ingestion. Selective data replication allows restricting replication of some data to specific locations for regulations, temporary data, or ingest-only use cases. Heterogeneous storage zones with different performance profiles can also be used for selective placement. This architecture aims to provide a single unified file system view, strict consistency, continuous availability, and geographic scalability across data centers.
Hadoop Distributed File System (HDFS) is a distributed file system that stores large datasets across commodity hardware. It is highly fault tolerant, provides high throughput, and is suitable for applications with large datasets. HDFS uses a master/slave architecture where a NameNode manages the file system namespace and DataNodes store data blocks. The NameNode ensures data replication across DataNodes for reliability. HDFS is optimized for batch processing workloads where computations are moved to nodes storing data blocks.
Hadoop Operations - Best practices from the fieldUwe Printz
Talk about Hadoop Operations and Best Practices for building and maintaining Hadoop cluster.
Talk was held at the data2day conference in Karlsruhe, Germany on 27.11.2014
This document discusses WANdisco's Non-Stop Hadoop solution, which provides continuous availability of Hadoop across local and wide area networks using an active-active replication technique. It addresses key problems with multi-cluster Hadoop deployments like lack of 100% uptime and challenges sharing data globally. The solution utilizes WANdisco's patented distributed coordination engine to achieve consensus across data centers for metadata operations and absolute consistency. Use cases highlighted include eliminating single point of failures, enabling parallel data ingest across locations, optimizing resource utilization through cluster zoning, and achieving near-zero RTO disaster recovery.
Alibaba builds the data infrastructure with Apache Hadoop YARN since 2013, and till now it manages more than 10k nodes. In Alibaba, Hadoop YARN serves various systems such as search, advertising, and recommendation etc. It runs not just batch jobs, also streaming, machine learning, OLAP, and even online services that directly impact Alibaba’s user experience. To extend YARN’s ability to support such complex scenarios, we have done and leveraged a lot of YARN 3.x improvements. In this talk, you will find what are these improvements and how they helped to solve difficult problems in large production clusters.
This includes:
1. Highly improved performance with Capacity Scheduler’s async scheduling framework
2. Better placement decisions with node attributes, placement constraints
3. Better resource utilization with opportunistic containers
4. Introduce a load balancer to balance resource utilization
5. Generic resource types scheduling/isolation to manage new resources such as GPU and FPGA
In the presentation, we will further introduce how we build the entire ecosystem on top of YARN and how we keep evolving YARN’s ability to tackle the challenges brought by continuously increasing data and business in Alibaba.
Speakers
Weiwei Yang, Alibaba, Staff Software Engineer
Ren Chunde, Alibaba Group, Senior Engineer
The document discusses erasure coding as an alternative to replication in distributed storage systems like HDFS. It notes that while replication provides high durability, it has high storage overhead, and erasure coding can provide similar durability with half the storage overhead but slower recovery. The document outlines how major companies like Facebook, Windows Azure Storage, and Google use erasure coding. It then provides details on HDFS-EC, including its architecture, use of hardware acceleration, and performance evaluation showing its benefits over replication.
How the Internet of Things are Turning the Internet Upside DownDataWorks Summit
- The document discusses how time series data from sensors can be ingested and analyzed at large scales. It describes how traditional internet architecture concentrates resources at the core while sensors and devices reside at the edge, producing large amounts of time series data.
- It then summarizes techniques for ingesting and analyzing time series data at rates of millions to hundreds of millions of data points per second using technologies like OpenTSDB, HBase, and MapR databases. This involves batching data at the edge and optimized storage designs.
- The document concludes by discussing the advantages of MapR for time series use cases due to its high ingestion rates and integration with query engines like Drill for flexible analysis of large time series datasets
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmDataWorks Summit
WANdisco Fusion is an active-active data replication solution that provides continuous replication of data across Hadoop distributions and storage located in different data centers connected over a LAN/WAN. It addresses key issues for organizations that require continuous data availability, share data across globally deployed Hadoop clusters, and want to ease administrative burdens around disaster recovery. WANdisco Fusion uses a distributed coordination engine and consensus algorithms to maintain a replicated virtual namespace and strict data consistency across locations, allowing multiple data centers to ingest and access data actively while supporting replication to any number of data centers globally.
Most users know HDFS as the reliable store of record for big data analytics. HDFS is also used to store transient and operational data when working with cloud object stores, such as Microsoft Azure or Amazon S3, and on-premises object stores, such as Western Digital’s ActiveScale. In these settings, applications often manage data stored in multiple storage systems or clusters, requiring a complex workflow for synchronizing data between filesystems for business continuity planning (BCP) and/or supporting hybrid cloud architectures to achieve the required business goals for durability, performance, and coordination.
To resolve this complexity, HDFS-9806 has added a PROVIDED storage tier to mount external storage systems in the HDFS NameNode. Building on this functionality, we can now allow remote namespaces to be synchronized with HDFS, enabling asynchronous writes to the remote storage and the possibility to synchronously and transparently read data back to a local application wanting to access file data which is stored remotely. In this talk, which corresponds to the work in progress under HDFS-12090, we will present how the Hadoop admin can manage storage tiering between clusters and how that is then handled inside HDFS through the snapshotting mechanism and asynchronously satisfying the storage policy.
Speakers
Chris Douglas, Microsoft, Principal Research Software Engineer
Thomas Denmoor, Western Digital, Object Storage Architect
The document describes a distributed Hadoop architecture with multiple data centers and clusters. It shows how to configure Hadoop to access HDFS files across different name nodes and clusters using tools like ViewFileSystem. Client applications can use a single consistent file system namespace and API to access data distributed across the infrastructure.
This document provides a summary of MapReduce improvements in the MapR Hadoop distribution. It discusses how MapR addresses architectural flaws in HDFS through its scalable container-based filesystem. It also describes how MapR improves MapReduce performance through techniques like direct shuffle and express lanes. The document notes how MapR provides high availability for systems like the container location database and JobTracker. It concludes by discussing MapR's support for other frameworks like HBase and Apache Drill beyond traditional MapReduce.
This document discusses HDFS Erasure Coding and its usage at Yahoo Japan. It begins with an overview of erasure coding, how it is implemented in HDFS, and compares it to replication. Test results show the write performance is lower for erasure coding while read performance is similar. Yahoo Japan uses erasure coding for cold weblog data, reducing storage costs by 65% compared to replication. Future plans include supporting additional codecs and features to provide more usability.
The document discusses LLAP (Live Long and Process), a new execution layer for Hive that enables sub-second analytical queries. LLAP uses daemons running on worker nodes to cache data in memory and keep query fragments executing between queries for faster performance. It allows for highly concurrent queries without specialized YARN queues. Benchmarks show LLAP providing up to 90% faster performance over Hive for queries against large datasets. LLAP also aims to serve as a unified data access layer for other systems like Spark SQL.
Difference between hadoop 2 vs hadoop 3Manish Chopra
Hadoop 3.x includes improvements over Hadoop 2.x such as supporting Java 8 as the minimum version, using erasure coding for fault tolerance which reduces storage overhead, improving the YARN timeline service for better scalability and reliability, and moving default ports out of the ephemeral range to prevent startup failures. Hadoop 3.x also adds support for the Microsoft Azure Data Lake filesystem and provides better scalability by allowing clusters to scale to over 10,000 nodes. Key features for resource management, high availability, and running analytics workloads are also continued from Hadoop 2.x in Hadoop 3.x.
This document describes three approaches to implementing a Hadoop disaster recovery solution using Hortonworks Data Platform (HDP). Approach 1 uses a backup NameNode and JobTracker across two data centers. Approach 2 uses two parallel Hadoop clusters across locations with periodic data copying. Approach 3 improves on Approach 2 by leveraging HDP 2.2 features like HDFS snapshots and Apache Falcon to periodically copy snapshots between clusters.
Most users know HDFS as the reliable store of record for big data analytics. HDFS is also used to store transient and operational data when working with cloud object stores, such as Azure HDInsight and Amazon EMR. In these settings - but also in more traditional, on premise deployments - applications often manage data stored in multiple storage systems or clusters, requiring a complex workflow for synchronizing data between filesystems to achieve goals for durability, performance, and coordination.
Building on existing heterogeneous storage support, we add a storage tier to HDFS to work with external stores, allowing remote namespaces to be "mounted" in HDFS. This capability not only supports transparent caching of remote data as HDFS blocks, it also supports (a)synchronous writes to remote clusters for business continuity planning (BCP) and supports hybrid cloud architectures.
This idea was presented at last year’s Summit in San Jose. Lots of progress has been made since then and active development is ongoing at the Apache Software Foundation on branch HDFS-9806, driven by Microsoft and Western Digital. We will discuss the refined design & implementation and present how end-users and admins will be able to use this powerful functionality.
In this talk, we will present a new distribution of Hadoop, Hops, that can scale the Hadoop Filesystem (HDFS) by 16X, from 70K ops/s to 1.2 million ops/s on Spotiy's industrial Hadoop workload. Hops is an open-source distribution of Apache Hadoop that supports distributed metadata for HSFS (HopsFS) and the ResourceManager in Apache YARN. HopsFS is the first production-grade distributed hierarchical filesystem to store its metadata normalized in an in-memory, shared nothing database. For YARN, we will discuss optimizations that enable 2X throughput increases for the Capacity scheduler, enabling scalability to clusters with >20K nodes. We will discuss the journey of how we reached this milestone, discussing some of the challenges involved in efficiently and safely mapping hierarchical filesystem metadata state and operations onto a shared-nothing, in-memory database. We will also discuss the key database features needed for extreme scaling, such as multi-partition transactions, partition-pruned index scans, distribution-aware transactions, and the streaming changelog API. Hops (www.hops.io) is Apache-licensed open-source and supports a pluggable database backend for distributed metadata, although it currently only support MySQL Cluster as a backend. Hops opens up the potential for new directions for Hadoop when metadata is available for tinkering in a mature relational database.
The slides are created for the "Hadoop User Group Vienna", a Meetup that gathers Hadoop users in Vienna on September 6, 2017. The content of the slides correspond to the first talk, which discussed the concepts, terminology and disaster recovery capabilities in the Hadoop ecosystem.
The document discusses evolving HDFS to support generalized storage containers in order to better scale the number of files and blocks. It proposes using block containers and a partial namespace approach to initially scale to billions of files and blocks, and eventually much higher numbers. The storage layer is being restructured to support various container types for use cases beyond HDFS like object storage and HBase.
The document discusses big data visualization and visual analysis, focusing on the challenges and opportunities. It begins with an overview of visualization and then discusses several challenges in big data visualization, including integrating heterogeneous data from different sources and scales, dealing with data and task complexity, limited interaction capabilities for large data, scalability for both data and users, and the need for domain and development libraries/tools. It then provides examples of visualizing taxi GPS data and traffic patterns in Beijing to identify traffic jams.
Hadoop Distributed File System (HDFS) is a distributed file system that stores large datasets across commodity hardware. It is highly fault tolerant, provides high throughput, and is suitable for applications with large datasets. HDFS uses a master/slave architecture where a NameNode manages the file system namespace and DataNodes store data blocks. The NameNode ensures data replication across DataNodes for reliability. HDFS is optimized for batch processing workloads where computations are moved to nodes storing data blocks.
Hadoop Operations - Best practices from the fieldUwe Printz
Talk about Hadoop Operations and Best Practices for building and maintaining Hadoop cluster.
Talk was held at the data2day conference in Karlsruhe, Germany on 27.11.2014
This document discusses WANdisco's Non-Stop Hadoop solution, which provides continuous availability of Hadoop across local and wide area networks using an active-active replication technique. It addresses key problems with multi-cluster Hadoop deployments like lack of 100% uptime and challenges sharing data globally. The solution utilizes WANdisco's patented distributed coordination engine to achieve consensus across data centers for metadata operations and absolute consistency. Use cases highlighted include eliminating single point of failures, enabling parallel data ingest across locations, optimizing resource utilization through cluster zoning, and achieving near-zero RTO disaster recovery.
Alibaba builds the data infrastructure with Apache Hadoop YARN since 2013, and till now it manages more than 10k nodes. In Alibaba, Hadoop YARN serves various systems such as search, advertising, and recommendation etc. It runs not just batch jobs, also streaming, machine learning, OLAP, and even online services that directly impact Alibaba’s user experience. To extend YARN’s ability to support such complex scenarios, we have done and leveraged a lot of YARN 3.x improvements. In this talk, you will find what are these improvements and how they helped to solve difficult problems in large production clusters.
This includes:
1. Highly improved performance with Capacity Scheduler’s async scheduling framework
2. Better placement decisions with node attributes, placement constraints
3. Better resource utilization with opportunistic containers
4. Introduce a load balancer to balance resource utilization
5. Generic resource types scheduling/isolation to manage new resources such as GPU and FPGA
In the presentation, we will further introduce how we build the entire ecosystem on top of YARN and how we keep evolving YARN’s ability to tackle the challenges brought by continuously increasing data and business in Alibaba.
Speakers
Weiwei Yang, Alibaba, Staff Software Engineer
Ren Chunde, Alibaba Group, Senior Engineer
The document discusses erasure coding as an alternative to replication in distributed storage systems like HDFS. It notes that while replication provides high durability, it has high storage overhead, and erasure coding can provide similar durability with half the storage overhead but slower recovery. The document outlines how major companies like Facebook, Windows Azure Storage, and Google use erasure coding. It then provides details on HDFS-EC, including its architecture, use of hardware acceleration, and performance evaluation showing its benefits over replication.
How the Internet of Things are Turning the Internet Upside DownDataWorks Summit
- The document discusses how time series data from sensors can be ingested and analyzed at large scales. It describes how traditional internet architecture concentrates resources at the core while sensors and devices reside at the edge, producing large amounts of time series data.
- It then summarizes techniques for ingesting and analyzing time series data at rates of millions to hundreds of millions of data points per second using technologies like OpenTSDB, HBase, and MapR databases. This involves batching data at the edge and optimized storage designs.
- The document concludes by discussing the advantages of MapR for time series use cases due to its high ingestion rates and integration with query engines like Drill for flexible analysis of large time series datasets
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmDataWorks Summit
WANdisco Fusion is an active-active data replication solution that provides continuous replication of data across Hadoop distributions and storage located in different data centers connected over a LAN/WAN. It addresses key issues for organizations that require continuous data availability, share data across globally deployed Hadoop clusters, and want to ease administrative burdens around disaster recovery. WANdisco Fusion uses a distributed coordination engine and consensus algorithms to maintain a replicated virtual namespace and strict data consistency across locations, allowing multiple data centers to ingest and access data actively while supporting replication to any number of data centers globally.
Most users know HDFS as the reliable store of record for big data analytics. HDFS is also used to store transient and operational data when working with cloud object stores, such as Microsoft Azure or Amazon S3, and on-premises object stores, such as Western Digital’s ActiveScale. In these settings, applications often manage data stored in multiple storage systems or clusters, requiring a complex workflow for synchronizing data between filesystems for business continuity planning (BCP) and/or supporting hybrid cloud architectures to achieve the required business goals for durability, performance, and coordination.
To resolve this complexity, HDFS-9806 has added a PROVIDED storage tier to mount external storage systems in the HDFS NameNode. Building on this functionality, we can now allow remote namespaces to be synchronized with HDFS, enabling asynchronous writes to the remote storage and the possibility to synchronously and transparently read data back to a local application wanting to access file data which is stored remotely. In this talk, which corresponds to the work in progress under HDFS-12090, we will present how the Hadoop admin can manage storage tiering between clusters and how that is then handled inside HDFS through the snapshotting mechanism and asynchronously satisfying the storage policy.
Speakers
Chris Douglas, Microsoft, Principal Research Software Engineer
Thomas Denmoor, Western Digital, Object Storage Architect
The document describes a distributed Hadoop architecture with multiple data centers and clusters. It shows how to configure Hadoop to access HDFS files across different name nodes and clusters using tools like ViewFileSystem. Client applications can use a single consistent file system namespace and API to access data distributed across the infrastructure.
This document provides a summary of MapReduce improvements in the MapR Hadoop distribution. It discusses how MapR addresses architectural flaws in HDFS through its scalable container-based filesystem. It also describes how MapR improves MapReduce performance through techniques like direct shuffle and express lanes. The document notes how MapR provides high availability for systems like the container location database and JobTracker. It concludes by discussing MapR's support for other frameworks like HBase and Apache Drill beyond traditional MapReduce.
This document discusses HDFS Erasure Coding and its usage at Yahoo Japan. It begins with an overview of erasure coding, how it is implemented in HDFS, and compares it to replication. Test results show the write performance is lower for erasure coding while read performance is similar. Yahoo Japan uses erasure coding for cold weblog data, reducing storage costs by 65% compared to replication. Future plans include supporting additional codecs and features to provide more usability.
The document discusses LLAP (Live Long and Process), a new execution layer for Hive that enables sub-second analytical queries. LLAP uses daemons running on worker nodes to cache data in memory and keep query fragments executing between queries for faster performance. It allows for highly concurrent queries without specialized YARN queues. Benchmarks show LLAP providing up to 90% faster performance over Hive for queries against large datasets. LLAP also aims to serve as a unified data access layer for other systems like Spark SQL.
Difference between hadoop 2 vs hadoop 3Manish Chopra
Hadoop 3.x includes improvements over Hadoop 2.x such as supporting Java 8 as the minimum version, using erasure coding for fault tolerance which reduces storage overhead, improving the YARN timeline service for better scalability and reliability, and moving default ports out of the ephemeral range to prevent startup failures. Hadoop 3.x also adds support for the Microsoft Azure Data Lake filesystem and provides better scalability by allowing clusters to scale to over 10,000 nodes. Key features for resource management, high availability, and running analytics workloads are also continued from Hadoop 2.x in Hadoop 3.x.
This document describes three approaches to implementing a Hadoop disaster recovery solution using Hortonworks Data Platform (HDP). Approach 1 uses a backup NameNode and JobTracker across two data centers. Approach 2 uses two parallel Hadoop clusters across locations with periodic data copying. Approach 3 improves on Approach 2 by leveraging HDP 2.2 features like HDFS snapshots and Apache Falcon to periodically copy snapshots between clusters.
Most users know HDFS as the reliable store of record for big data analytics. HDFS is also used to store transient and operational data when working with cloud object stores, such as Azure HDInsight and Amazon EMR. In these settings - but also in more traditional, on premise deployments - applications often manage data stored in multiple storage systems or clusters, requiring a complex workflow for synchronizing data between filesystems to achieve goals for durability, performance, and coordination.
Building on existing heterogeneous storage support, we add a storage tier to HDFS to work with external stores, allowing remote namespaces to be "mounted" in HDFS. This capability not only supports transparent caching of remote data as HDFS blocks, it also supports (a)synchronous writes to remote clusters for business continuity planning (BCP) and supports hybrid cloud architectures.
This idea was presented at last year’s Summit in San Jose. Lots of progress has been made since then and active development is ongoing at the Apache Software Foundation on branch HDFS-9806, driven by Microsoft and Western Digital. We will discuss the refined design & implementation and present how end-users and admins will be able to use this powerful functionality.
In this talk, we will present a new distribution of Hadoop, Hops, that can scale the Hadoop Filesystem (HDFS) by 16X, from 70K ops/s to 1.2 million ops/s on Spotiy's industrial Hadoop workload. Hops is an open-source distribution of Apache Hadoop that supports distributed metadata for HSFS (HopsFS) and the ResourceManager in Apache YARN. HopsFS is the first production-grade distributed hierarchical filesystem to store its metadata normalized in an in-memory, shared nothing database. For YARN, we will discuss optimizations that enable 2X throughput increases for the Capacity scheduler, enabling scalability to clusters with >20K nodes. We will discuss the journey of how we reached this milestone, discussing some of the challenges involved in efficiently and safely mapping hierarchical filesystem metadata state and operations onto a shared-nothing, in-memory database. We will also discuss the key database features needed for extreme scaling, such as multi-partition transactions, partition-pruned index scans, distribution-aware transactions, and the streaming changelog API. Hops (www.hops.io) is Apache-licensed open-source and supports a pluggable database backend for distributed metadata, although it currently only support MySQL Cluster as a backend. Hops opens up the potential for new directions for Hadoop when metadata is available for tinkering in a mature relational database.
The slides are created for the "Hadoop User Group Vienna", a Meetup that gathers Hadoop users in Vienna on September 6, 2017. The content of the slides correspond to the first talk, which discussed the concepts, terminology and disaster recovery capabilities in the Hadoop ecosystem.
The document discusses evolving HDFS to support generalized storage containers in order to better scale the number of files and blocks. It proposes using block containers and a partial namespace approach to initially scale to billions of files and blocks, and eventually much higher numbers. The storage layer is being restructured to support various container types for use cases beyond HDFS like object storage and HBase.
The document discusses big data visualization and visual analysis, focusing on the challenges and opportunities. It begins with an overview of visualization and then discusses several challenges in big data visualization, including integrating heterogeneous data from different sources and scales, dealing with data and task complexity, limited interaction capabilities for large data, scalability for both data and users, and the need for domain and development libraries/tools. It then provides examples of visualizing taxi GPS data and traffic patterns in Beijing to identify traffic jams.
詹剑锋:Big databench—benchmarking big data systemshdhappy001
This document discusses BigDataBench, an open source project for big data benchmarking. BigDataBench includes six real-world data sets and 19 workloads that cover common big data applications and preserve the four V's of big data. The workloads were chosen to represent typical application domains like search engines, social networks, and e-commerce. BigDataBench aims to provide a standardized benchmark for evaluating big data systems, architectures, and software stacks. It has been used in several case studies for workload characterization and performance evaluation of different hardware platforms for big data workloads.
Cloud computing, big data, and mobile are three major trends that will change the world. Cloud computing provides scalable and elastic IT resources as services over the internet. Big data involves large amounts of both structured and unstructured data that can generate business insights when analyzed. The hadoop ecosystem, including components like HDFS, mapreduce, pig, and hive, provides an architecture for distributed storage and processing of big data across commodity hardware.
This document provides an overview of Capital One's plans to introduce Hadoop and discusses several proof of concepts (POCs) that could be developed. It summarizes the history and practices of using Hadoop at other companies like LinkedIn, Netflix, and Yahoo. It then outlines possible POCs for Hadoop distributions, ETL/analytics frameworks, performance testing, and developing a scaling layer. The goal is to contribute open source code and help with Capital One's transition to using Hadoop in production.
Cloud computing, big data, and mobile technologies are driving major changes in the IT world. Cloud computing provides scalable computing resources over the internet. Big data involves extremely large data sets that are analyzed to reveal business insights. Hadoop is an open-source software framework that allows distributed processing of big data across commodity hardware. It includes tools like HDFS for storage and MapReduce for distributed computing. The Hadoop ecosystem also includes additional tools for tasks like data integration, analytics, workflow management, and more. These emerging technologies are changing how businesses use and analyze data.
Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It consists of Hadoop Distributed File System (HDFS) for storage, and MapReduce for distributed processing. HDFS stores large files across multiple machines, with automatic replication of data for fault tolerance. It has a master/slave architecture with a NameNode managing the file system namespace and DataNodes storing file data blocks.
Interactive Hadoop via Flash and MemoryChris Nauroth
Enterprises are using Hadoop for interactive real-time data processing via projects such as the Stinger Initiative. We describe two new HDFS features – Centralized Cache Management and Heterogeneous Storage – that allow applications to effectively use low latency storage media such as Solid State Disks and RAM. In the first part of this talk, we discuss Centralized Cache Management to coordinate caching important datasets and place tasks for memory locality. HDFS deployments today rely on the OS buffer cache to keep data in RAM for faster access. However, the user has no direct control over what data is held in RAM or how long it?s going to stay there. Centralized Cache Management allows users to specify which data to lock into RAM. Next, we describe Heterogeneous Storage support for applications to choose storage media based on their performance and durability requirements. Perhaps the most interesting of the newer storage media are Solid State Drives which provide improved random IO performance over spinning disks. We also discuss memory as a storage tier which can be useful for temporary files and intermediate data for latency sensitive real-time applications. In the last part of the talk we describe how administrators can use quota mechanism extensions to manage fair distribution of scarce storage resources across users and applications.
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Community
This document discusses best practices for implementing Ceph-powered storage as a service. It covers planning a Ceph implementation based on business and technical requirements. Various use cases for Ceph are described, including OpenStack, cloud storage, web-scale applications, high performance block storage, archive/cold storage, databases and Hadoop. Architectural considerations for redundancy, servers, networking are also discussed. The document concludes with a case study of a university implementing a Ceph-based storage cloud to address storage needs for cancer and genomic research data.
New Ceph capabilities and Reference ArchitecturesKamesh Pemmaraju
Have you heard about Inktank Ceph and are interested to learn some tips and tricks for getting started quickly and efficiently with Ceph? Then this is the session for you!
In this two part session you learn details of:
• the very latest enhancements and capabilities delivered in Inktank Ceph Enterprise such as a new erasure coded storage back-end, support for tiering, and the introduction of user quotas.
• best practices, lessons learned and architecture considerations founded in real customer deployments of Dell and Inktank Ceph solutions that will help accelerate your Ceph deployment.
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Red_Hat_Storage
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About? By: Kamesh Pemmaraju,Neil Levine
Have you heard about Inktank Ceph and are interested to learn some tips and tricks for getting started quickly and efficiently with Ceph? Then this is the session for you! In this two part session you learn details of: • the very latest enhancements and capabilities delivered in Inktank Ceph Enterprise such as a new erasure coded storage back-end, support for tiering, and the introduction of user quotas. • best practices, lessons learned and architecture considerations founded in real customer deployments of Dell and Inktank Ceph solutions that will help accelerate your Ceph deployment.
Apache Hadoop 3 is coming! As the next major milestone for hadoop and big data, it attracts everyone's attention as showcase several bleeding-edge technologies and significant features across all components of Apache Hadoop: Erasure Coding in HDFS, Docker container support, Apache Slider integration and Native service support, Application Timeline Service version 2, Hadoop library updates and client-side class path isolation, etc. In this talk, first we will update the status of Hadoop 3.0 releasing work in apache community and the feasible path through alpha, beta towards GA. Then we will go deep diving on each new feature, include: development progress and maturity status in Hadoop 3. Last but not the least, as a new major release, Hadoop 3.0 will contain some incompatible API or CLI changes which could be challengeable for downstream projects and existing Hadoop users for upgrade - we will go through these major changes and explore its impact to other projects and users.
Speaker: Sanjay Radia, Founder and Chief Architect, Hortonworks
Are you taking advantage of all of Hadoop’s features to operate a stable and effective cluster? Inspired by real-world support cases, this talk discusses best practices and new features to help improve incident response and daily operations. Chances are that you’ll walk away from this talk with some new ideas to implement in your own clusters.
Apache Hadoop 3.0 is coming! As the next major release, it attracts everyone's attention as show case several bleeding-edge technologies and significant features across all components of Apache Hadoop, include: Erasure Coding in HDFS, Multiple Standby NameNodes, YARN Timeline Service v2, JNI-based shuffle in MapReduce, Apache Slider integration and Service Support as First Class Citizen, Hadoop library updates and client-side class path isolation, etc.
In this talk, we will update the status of Hadoop 3 especially the releasing work in community and then go deep diving on new features included in Hadoop 3.0. As a new major release, Hadoop 3 would also include some incompatible changes - we will go through most of these changes and explore its impact to existing Hadoop users and operators. In the last part of this session, we will continue to discuss ongoing efforts in Hadoop 3 age and show the big picture that how big data landscape could be largely influenced by Hadoop 3.
Hadoop 3.0 will include major new features like HDFS erasure coding for improved storage efficiency and YARN support for long running services and Docker containers to improve resource utilization. However, it will maintain backwards compatibility and a focus on testing given the importance of compatibility for existing Hadoop users. The release is targeted for late 2017 after several alpha and beta stages.
Apache Hadoop 3 is coming! As the next major milestone for hadoop and big data, it attracts everyone's attention as showcase several bleeding-edge technologies and significant features across all components of Apache Hadoop: Erasure Coding in HDFS, Docker container support, Apache Slider integration and Native service support, Application Timeline Service version 2, Hadoop library updates and client-side class path isolation, etc. In this talk, first we will update the status of Hadoop 3.0 releasing work in apache community and the feasible path through alpha, beta towards GA. Then we will go deep diving on each new feature, include: development progress and maturity status in Hadoop 3. Last but not the least, as a new major release, Hadoop 3.0 will contain some incompatible API or CLI changes which could be challengeable for downstream projects and existing Hadoop users for upgrade - we will go through these major changes and explore its impact to other projects and users.
YARN Ready: Integrating to YARN with Tez Hortonworks
YARN Ready webinar series helps developers integrate their applications to YARN. Tez is one vehicle to do that. We take a deep dive including code review to help you get started.
If you are search Best Engineering college in India, Then you can trust RCE (Roorkee College of Engineering) services and facilities. They provide the best education facility, highly educated and experienced faculty, well furnished hostels for both boys and girls, top computerized Library, great placement opportunity and more at affordable fee.
The document provides an overview of Hadoop, including:
- A brief history of Hadoop and its origins from Google and Apache projects
- An explanation of Hadoop's architecture including HDFS, MapReduce, JobTracker, TaskTracker, and DataNodes
- Examples of how large companies like Yahoo, Facebook, and Amazon use Hadoop for applications like log processing, searches, and advertisement targeting
The document provides an overview of Hadoop, including:
- A brief history of Hadoop and its origins at Google and Yahoo
- An explanation of Hadoop's architecture including HDFS, MapReduce, JobTracker, TaskTracker, and DataNodes
- Examples of how large companies like Facebook and Amazon use Hadoop to process massive amounts of data
This document summarizes the roles of servers in a Hadoop cluster, including manager, name nodes, edge nodes, and data nodes. It discusses hardware considerations for Hadoop cluster design like CPU to memory to disk ratios for different use cases. It also provides an overview of Dell's Hadoop solutions that integrate PowerEdge servers, Dell Networking switches, and support from Etu for analytic software and Dell Professional Services for implementation. It briefly discusses futures around in-memory processing and virtualized Hadoop deployments.
Optimizing Dell PowerEdge Configurations for HadoopMike Pittaro
This document discusses optimizing Dell PowerEdge server configurations for Hadoop deployments. It recommends tested server configurations like the PowerEdge R720 and R720XD that provide balanced compute and storage. It also recommends a reference architecture using these servers along with networking best practices and validated software configurations from Cloudera to provide a proven, optimized big data platform.
The document discusses improvements to HDFS that allow it to leverage memory as a storage medium. Key points include:
- HDFS 2.3 introduced memory as a storage medium, with RAM disks providing persistence across restarts.
- HDFS 2.6 introduced storage policies that allow applications to target different storage media like SSD or memory.
- The Centralized Cache Management feature loads hot data into memory pools to enable zero-copy reads.
- The Lazy Persist Writes feature allows applications to write to memory and have HDFS asynchronously write to persistent storage, reducing latency.
- Future work includes improving caching, short-circuit writes, and the Memfs layered file system to provide more flexible
Key trends in Big Data and new reference architecture from Hewlett Packard En...Ontico
Динамичное развитие инструментов для обработки Больших Данных порождает новые подходы к повышению производительности. Ключевые новые технологии в Hadoop 2.0, такие как Yarn labeling и Storage Tiering, уже используются компаниями Yahoo и Ebay. Эти новые технологии открывают путь для серьезного повышения эффективности ИТ-инфраструктуры для Hadoop, достигая прироста производительности в несколько десятков процентов при одновременном снижении потребления памяти и электроэнергии.
Эталонная архитектура для Hadoop от HP — HP Big Data Reference Architecture — предлагает использование специализированных "микросерверов" HP Moonshot вкупе с высокоплотными узлами хранения HP Apollo для достижения лучших на сегодня показателей полезной отдачи от железа в Hadoop.
- HDFS Federation allows Hadoop to scale beyond the limitations of a single namespace by splitting the namespace across multiple independent namenodes. Each namenode manages its own namespace volume consisting of a namespace and block pool.
- A client-side mount table provides a virtual unified namespace by mapping namespace volumes to namenodes, hiding the federation details from users and applications.
- HDFS Federation provides wire compatibility by requiring clients to use the same version of Hadoop as the servers, and supports existing HDFS functionality like append, sticky bits, and new APIs like FileContext.
We Provide Hadoop training institute in Hyderabad and Bangalore with corporate training by 12+ Experience faculty.
Real-time industry experts from MNCs
Resume Preparation by expert Professionals
Lab exercises
Interview Preparation
Experts advice
Similar to Nicholas:hdfs what is new in hadoop 2 (20)
Spark is an open source cluster computing framework originally developed at UC Berkeley. Intel has made many contributions to Spark's development through code commits, patches, and collaborating with the Spark community. Spark is widely used by companies like Alibaba, Baidu, and Youku for large-scale data analytics and machine learning tasks. It allows for faster iterative jobs than Hadoop through its in-memory computing model and supports multiple workloads including streaming, SQL, and graph processing.
This document describes an interactive batch query system for game analytics based on Apache Drill. It addresses the problem of answering common ad-hoc queries over large volumes of log data by using a columnar data model and optimizing query plans. The system utilizes Drill's schema-free data model and vectorized query processing. It further improves performance by merging similar queries, reusing intermediate results, and pushing execution downwards to utilize multi-core CPUs. This provides a unified solution for both ad-hoc and scheduled batch analytics workloads at large scale.
刘诚忠:Running cloudera impala on postgre sqlhdhappy001
This document summarizes a presentation about running Cloudera Impala on PostgreSQL to enable SQL queries on large datasets. Key points:
- The company processes 3 billion daily ad impressions and 20TB of daily report data, requiring a scalable SQL solution.
- Impala was chosen for its fast performance from in-memory processing and code generation. The architecture runs Impala coordinators and executors across clusters.
- The author hacked Impala to also scan data from PostgreSQL for mixed workloads. This involved adding new scan node types and metadata.
- Tests on a 150 million row dataset showed Impala with PostgreSQL achieving 20 million rows scanned per second per core.
This document discusses big data in the cloud and provides an overview of YARN. It begins with introducing the speaker and their experience with VMware and Apache Hadoop. The rest of the document covers: 1) trends in big data like the rise of YARN, faster query engines, and focus on enterprise capabilities, 2) how YARN addresses limitations of MapReduce by splitting responsibilities, 3) how YARN serves as a hub for various big data applications, and 4) how YARN can integrate with cloud infrastructure for elastic resource management between the two frameworks. The document advocates for open source contribution to help advance big data technologies.
Raghu nambiar:industry standard benchmarkshdhappy001
Industry standard benchmarks have played a crucial role in advancing the computing industry by enabling healthy competition that drives product improvements and new technologies. Major benchmarking organizations like TPC, SPEC, and SPC have developed numerous benchmarks over time to keep up with industry needs. Looking ahead, new benchmarks are needed to address emerging technologies like cloud, big data, and the internet of things. International conferences and workshops bring together experts to collaborate on developing these new, relevant benchmarks.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Things to Consider When Choosing a Website Developer for your Website | FODUUFODUU
Choosing the right website developer is crucial for your business. This article covers essential factors to consider, including experience, portfolio, technical skills, communication, pricing, reputation & reviews, cost and budget considerations and post-launch support. Make an informed decision to ensure your website meets your business goals.
CAKE: Sharing Slices of Confidential Data on BlockchainClaudio Di Ciccio
Presented at the CAiSE 2024 Forum, Intelligent Information Systems, June 6th, Limassol, Cyprus.
Synopsis: Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Paper: https://doi.org/10.1007/978-3-031-61000-4_16