YARN: a resource manager for analytic platform (Tsuyoshi OZAWA)
The document discusses YARN, the resource manager for Apache Hadoop. It gives an overview of YARN and its key features: (1) managing resources in a cluster, (2) managing application history logs, and (3) a service registry mechanism. It then discusses how distributed processing frameworks such as Tez and Spark run on YARN, focusing on their directed acyclic graph (DAG) models and on techniques for improving performance on YARN, such as container reuse.
The document introduces an algorithm called B2ST (Big tree, Big string Suffix Tree construction) for constructing suffix trees of data larger than main memory. B2ST splits the input string into partitions that fit in memory, sorts the suffixes of each pair of partitions using suffix arrays with LCP (longest common prefix) information, and then merges the results: it builds the suffix tree in a single pass from the suffix array streams and order arrays stored on disk, without reloading the entire input.
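The partition, sort, and merge idea can be illustrated with a small in-memory sketch. This is not B2ST itself: it compares whole suffixes directly instead of using partition pairs, order arrays, and disk streams, and the function names are hypothetical. It only shows that sorting suffixes per partition and merging the sorted runs yields the global suffix array, and how LCP values are derived from it (here with Kasai's algorithm).

```python
import heapq


def partition_suffix_arrays(text, part_size):
    """Sort the suffix start positions of each partition separately (toy version)."""
    parts = []
    for lo in range(0, len(text), part_size):
        hi = min(lo + part_size, len(text))
        # Assumption: the whole text is available for comparisons, unlike in B2ST.
        parts.append(sorted(range(lo, hi), key=lambda i: text[i:]))
    return parts


def merge_suffix_arrays(text, parts):
    """Merge the sorted per-partition runs into one global suffix array."""
    return list(heapq.merge(*parts, key=lambda i: text[i:]))


def lcp_array(text, sa):
    """Kasai's algorithm: lcp[k] is the common prefix length of suffixes sa[k-1] and sa[k]."""
    n = len(text)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


if __name__ == "__main__":
    text = "banana"
    sa = merge_suffix_arrays(text, partition_suffix_arrays(text, 2))
    print(sa, lcp_array(text, sa))
```

Adjacent entries of the suffix array together with their LCP values are the information needed to emit suffix tree edges in one left-to-right pass.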
Mainly an introduction to the paper "Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions".
https://pmg.csail.mit.edu/pubs/adya99__weak_consis-abstract.html
This document discusses the internals of the WalB driver, a data storage driver developed at Cybozu Labs. To avoid performance degradation, it records only redo logs, not undo logs. WalB completes a write I/O as soon as the redo log has been written to the log storage, so it never needs to read the current data or generate undo logs. This lets it overlap and parallelize log flushing and data I/O for efficient write performance.
The document introduces two algorithms for constructing a suffix array: SA-IS and SA-DS. SA-IS uses induced sorting of variable-length LMS (leftmost S-type) substrings, while SA-DS uses radix sorting of fixed-length substrings. The document provides pseudocode for both algorithms and explains the terms and data structures they rely on, including LMS suffixes, L-type and S-type characters, and buckets for sorting.
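The L-type and S-type classification can be made concrete with a short sketch. It assumes the usual convention that the input ends with a unique sentinel smaller than every other character (here "$"), and it covers only the classification step and the LMS (leftmost S-type) positions, not the induced sorting itself. The function names are hypothetical.

```python
def classify_types(text):
    """Label each suffix S-type (smaller than the next suffix) or L-type (larger).

    Assumption: text ends with a unique, smallest sentinel character such as "$".
    """
    n = len(text)
    types = ["S"] * n  # the sentinel suffix is S-type by definition
    for i in range(n - 2, -1, -1):
        if text[i] > text[i + 1]:
            types[i] = "L"
        elif text[i] == text[i + 1] and types[i + 1] == "L":
            types[i] = "L"  # equal characters inherit the type of the next suffix
    return "".join(types)


def lms_positions(types):
    """Return positions that are S-type and directly preceded by an L-type position."""
    return [i for i in range(1, len(types))
            if types[i] == "S" and types[i - 1] == "L"]


if __name__ == "__main__":
    types = classify_types("banana$")
    print(types, lms_positions(types))
```

A single right-to-left scan is enough because the type of position i depends only on the characters at i and i+1 and on the type already computed for i+1.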
An Efficient Backup and Replication of Storage (Takashi Hoshino)
This document describes WalB, a Linux kernel device driver that provides efficient backup and replication of storage using block-level write-ahead logging (WAL). It has negligible performance overhead and avoids issues like fragmentation. WalB works by wrapping a block device and writing redo logs to a separate log device. It then extracts diffs for backup/replication. The document discusses WalB's architecture, algorithm, performance evaluation and future work.
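The wrap, log, and extract flow described above can be modeled in a few lines. This is a toy in-memory sketch, not the WalB kernel driver or its on-disk format: the class `ToyWalDevice`, its methods, and the use of a list index as the log sequence number are all assumptions made for illustration.

```python
class ToyWalDevice:
    """Toy wrapper device: every write is appended to a redo log, then applied."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.data = [bytes(block_size) for _ in range(num_blocks)]  # data device
        self.log = []  # log device: (sequence number, block index, payload)

    def write(self, block_index, payload):
        if len(payload) != self.block_size:
            raise ValueError("payload must be exactly one block")
        seq = len(self.log)
        # Redo log only: the old block content is never read or saved.
        self.log.append((seq, block_index, payload))
        self.data[block_index] = payload
        return seq

    def extract_diff(self, from_seq, to_seq):
        """Collapse log records in [from_seq, to_seq) into the latest payload per block."""
        diff = {}
        for _, block_index, payload in self.log[from_seq:to_seq]:
            diff[block_index] = payload
        return diff


def apply_diff(backup_blocks, diff):
    """Bring a backup copy up to date by overwriting the changed blocks."""
    for block_index, payload in diff.items():
        backup_blocks[block_index] = payload


if __name__ == "__main__":
    dev = ToyWalDevice(num_blocks=4)
    backup = list(dev.data)  # full copy taken before any writes
    dev.write(1, b"aaaa")
    dev.write(2, b"bbbb")
    dev.write(1, b"cccc")  # overwrites the earlier write to block 1
    apply_diff(backup, dev.extract_diff(0, 3))
    print(backup == dev.data)
```

Because the diff keeps only the last payload per block, the amount of data shipped to a backup or replica is bounded by the number of distinct blocks written, not by the number of writes.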
WalB is a block device driver that uses write-ahead logging (WAL) to provide efficient incremental backups. It aims to address the lack of a good backup solution that works online, with low overhead, across various applications, and using commodity hardware and free software. WalB acts as a wrapper device that logs writes to a separate log device to enable consistent incremental backups of the data device.
The document summarizes VMware vSphere backup operations at Cybozu Labs, including: (1) the vSphere environment, which contains 78 VMs across 3 ESXi hosts and 4 iSCSI storage units, (2) the backup software and policy, which backs up all VMs weekly and retains past generations, and (3) backup data size and performance, noting that although the provisioned disks total 4.4TB, the archives consume only 1TB thanks to compression and the removal of zero blocks.
Vmbkp is an online backup tool for VMware vSphere that performs full, differential, and incremental backups of virtual machines. It uses efficient archive formats and sequential I/O to back up virtual disk (VMDK) files. Key features include multi-generation backup management, a command-line interface, and support for backup scheduling via cron. The tool uses the VDDK and VI Java libraries to interface with vSphere and to perform tasks such as taking snapshots and accessing VMDK files during the backup process.
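The difference between the three backup modes can be shown with a toy block-level model. This sketch is not how Vmbkp detects changes or stores archives; it simply compares block lists in memory, and the helper names are hypothetical. A differential backup holds the changes since the last full backup, while an incremental backup holds the changes since the most recent backup of any kind.

```python
def changed_blocks(old, new):
    """Return {block index: new content} for every block that differs."""
    return {i: b for i, (a, b) in enumerate(zip(old, new)) if a != b}


def restore(full, *diffs):
    """Rebuild a disk image from a full backup plus diffs applied in order."""
    image = list(full)
    for diff in diffs:
        for index, content in diff.items():
            image[index] = content
    return image


if __name__ == "__main__":
    day0 = [b"A", b"B", b"C", b"D"]  # state captured by the full backup
    day1 = [b"A", b"X", b"C", b"D"]
    day2 = [b"A", b"X", b"Y", b"D"]

    incremental_1 = changed_blocks(day0, day1)  # changes since the full backup
    incremental_2 = changed_blocks(day1, day2)  # changes since the previous backup
    differential_2 = changed_blocks(day0, day2)  # all changes since the full backup

    print(restore(day0, incremental_1, incremental_2) == day2)
    print(restore(day0, differential_2) == day2)
```

Restoring from incrementals needs every diff in the chain, whereas restoring from a differential needs only the full backup and the latest diff, at the cost of larger diffs over time.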