The document provides information on how snapshots work in Open-E software. Snapshots allow creating an exact copy of a logical volume at a point in time, while the original data continues to be available. The snapshot is implemented using copy-on-write, where changed blocks are copied to reserved space before being overwritten. This allows mounting snapshots read-only to access past versions of data. The document discusses snapshot configuration, advantages like non-disruptive backups, and disadvantages like decreased write speeds with many active snapshots.
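The copy-on-write mechanism can be sketched in a few lines. This is a toy model, not Open-E's implementation; the `Volume` class and block layout are illustrative assumptions:

```python
# Toy copy-on-write snapshot: before a block is overwritten, its old
# contents are copied into each open snapshot's reserved space, so the
# snapshot keeps presenting the volume as it was at creation time.
class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)      # current data, block by block
        self.snapshots = []             # each snapshot: {block_index: old_data}

    def snapshot(self):
        self.snapshots.append({})       # reserved space for copied-out blocks
        return len(self.snapshots) - 1  # snapshot id

    def write(self, index, data):
        # Copy-on-write: preserve the original block in every snapshot
        # that has not yet saved it, then overwrite in place.
        for snap in self.snapshots:
            if index not in snap:
                snap[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        # Read-only view: preserved old block if changed, else current block.
        snap = self.snapshots[snap_id]
        return snap.get(index, self.blocks[index])

vol = Volume(["a", "b", "c"])
s = vol.snapshot()
vol.write(1, "B")
print(vol.blocks)               # current data reflects the write
print(vol.read_snapshot(s, 1))  # the snapshot still sees the old block
```

The sketch also shows why many active snapshots slow writes: each write may have to copy the old block once per snapshot before proceeding.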
The document compares on-heap and off-heap caching options. It discusses heap memory usage in the JVM and alternatives like off-heap memory using memory mapped files, ByteBuffers, and Unsafe. Popular off-heap caches like Chronicle, Hazelcast, and Redis are presented along with comparisons of their features, performance, and garbage collection impact. The document aims to help developers choose the most suitable cache for their application needs.
This document proposes a method for monitoring and profiling Hadoop using AspectJ. It describes logging executed instructions at runtime to create traces, and counting the frequency of instructions to generate profiles at different levels (node, process, method). Experimental results show the monitoring overhead is small, increasing processing time by only a few percent. Visualized profiling results can help developers understand system behavior and identify potential issues like workload imbalances or speculative execution opportunities. The goal is to provide effective runtime information for development and help understand Hadoop system behaviors and specifications.
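The counting idea is analogous to what Python's built-in profiling hook does: intercept each call and tally frequencies per method. A minimal sketch (Python's `sys.setprofile` standing in for AspectJ pointcuts; `work`/`helper` are made-up names):

```python
import sys
from collections import Counter

counts = Counter()

def profiler(frame, event, arg):
    # Count each executed Python-level call, analogous to counting
    # intercepted instructions to build a method-level profile.
    if event == "call":
        counts[frame.f_code.co_name] += 1

def work():
    for _ in range(3):
        helper()

def helper():
    pass

sys.setprofile(profiler)
work()
sys.setprofile(None)
print(counts["helper"])   # helper was called 3 times
```

Aggregating such counters per process and per node would yield the multi-level profiles the document describes.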
- The document describes progress on a swap-aware JVM garbage collection policy.
- An initial implementation was summarized, and issues with allowing free space between live objects were identified.
- A page-level reimplementation is underway, informed by related work. Validation tests using Deeplearning4Java and Spark workloads are planned.
- Future work includes optimizing the GC policy, adding journaling to the Lustre file system, and additional validation experiments.
This document describes Onyx, a new flexible and extensible data processing system. Onyx aims to address limitations in existing frameworks when dealing with new resource environments like disaggregated computing and transient resources. The Onyx architecture includes a compiler that transforms dataflow programs into optimized execution plans using various passes. The runtime then executes the plans across cluster resources. Onyx allows dynamic optimization by collecting metrics during execution and generating new plans. It can harness transient resources by placing tasks strategically.
The document summarizes benchmark tests comparing Fengqi.Asia's SmartMachine and VirtualMachines to popular cloud platforms like AWS EC2 and GrandCloud. In Part A, Fengqi.Asia's SmartMachine outperformed alternatives on all tests, with disk I/O speeds over 500% faster and CPU performance over 50% better. Part B found the VirtualMachines also outperformed competitors on memory and disk I/O speeds by over 30%, though CPU performance was comparable. The tests demonstrate Fengqi.Asia's superior performance for high-load web applications.
Shared Memory Performance: Beyond TCP/IP with Ben Cotton, JPMorgan (Hazelcast)
- OpenHFT provides solutions for improving Java data locality and inter-process communication (IPC) transport, enabling ultra-low latency real-time Java deployments.
- It includes Chronicle Map, an off-heap concurrent map that avoids garbage collection pauses compared to on-heap maps. It also provides faster IPC than UDP/TCP via shared memory.
- Tests show Chronicle Map accessed via shared memory IPC can be over 1000x faster than Red Hat Infinispan accessed via UDP for a distributed cache workload.
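The shared-memory transport idea can be demonstrated with POSIX primitives: two processes address the same mapped region, so data crosses process boundaries without touching the network stack. A minimal Linux-only sketch (not OpenHFT's API):

```python
import mmap
import os

# Anonymous MAP_SHARED region: a forked child and its parent both see
# the same physical pages, so a write by one is visible to the other
# with no socket, no syscall per message, and no serialization.
buf = mmap.mmap(-1, 4096)   # anonymous shared mapping

pid = os.fork()
if pid == 0:                # child: publish into the shared region
    buf[:5] = b"hello"
    os._exit(0)

os.waitpid(pid, 0)          # parent: wait, then read the child's write
print(bytes(buf[:5]))       # b'hello'
```

Real shared-memory IPC adds memory barriers or locks for concurrent access; this sketch sidesteps that by ordering the processes with `waitpid`.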
In this video from the 2017 Argonne Training Program on Extreme-Scale Computing, Pavan Balaji from Argonne presents an overview of system interconnects for HPC.
Watch the video: https://wp.me/p3RLHQ-hA4
Learn more: https://extremecomputingtraining.anl.gov/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Internal Workshop on Open Source Software Community Project: Smart Grid Open Source Platform
Supported by NIPA, Korea
- 2010/06/30
More info: http://nephee.or.kr/
Java Performance Analysis on Linux with Flame Graphs (Brendan Gregg)
This document discusses using Linux perf_events (perf) profiling tools to analyze Java performance on Linux. It describes how perf can provide complete visibility into Java, JVM, GC and system code but that Java profilers have limitations. It presents the solution of using perf to collect mixed-mode flame graphs that include Java method names and symbols. It also discusses fixing issues with broken Java stacks and missing symbols on x86 architectures in perf profiles.
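Flame graphs are rendered from "folded" stacks: each sampled stack collapsed into one semicolon-joined line with a sample count. A tiny aggregator over hand-made samples (real input would come from `perf script` output, post-processed by the flamegraph tools):

```python
from collections import Counter

# Each sample is a stack from root to leaf; identical stacks are merged
# and counted. This folded format is what flamegraph.pl consumes.
samples = [
    ("java", "start", "run", "parse"),
    ("java", "start", "run", "parse"),
    ("java", "start", "run", "gc"),
]

folded = Counter(";".join(stack) for stack in samples)
for stack, n in sorted(folded.items()):
    print(stack, n)
```

In a mixed-mode flame graph the same lines interleave Java method names (resolved via a perf map agent) with JVM and kernel symbols.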
The document discusses Emergent Game Technologies' Floodgate cross-platform stream processing library. It describes Floodgate as a foundation for easing multi-core development across platforms like PC, Xbox, PS3 and Wii. It outlines how Floodgate uses a stream processing model to partition work into tasks that can run concurrently, improving performance by taking advantage of multiple cores. Examples are given showing how tasks like skinning and morphing benefit from being offloaded to Floodgate.
This document discusses GPU accelerated computing and programming with GPUs. It provides characteristics of GPUs from Nvidia, AMD, and Intel including number of cores, memory size and bandwidth, and power consumption. It also outlines the 7 steps for programming with GPUs which include building and loading a GPU kernel, allocating device memory, transferring data between host and device memory, setting kernel arguments, enqueueing kernel execution, transferring results back, and synchronizing the command queue. The goal is to achieve super parallel execution with GPUs.
Linux 4.x Tracing Tools: Using BPF Superpowers (Brendan Gregg)
Talk for USENIX LISA 2016 by Brendan Gregg.
"Linux 4.x Tracing Tools: Using BPF Superpowers
The Linux 4.x series heralds a new era of Linux performance analysis, with the long-awaited integration of a programmable tracer: Enhanced BPF (eBPF). Formerly the Berkeley Packet Filter, BPF has been enhanced in Linux to provide system tracing capabilities, and integrates with dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). This has allowed dozens of new observability tools to be developed so far: for example, measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived.
In this talk I'll show you how to use BPF in the Linux 4.x series, and I'll summarize the different tools and front ends available, with a focus on iovisor bcc. bcc is an open source project to provide a Python front end for BPF, and comes with dozens of new observability tools (many of which I developed). These tools include new BPF versions of old classics, and many new tools, including: execsnoop, opensnoop, funccount, trace, biosnoop, bitesize, ext4slower, ext4dist, tcpconnect, tcpretrans, runqlat, offcputime, offwaketime, and many more. I'll also summarize use cases and some long-standing issues that can now be solved, and how we are using these capabilities at Netflix."
This document describes using in-place computing on PostgreSQL to perform statistical analysis directly on data stored in a PostgreSQL database. Key points include:
- An F-test is used to compare the variances of accelerometer data from different phone models (Nexus 4 and S3 Mini) and activities (walking and biking).
- Performing the F-test directly in PostgreSQL via SQL queries is faster than exporting the data to an R script, as it avoids the overhead of data transfer.
- PG-Strom, an extension for PostgreSQL, is used to generate CUDA code on-the-fly to parallelize the variance calculations on a GPU, further speeding up the F-test.
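The F statistic being pushed into the database is just a ratio of sample variances, which is cheap to reproduce outside SQL. A sketch in plain Python with invented accelerometer magnitudes (comparing against an F critical value would additionally need the F distribution, e.g. from SciPy):

```python
from statistics import variance

# Hypothetical accelerometer readings for two activities; the in-database
# version computes the same two variances with SQL's variance() aggregate.
walking = [0.9, 1.1, 1.0, 1.2, 0.8]
biking  = [0.5, 2.0, 1.4, 0.2, 1.9]

# F-test statistic: ratio of the two sample variances
# (larger variance on top by convention).
f = variance(biking) / variance(walking)
print(round(f, 3))
```

Doing this inside PostgreSQL avoids shipping every raw row to a client, which is the data-transfer overhead the document measures.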
Storm is a distributed, fault-tolerant system for processing streams of data. It delegates work to components like spouts and bolts. Spouts act as sources of input streams, passing data to bolts which transform the data and either persist it or pass it to other bolts. Storm clusters have Nimbus, Zookeeper, and Supervisor nodes, with Nimbus distributing code and assigning work, Zookeeper coordinating the cluster, and Supervisors running worker processes.
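The spout/bolt dataflow can be mimicked with generators in a single process; Storm's contribution is running these stages across many workers with fault tolerance. The names below are illustrative, not Storm's API:

```python
# Toy topology: a spout emits tuples; bolts transform or aggregate them.
def sentence_spout():
    # Spout: the source of the input stream.
    yield "the quick fox"
    yield "the lazy dog"

def split_bolt(stream):
    # Bolt: transforms each tuple and passes results downstream.
    for sentence in stream:
        yield from sentence.split()

def count_bolt(stream):
    # Terminal bolt: persists (here, accumulates) the results.
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_bolt(split_bolt(sentence_spout()))
print(counts["the"])   # "the" appears twice
```

In a real cluster, Nimbus would assign each of these stages to worker processes managed by Supervisors, with Zookeeper coordinating.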
The document discusses PROSE (Partitioned Reliable Operating System Environment), an approach that runs applications in specialized kernel partitions for finer control over system resources and improved reliability. It aims to simplify development of specialized kernels and enable resource sharing across partitions. The approach is evaluated using IBM's research hypervisor rHype, which shows PROSE can reduce noise and provide more deterministic performance than Linux. Future work focuses on running larger commercial workloads and further performance/noise experiments.
Introduction to DTrace (Dynamic Tracing), written by Brendan Gregg and delivered in 2007. While aimed at a Solaris-based audience, this introduction is still largely relevant today (2012). Since then, DTrace has appeared in other operating systems (Mac OS X, FreeBSD, and is being ported to Linux), and, many user-level providers have been developed to aid tracing of other languages.
Process' Virtual Address Space in GNU/Linux (Varun Mahajan)
The document discusses the virtual address space of a process in GNU/Linux. It explains that a process has both a user space and kernel space in virtual memory. The process' virtual address space contains text, data, and shared library segments. Functions like brk, sbrk, mmap, malloc, and free are used to allocate and free memory in the data segment to grow the process heap.
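On GNU/Linux these segments are directly observable: the kernel exposes each process's virtual address space in `/proc/<pid>/maps`, one mapping per line. A Linux-only peek at our own process:

```python
# Each line of /proc/self/maps is one mapping: address range, permissions,
# and the backing object - the executable's text/data, shared libraries,
# plus the special [heap] and [stack] regions grown by brk/mmap.
with open("/proc/self/maps") as f:
    maps = f.read().splitlines()

for line in maps:
    if "[heap]" in line or "[stack]" in line:
        print(line)
```

Watching this file before and after a large `malloc` (or Python allocation) shows the heap or anonymous mmap regions growing, matching the brk/mmap description above.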
- The document summarizes the networks at CERN, including the IT-CS group which manages communication services, the extensive IP network connecting equipment and facilities, and the networks supporting the Large Hadron Collider experiments. It describes the large data flows and computing challenges of the LHC experiments and the worldwide computing grid (WLCG) established to support this. It focuses on the LHCOPN connecting CERN and the Tier1 centers and the new LHCONE being developed to better support the changing computing models of the experiments.
The document provides an overview of how to read and understand garbage collection (GC) log lines from different Java vendors and JVM versions. It begins by explaining the parts of a basic GC log line for the OpenJDK GC log format. It then discusses GC log lines for G1 GC and CMS GC in more detail. Finally, it shares examples of GC log formats from IBM JVMs and different levels of information provided. The document aims to help readers learn to correctly interpret GC logs and analyze GC behavior.
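The classic (pre-unified-logging) OpenJDK line format is regular enough to parse mechanically. A sketch with an invented but representative `-XX:+PrintGC` line of the shape `[GC (cause) before->after(total), secs]`:

```python
import re

# Invented sample line in the classic OpenJDK -XX:+PrintGC format.
line = "[GC (Allocation Failure)  65536K->1024K(251392K), 0.0030059 secs]"

# before->after(total) heap occupancy in KB, plus the pause duration.
m = re.search(r"(\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs", line)
before, after, total, secs = m.groups()
freed = int(before) - int(after)
print(f"freed {freed}K of a {total}K heap in {secs}s")
```

As the document stresses, G1, CMS, and IBM JVM formats differ substantially, so a real parser needs one pattern per collector and vendor.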
The column-oriented data structure of PG-Strom stores data in separate column storage (CS) tables based on the column type, with indexes to enable efficient lookups. This reduces data transfer compared to row-oriented storage and improves GPU parallelism by processing columns together.
This document summarizes a project report on implementing AES encryption in parallel. It describes how AES works sequentially and the approach taken to parallelize it by assigning each processing element a portion of the data to encrypt in parallel. Experimental results show speedups from parallelization and analysis of running times for different file sizes and numbers of processing elements. Future work is proposed to make the program more space efficient and properly recover the ciphertext.
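The partitioning strategy is easy to sketch: split the input into fixed-size chunks and let each worker encipher its chunk independently, as parallelizable AES modes (ECB/CTR-style) allow. A toy XOR transform stands in for AES below, and threads stand in for the report's processing elements; none of this is the report's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

KEY = 0x5A  # toy single-byte key; a real AES key is 128/192/256 bits

def encrypt_chunk(chunk):
    # XOR stands in for the AES block transform, purely for illustration.
    return bytes(b ^ KEY for b in chunk)

def parallel_encrypt(data, chunk_size=4):
    # Each worker gets its own slice of the plaintext, mirroring how the
    # report assigns a portion of the data to each processing element.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return b"".join(pool.map(encrypt_chunk, chunks))

ciphertext = parallel_encrypt(b"attack at dawn")
print(parallel_encrypt(ciphertext))   # XOR is self-inverse: b'attack at dawn'
```

Because chunks are independent, speedup scales with the number of workers until the per-chunk work is too small to amortize scheduling overhead, which matches the file-size sensitivity in the report's results.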
1. The Trans-Pacific Grid Datafarm testbed provides 70 terabytes of disk capacity and 13 gigabytes per second of disk I/O performance across clusters in Japan, the US, and Thailand.
2. Using the GNET-1 network testbed device, the Trans-Pacific Grid Datafarm achieved stable transfer rates of up to 3.79 gigabits per second during a file replication experiment between Japan and the US, near the theoretical maximum of 3.9 gigabits per second.
3. Precise pacing of network traffic flows using inter-frame gap controls on the GNET-1 device allowed for high-speed, lossless utilization of long-haul trans-Pacific network links.
Trip down the GPU lane with Machine Learning (Renaldas Zioma)
What a Machine Learning professional should know about GPUs!
Brief outline of the deck:
* GPU architecture explained with simple images
* memory bandwidth cheat-sheets for common hardware configurations
* overview of the GPU programming model
* an under-the-hood peek at the main building block of ML: matrix multiplication
* effect of mini-batch size on performance
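The main building block above, C = A x B, is worth seeing in its naive form. With a mini-batch of size n, A becomes an (n x k) activations matrix, so arithmetic per weight grows with n while the weight traffic from memory stays fixed, which is roughly why larger batches keep a GPU's ALUs busier. A reference CPU version (pure Python, no GPU):

```python
# Naive triple-loop matrix multiply: C[i][j] = sum_t A[i][t] * B[t][j].
# A GPU runs the same arithmetic, but tiled across thousands of threads.
def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(inner))
             for j in range(cols)]
            for i in range(rows)]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(matmul(A, B))   # [[19, 22], [43, 50]]
```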
Originally I gave this talk at the internal Machine Learning Workshop in Unity Seattle
HIGH QUALITY pdf slides: http://bit.ly/2iQxm7X (on Dropbox)
Performance Analysis: new tools and concepts from the cloud (Brendan Gregg)
Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud Computing introduces new challenges for performance analysis, for both customers and operators of the cloud. Apart from monitoring a scaling environment, issues within a system can be complicated when tenants are competing for the same resources, and are invisible to each other. Other factors include rapidly changing production code and wildly unpredictable traffic surges. For performance analysis in the Joyent public cloud, we use a variety of tools including Dynamic Tracing, which allows us to create custom tools and metrics and to explore new concepts. In this presentation I'll discuss a collection of these tools and the metrics that they measure. While these are DTrace-based, the focus of the talk is on which metrics are proving useful for analyzing real cloud issues.
Talk for AWS re:Invent 2014. Video: https://www.youtube.com/watch?v=7Cyd22kOqWc . Netflix tunes Amazon EC2 instances for maximum performance. In this session, you learn how Netflix configures the fastest possible EC2 instances, while reducing latency outliers. This session explores the various Xen modes (e.g., HVM, PV, etc.) and how they are optimized for different workloads. Hear how Netflix chooses Linux kernel versions based on desired performance characteristics and receive a firsthand look at how they set kernel tunables, including hugepages. You also hear about Netflix’s use of SR-IOV to enable enhanced networking and their approach to observability, which can exonerate EC2 issues and direct attention back to application performance.
This document provides an overview of garbage collection in Java. It begins with an introduction to the presenter Leon Chen and his background. It then discusses Java memory management and garbage collection fundamentals, including the young and old generations, minor and major garbage collections, and how objects are promoted between generations. The document provides examples of garbage collection using diagrams and discusses tuning the Java heap size based on the live data size. It emphasizes the importance of garbage collection logging for performance analysis.
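The generational idea in the summary above is not Java-specific; CPython's cycle collector also tracks three generations and promotes objects that survive a collection, so the concept can be poked at interactively:

```python
import gc

# Three allocation-count thresholds, one per generation
# (typically (700, 10, 10) by default in CPython).
print(gc.get_threshold())

# Per-generation object counts before and after forcing a full,
# "major"-style collection of all generations.
print(gc.get_count())
collected = gc.collect()
print(gc.get_count())
```

The Java equivalent of watching these numbers is the GC logging the document recommends enabling for performance analysis.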
How Netflix Tunes EC2 Instances for Performance (Brendan Gregg)
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg.
"At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such as SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
Parallelized pipeline for whole genome shotgun metagenomics with GHOSTZ-GPU and MEGAN (Masahito Ohue)
Masahito Ohue, Marina Yamasawa, Kazuki Izawa, Yutaka Akiyama: Parallelized pipeline for whole genome shotgun metagenomics with GHOSTZ-GPU and MEGAN,
In Proceedings of the 19th annual IEEE International Conference on Bioinformatics and Bioengineering (IEEE BIBE 2019), 152-156, 2019. doi: 10.1109/BIBE.2019.00035
Jzab is a standalone Java library that implements Zab, the atomic broadcast protocol used by ZooKeeper, allowing other applications to easily use Zab. It includes features like authentication, dynamic reconfiguration, snapshots, and leader election. Benchmark tests showed that batching transactions led to higher throughput, and different garbage collectors and snapshot strategies affected performance. Jzab passed testing with the Jepsen framework and provides three states (Recovering, Leading, Following) from the user's perspective.
This report compares the block storage performance of AWS, Digital Ocean, OVH, DreamHost, and several StorPool-based cloud offerings. A variety of benchmarks were used, including PGBENCH, Sysbench, fio, and rsync. The results showed extreme differences in performance between the block storage offerings, with StorPool-based clouds significantly outperforming the other providers in most tests, especially for latency-sensitive workloads. Further tests showed that IOPS limits can control throughput but not latency, and that a lower-priced AWS volume with a 3,000 IOPS limit outperformed a Digital Ocean volume with a higher 7,500-10,000 IOPS limit.
Java Performance Analysis on Linux with Flame GraphsBrendan Gregg
This document discusses using Linux perf_events (perf) profiling tools to analyze Java performance on Linux. It describes how perf can provide complete visibility into Java, JVM, GC and system code but that Java profilers have limitations. It presents the solution of using perf to collect mixed-mode flame graphs that include Java method names and symbols. It also discusses fixing issues with broken Java stacks and missing symbols on x86 architectures in perf profiles.
The document discusses Emergent Game Technologies' Floodgate cross-platform stream processing library. It describes Floodgate as a foundation for easing multi-core development across platforms like PC, Xbox, PS3 and Wii. It outlines how Floodgate uses a stream processing model to partition work into tasks that can run concurrently, improving performance by taking advantage of multiple cores. Examples are given showing how tasks like skinning and morphing benefit from being offloaded to Floodgate.
This document discusses GPU accelerated computing and programming with GPUs. It provides characteristics of GPUs from Nvidia, AMD, and Intel including number of cores, memory size and bandwidth, and power consumption. It also outlines the 7 steps for programming with GPUs which include building and loading a GPU kernel, allocating device memory, transferring data between host and device memory, setting kernel arguments, enqueueing kernel execution, transferring results back, and synchronizing the command queue. The goal is to achieve super parallel execution with GPUs.
Linux 4.x Tracing Tools: Using BPF SuperpowersBrendan Gregg
Talk for USENIX LISA 2016 by Brendan Gregg.
"Linux 4.x Tracing Tools: Using BPF Superpowers
The Linux 4.x series heralds a new era of Linux performance analysis, with the long-awaited integration of a programmable tracer: Enhanced BPF (eBPF). Formally the Berkeley Packet Filter, BPF has been enhanced in Linux to provide system tracing capabilities, and integrates with dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). This has allowed dozens of new observability tools to be developed so far: for example, measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived.
In this talk I'll show you how to use BPF in the Linux 4.x series, and I'll summarize the different tools and front ends available, with a focus on iovisor bcc. bcc is an open source project to provide a Python front end for BPF, and comes with dozens of new observability tools (many of which I developed). These tools include new BPF versions of old classics, and many new tools, including: execsnoop, opensnoop, funccount, trace, biosnoop, bitesize, ext4slower, ext4dist, tcpconnect, tcpretrans, runqlat, offcputime, offwaketime, and many more. I'll also summarize use cases and some long-standing issues that can now be solved, and how we are using these capabilities at Netflix."
This document describes using in-place computing on PostgreSQL to perform statistical analysis directly on data stored in a PostgreSQL database. Key points include:
- An F-test is used to compare the variances of accelerometer data from different phone models (Nexus 4 and S3 Mini) and activities (walking and biking).
- Performing the F-test directly in PostgreSQL via SQL queries is faster than exporting the data to an R script, as it avoids the overhead of data transfer.
- PG-Strom, an extension for PostgreSQL, is used to generate CUDA code on-the-fly to parallelize the variance calculations on a GPU, further speeding up the F-test.
Storm is a distributed, fault-tolerant system for processing streams of data. It delegates work to components like spouts and bolts. Spouts act as sources of input streams, passing data to bolts which transform the data and either persist it or pass it to other bolts. Storm clusters have Nimbus, Zookeeper, and Supervisor nodes, with Nimbus distributing code and assigning work, Zookeeper coordinating the cluster, and Supervisors running worker processes.
The document discusses PROSE (Partitioned Reliable Operating System Environment), an approach that runs applications in specialized kernel partitions for finer control over system resources and improved reliability. It aims to simplify development of specialized kernels and enable resource sharing across partitions. The approach is evaluated using IBM's research hypervisor rHype, which shows PROSE can reduce noise and provide more deterministic performance than Linux. Future work focuses on running larger commercial workloads and further performance/noise experiments.
Introduction to DTrace (Dynamic Tracing), written by Brendan Gregg and delivered in 2007. While aimed at a Solaris-based audience, this introduction is still largely relevant today (2012). Since then, DTrace has appeared in other operating systems (Mac OS X, FreeBSD, and is being ported to Linux), and, many user-level providers have been developed to aid tracing of other languages.
Process' Virtual Address Space in GNU/LinuxVarun Mahajan
The document discusses the virtual address space of a process in GNU/Linux. It explains that a process has both a user space and kernel space in virtual memory. The process' virtual address space contains text, data, and shared library segments. Functions like brk, sbrk, mmap, malloc, and free are used to allocate and free memory in the data segment to grow the process heap.
- The document summarizes the networks at CERN, including the IT-CS group which manages communication services, the extensive IP network connecting equipment and facilities, and the networks supporting the Large Hadron Collider experiments. It describes the large data flows and computing challenges of the LHC experiments and the worldwide computing grid (WLCG) established to support this. It focuses on the LHCOPN connecting CERN and the Tier1 centers and the new LHCONE being developed to better support the changing computing models of the experiments.
The document provides an overview of how to read and understand garbage collection (GC) log lines from different Java vendors and JVM versions. It begins by explaining the parts of a basic GC log line for the OpenJDK GC log format. It then discusses GC log lines for G1 GC and CMS GC in more detail. Finally, it shares examples of GC log formats from IBM JVMs and different levels of information provided. The document aims to help readers learn to correctly interpret GC logs and analyze GC behavior.
The column-oriented data structure of PG-Strom stores data in separate column storage (CS) tables based on the column type, with indexes to enable efficient lookups. This reduces data transfer compared to row-oriented storage and improves GPU parallelism by processing columns together.
This document summarizes a project report on implementing AES encryption in parallel. It describes how AES works sequentially and the approach taken to parallelize it by assigning each processing element a portion of the data to encrypt in parallel. Experimental results show speedups from parallelization and analysis of running times for different file sizes and numbers of processing elements. Future work is proposed to make the program more space efficient and properly recover the ciphertext.
1. The Trans-Pacific Grid Datafarm testbed provides 70 terabytes of disk capacity and 13 gigabytes per second of disk I/O performance across clusters in Japan, the US, and Thailand.
2. Using the GNET-1 network testbed device, the Trans-Pacific Grid Datafarm achieved stable transfer rates of up to 3.79 gigabits per second during a file replication experiment between Japan and the US, near the theoretical maximum of 3.9 gigabits per second.
3. Precise pacing of network traffic flows using inter-frame gap controls on the GNET-1 device allowed for high-speed, lossless utilization of long-haul trans-Pacific network links.
Trip down the GPU lane with Machine LearningRenaldas Zioma
What Machine Learning professional should know about GPU!
Brief outline of the deck:
* GPU architecture explained with simple images
* memory bandwidth cheat-sheats for common hardware configuration,
* overview of GPU programming model
* under the hood peek at the main building block of ML - matrix multiplication
* effect of mini-batch size on performance
Originally I gave this talk at the internal Machine Learning Workshop in Unity Seattle
HIGH QUALITY pdf slides: http://bit.ly/2iQxm7X (on Dropbox)
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud Computing introduces new challenges for performance
analysis, for both customers and operators of the cloud. Apart from
monitoring a scaling environment, issues within a system can be
complicated when tenants are competing for the same resources, and are
invisible to each other. Other factors include rapidly changing
production code and wildly unpredictable traffic surges. For
performance analysis in the Joyent public cloud, we use a variety of
tools including Dynamic Tracing, which allows us to create custom
tools and metrics and to explore new concepts. In this presentation
I'll discuss a collection of these tools and the metrics that they
measure. While these are DTrace-based, the focus of the talk is on
which metrics are proving useful for analyzing real cloud issues.
Talk for AWS re:Invent 2014. Video: https://www.youtube.com/watch?v=7Cyd22kOqWc . Netflix tunes Amazon EC2 instances for maximum performance. In this session, you learn how Netflix configures the fastest possible EC2 instances, while reducing latency outliers. This session explores the various Xen modes (e.g., HVM, PV, etc.) and how they are optimized for different workloads. Hear how Netflix chooses Linux kernel versions based on desired performance characteristics and receive a firsthand look at how they set kernel tunables, including hugepages. You also hear about Netflix’s use of SR-IOV to enable enhanced networking and their approach to observability, which can exonerate EC2 issues and direct attention back to application performance.
This document provides an overview of garbage collection in Java. It begins with an introduction to the presenter Leon Chen and his background. It then discusses Java memory management and garbage collection fundamentals, including the young and old generations, minor and major garbage collections, and how objects are promoted between generations. The document provides examples of garbage collection using diagrams and discusses tuning the Java heap size based on the live data size. It emphasizes the importance of garbage collection logging for performance analysis.
How Netflix Tunes EC2 Instances for Performance, by Brendan Gregg
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg. "
At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such as SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
Parallelized pipeline for whole genome shotgun metagenomics with GHOSTZ-GPU a..., by Masahito Ohue
Masahito Ohue, Marina Yamasawa, Kazuki Izawa, Yutaka Akiyama: Parallelized pipeline for whole genome shotgun metagenomics with GHOSTZ-GPU and MEGAN,
In Proceedings of the 19th annual IEEE International Conference on Bioinformatics and Bioengineering (IEEE BIBE 2019), 152-156, 2019. doi: 10.1109/BIBE.2019.00035
Jzab is a standalone Java library that implements Zab, the atomic broadcast protocol used by ZooKeeper, allowing other applications to easily use Zab. It includes features like authentication, dynamic reconfiguration, snapshots, and leader election. Benchmark tests showed that batching transactions led to higher throughput, and different garbage collectors and snapshot strategies affected performance. Jzab passed testing with the Jepsen framework and provides three states (Recovering, Leading, Following) from the user's perspective.
This report compares the block storage performance of AWS, Digital Ocean, OVH, DreamHost, and several StorPool-based cloud offerings. A variety of benchmarks were used, including PGBENCH, Sysbench, fio, and rsync. The results showed extreme differences in performance between the block storage offerings, with StorPool-based clouds significantly outperforming the other providers in most tests, especially for latency-sensitive workloads. Further tests showed that IOPS limits can control throughput but not latency, and that a lower-priced AWS volume with a 3,000 IOPS limit outperformed a Digital Ocean volume with a higher 7,500-10,000 IOPS limit.
This white paper compares the performance of Fibre Channel, Hardware iSCSI, Software iSCSI, and NFS storage protocols in VMware vSphere 4. Experiments show that all four protocols can achieve maximum throughput limited only by network bandwidth. However, Fibre Channel and Hardware iSCSI have substantially lower CPU costs than Software iSCSI and NFS. Tests with multiple VMs also demonstrate that vSphere 4 maintains high performance levels with greater efficiency than previous versions.
Built-in replication in PostgreSQL 9.0 allows a master database to stream transaction log changes asynchronously to one or more standby databases. This provides high availability and allows read-only queries on standbys. Replication is at the entire database level and supports all SQL supported in PostgreSQL. However, it does not provide query distribution or per-table granularity.
This document provides an overview of Logfs, a log-structured flash filesystem for Linux. Logfs treats storage as a circular log, writing sequentially. It uses a journal and garbage collection to reclaim space. Logfs divides storage into segments to reduce garbage collection overhead. It stores inodes in an inode file and all other data in an object store. Inodes and blocks are allocated into different segments based on their level.
The document describes a virtual device project that provides fast backup and replication of data using HDD RAID arrays. It logs I/O information synchronously to provide reliability even if the primary storage crashes. This asynchronous replication ensures the replica is always up to date without impacting the primary storage's performance. The system uses tag markers and a control file to backup data incrementally and support disaster recovery if needed.
Taming Go's Memory Usage — and Avoiding a Rust Rewrite, by ScyllaDB
Last summer, my team and I faced a question many young startups face. Should we rewrite our system in Rust?
At the time of the decision, we were primarily writing in Go. I was working on an agent that passively watches network traffic, parses API calls, and sends obfuscated summaries back to our service for analysis. As users were starting to run more traffic through us, memory usage by the agent grew to an unacceptably high level, impacting performance.
This led me to spend 25 days in despair and immerse myself in the details of Go’s memory management, our technology stack, and the profiling tools available – trying to get our memory footprint back under control. Go’s fully automatic memory management makes this no easy feat.
Spoiler: I emerged victorious and our team still uses Go. In this talk, I’ll talk about key steps and lessons learned from my project. I intend this talk to be helpful for people curious about reducing their memory footprint in Go, or anybody wondering about the tradeoffs of switching to or from Go.
The document provides an overview of the Oracle database architecture including its major components, memory structures, background processes, logical and physical storage structures, and Automatic Storage Management (ASM) storage components. Specifically, it discusses the System Global Area (SGA) and Program Global Area (PGA), background processes like the database writer (DBWn) and log writer (LGWR), tablespaces and data files, and how ASM manages Oracle database files. The objectives are to explain these various architectural elements that make up the Oracle database.
Optimizing your Infrastructure and Operating System for Hadoop, by DataWorks Summit
Apache Hadoop is clearly one of the fastest growing big data platforms to store and analyze arbitrarily structured data in search of business insights. However, applicable commodity infrastructures have advanced greatly in recent years, and there is not a lot of accurate, current information to assist the community in optimally designing and configuring Hadoop platforms (infrastructure and O/S). In this talk we'll present guidance on Linux and infrastructure deployment, configuration, and optimization from both Red Hat and HP (derived from actual performance data), for clusters optimized for single workloads or balanced clusters that host multiple concurrent workloads.
The document discusses various aspects of keeping MongoDB data safe, including replication, backups, and disaster recovery. It describes how MongoDB supports automatic replication across data centers for failover and horizontal scaling. It also discusses using MongoDB tools like mongodump and mongoexport for backups, as well as taking filesystem snapshots. For disaster recovery, it recommends repairing hardware, replaying oplogs if possible, or initializing from backups otherwise.
This document summarizes the results of profiling a Hadoop cluster to analyze infrastructure needs. Key findings include:
- The I/O and network subsystems were underutilized, with I/O at 10% capacity and network below 25% capacity.
- CPU utilization was high with low I/O wait, indicating optimal usage. Memory usage showed a high percentage of caching to reduce I/O waits.
- Initial HBase testing showed CPU utilization could not be fully driven by two workloads. Performance was good when data fit in cache but dropped when exceeding cache size.
Hadoop Summit 2012 | Optimizing MapReduce Job Performance, by Cloudera, Inc.
Optimizing MapReduce job performance is often seen as something of a black art. In order to maximize performance, developers need to understand the inner workings of the MapReduce execution framework and how they are affected by various configuration parameters and MR design patterns. The talk will illustrate the underlying mechanics of job and task execution, including the map side sort/spill, the shuffle, and the reduce side merge, and then explain how different job configuration parameters and job design strategies affect the performance of these operations. Though the talk will cover internals, it will also provide practical tips, guidelines, and rules of thumb for better job performance. The talk is primarily targeted towards developers directly using the MapReduce API, though will also include some tips for users of higher level frameworks.
Efficient logging in multithreaded C++ server, by Shuo Chen
This document discusses efficient logging in multithreaded C++ servers. It describes the muduo logging library which can log over 1 million messages per second with low latency. The key aspects are an efficient LogStream frontend, asynchronous backend using double buffering to pass log messages from threads to a log writer thread without blocking, and writing to local files for performance and reliability.
The document discusses memory management in Teradata systems. It explains that memory is partitioned between operating system (OS) managed memory and File Segment Group (FSG) cache. The FSG cache percentage can be adjusted to control how much memory is allocated to each. Other topics covered include monitoring memory usage, tuning the FSG cache threshold, understanding hash join memory usage, and sizing redistribution buffers. The presentation provides guidance on diagnosing and addressing memory depletion issues through tools like adjusting configuration parameters and disabling memory intensive features if needed.
Slides presented at Percona Live Europe Open Source Database Conference 2019, Amsterdam, 2019-10-01.
Imagine a world where all Wikipedia articles disappear due to a human error or software bug. Sounds unreal? According to some estimations, it would take in excess of hundreds of millions of person-hours to write them again. To prevent that scenario from ever happening, our SRE team at Wikimedia recently refactored the relational database recovery system.
In this session, we will discuss how we backup 550TB of MariaDB data without impacting the 15 billion page views per month we get. We will cover what were our initial plans to replace the old infrastructure, how we achieved recovering 2TB databases in less than 30 minutes while maintaining per-table granularity, as well as the different types of backups we implemented. Lastly, we will talk about lessons learned, what went well, how our original plans changed and future work.
Yfrog uses HBase as its scalable database backend to store and serve 250 million photos from over 60 million monthly users across 4 HBase clusters ranging from 50TB to 1PB in size. The authors provide best practices for configuring and monitoring HBase, including using smaller commodity servers, tuning JVM garbage collection, monitoring metrics like thread usage and disk I/O, and implementing caching and replication for high performance and reliability. Following these practices has allowed Yfrog's HBase deployment to run smoothly and efficiently.
The document describes COSBench, a benchmark tool for evaluating the performance of cloud object storage services. It provides an overview of COSBench's key components, including its configurable workload definition file, controller for managing tests, and drivers for generating load. The document also shares sample results from using COSBench to measure the throughput and response times of OpenStack Swift in different configurations. It found that the proxy node's CPU was the bottleneck for larger workloads on one setup. The goal is to open source COSBench to help storage providers optimize performance.
Technologies for working with disk storage and file systems in Windows Serve..., by Виталий Стародубцев
##What is Storage Replica
##Architecture and scenarios
##Synchronous and asynchronous replication
##Disk-to-disk, server-to-server, intra-cluster, and cluster-to-cluster replication
##Design and planning of Storage Replica
##What's new in Windows Server 2016 TP5
##Graphical management interface and other features: demo and development plans
##Integration of Storage Replica with Storage Spaces Direct
Open-E DSS V7 Synchronous Volume Replication over a LAN, by open-e
The document provides step-by-step instructions for setting up synchronous volume replication between two Open-E DSS servers over a local area network. It involves configuring hardware, networking, creating logical volumes on the source and destination nodes, setting up replication between the volumes, and creating a replication task to synchronize data from the source to destination volume. The status of replication can be monitored by checking the replication tasks in the DSS management interface.
This document provides a step-by-step guide for setting up active-passive iSCSI failover between two Open-E DSS V7 nodes (node-a and node-b). The steps include: 1) configuring the hardware and network settings for each node; 2) creating volume groups and iSCSI volumes for data replication on each node; 3) configuring volume replication between the nodes; 4) creating iSCSI targets on each node; 5) configuring failover settings; and 6) testing the failover functionality. Key aspects involve replicating iSCSI volumes from the active node-a to the passive node-b, and configuring virtual IP addresses and targets on each node for seamless failover.
The document provides step-by-step instructions for setting up an active-active load balanced iSCSI high availability cluster without bonding between two Open-E DSS V7 nodes (node-a and node-b). The key steps include:
1. Configuring the hardware for each node including network interfaces and IP addresses.
2. Configuring volumes, volume replication between each node's volumes to enable data synchronization, and starting the replication tasks.
3. Creating iSCSI targets on each node to expose the replicated volumes and enable failover.
This document provides steps to configure multipath I/O (MPIO) on an Open-E DSS V6 system with VMware ESXi 4.x and a Windows 2008 virtual machine. It requires two network cards in both systems connected to a switch. The steps include configuring the DSS V6 as an iSCSI target with two IP addresses, creating two vmkernel ports on the ESXi host connected to different network cards, adding the DSS as two iSCSI targets, enabling round robin path selection, and installing the Windows VM to test I/O performance using Iometer.
Step-by-Step Guide to NAS (NFS) Failover over a LAN (with unicast) Supported ..., by open-e
The document provides step-by-step instructions for configuring NAS (NFS) failover over a LAN using Open-E DSS. It describes setting up two servers with mirrored volumes, so that if the primary server fails, operations can fail over to the secondary server. The steps include 1) configuring the network interfaces and bonding on each server, 2) creating mirrored volumes and configuring replication on the primary and secondary servers, and 3) enabling NFS and sharing the volume to allow access from clients. This configuration provides data redundancy and high availability over a local network.
Open-E DSS V6 How to Setup iSCSI Failover with XenServer, by open-e
The document provides instructions for setting up DSS V6 iSCSI failover with XenServer using multipath, which includes configuring hardware settings and IP addresses on both nodes, creating volumes and targets on the primary and secondary nodes, setting up volume replication between the nodes, and configuring multipath on the XenServer storage client. Key steps are configuring the secondary node as the replication destination, then the primary node as the replication source, and setting up iSCSI failover and a virtual IP for the replicated volume.
Open-E DSS Synchronous Volume Replication over a WAN, by open-e
This document provides a step-by-step guide to setting up synchronous volume replication over a WAN between two systems using Open-E DSS. It requires configuring hardware including two servers connected over a WAN. It then outlines 6 steps to set up the replication including 1) hardware configuration, 2) configuring DSS servers on the WAN, 3) configuring the destination node, 4) configuring the source node, 5) creating the replication task, and 6) checking replication status. Diagrams and explanations of each step in the configuration process are provided.
The document provides instructions for backing up data from a DSS V6 server to an attached tape library. The 4-step process includes: 1) configuring hardware and logical volumes, 2) creating NAS shares and snapshots, 3) configuring backup tasks and schedules to alternate between tape pools on odd and even weeks, and 4) setting up a restore task to recover data from backup tapes. When completed, the backup and restore processes are automated to run on a weekly schedule and maintain multiple versions of backed up data on tapes.
The document provides instructions for setting up a backup from a DSS V6 data server to an attached tape drive. The key steps include: 1) Configuring hardware and volume groups, 2) Creating NAS volumes and snapshots, 3) Configuring the backup to use the tape drive by defining pools, tasks, and schedules, and 4) Performing backups that store data from network shares on labeled tapes according to the defined configuration.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed, by Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack, by shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect personal devices and information.
How to Get CNIC Information System with Paksim Ga, by danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
UiPath Test Automation using UiPath Test Suite series, part 6, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
CAKE: Sharing Slices of Confidential Data on Blockchain, by Claudio Di Ciccio
Presented at the CAiSE 2024 Forum, Intelligent Information Systems, June 6th, Limassol, Cyprus.
Synopsis: Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Paper: https://doi.org/10.1007/978-3-031-61000-4_16
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers, by akankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
HCL Notes and Domino License Cost Reduction in the World of DLAU, by panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
AI-Powered Food Delivery Transforming App Development in Saudi Arabia, by Techgropse Pvt. Ltd.
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
1. Description of Open-E Snapshots
A Step-by-Step Guide on How to Operate Snapshots with Open-E® DSS™
Software Version: DSS ver. 6.00 up40
Presentation updated: September 2010
2. Description of Open-E Snapshot
SNAPSHOT DEFINITION
Snapshots allow the administrator to create a new block device which presents an exact copy of a logical volume, frozen at some point in time. This provides access to the data as it existed on the volume at the snapshot start time.
The original copy of the data continues to be available to users without interruption, while the snapshot copy is used to perform other functions on the data, such as backup, data replication, or giving users access to point-in-time data. This is useful when some batch processing, e.g. backup or data replication, needs to be performed on the logical volume, but you don't want to halt a live system that is changing the data.
www.open-e.com 2
3. Description of Open-E Snapshot
BASIC EXPLANATION OF SNAPSHOT FUNCTIONING
The Open-E snapshot implementation supports several concurrent snapshots.
A volume with snapshots requires more space than one without. Because point-in-time data cannot be overwritten during an active snapshot, extra space is needed to preserve it. Data deleted from the "live" volume mount is reclaimed as free space, but in reality the deleted data is still available in the snapshot mount. The size of the space reserved for a snapshot depends on the amount of data changed while the snapshot is active: daily scheduled snapshots need less reserved space than weekly scheduled snapshots.
To calculate the size of the reserved space for a snapshot, estimate the amount of data expected to change while the snapshot is active; a good rule of thumb is to reserve 2 to 3 times that amount. For example, if a volume is 1000GB and we are reserving space for a daily snapshot with about 10GB of expected changes every day, the reserved space should be 20 to 30GB. If we decide on 30GB, that is 3% of the whole 1000GB volume.
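The sizing rule above can be sketched as a small calculation (a minimal illustration of the rule of thumb stated in this deck; the function name and the default multiplier are my own, not from Open-E documentation):

```python
def snapshot_reserve(volume_gb, expected_change_gb, factor=3):
    """Reserve 2-3x the data expected to change while the snapshot is active.

    Returns the reserved size in GB and its share of the volume in percent.
    """
    reserve_gb = expected_change_gb * factor
    percent_of_volume = 100.0 * reserve_gb / volume_gb
    return reserve_gb, percent_of_volume

# The example from the text: a 1000GB volume with ~10GB of daily changes.
reserve, pct = snapshot_reserve(volume_gb=1000, expected_change_gb=10)
print(reserve, pct)  # 30GB reserved, i.e. 3% of the volume
```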
4. Description of Open-E Snapshot
OPEN-E SNAPSHOT TECHNOLOGY
• Snapshot is based on the Logical Volume Manager (LVM).
• Snapshot implements "copy-on-write" on entire block devices by copying changed blocks to the reserved storage just before they are overwritten, thus preserving a self-consistent past image of the block device. The file systems on this image can later be mounted as if they were on read-only media.
• The total number of snapshots depends on the LVM, but 255 is a safe number. There are, however, definite limits on the number of ACTIVE snapshots running at the same time (on both 32-bit and 64-bit systems):
10 per LV,
20 per system.
5. Description of Open-E Snapshot
FILE LEVEL DESCRIPTION
File system on the volume before the snapshot is started. State at 7:59 A.M.
For a simple description of the snapshot function we consider files only and do not talk about volume blocks; we also describe only read-only snapshots.
[Diagram: Live Data (100Gb), 7:59 A.M. A 120Gb Volume Group on the storage contains a Logical Volume (100Gb) holding file1, file2 and file3 (mounted RD/WR), space reserved for changes on the Logical Volume for snapshot functioning (10Gb), and free space (10Gb).]
6. Description of Open-E Snapshot
FILE LEVEL DESCRIPTION
8:00 A.M. - Snapshot starts. The snapshot (100Gb) is frozen point-in-time at the snapshot start time, 8:00 A.M.
After the snapshot starts, the file system will read data as usual, but writes will first copy the original data into the snapshot reserved space (copy-on-write) and then write the new data. As a result, every write operation during the active snapshot consumes the reserved space of the logical volume.
[Diagram: Live Data (100Gb), 8:00 A.M. The 120Gb Volume Group contains the Logical Volume (100Gb) with file1, file2 and file3, the space reserved for changes on the Logical Volume for snapshot functioning (10Gb), and free space (10Gb).]
7. Description of Open-E Snapshot
FILE LEVEL DESCRIPTION
8:12 A.M. - Snapshot active. The snapshot (100Gb) remains frozen at the
snapshot start time, 8:00 A.M.
After the snapshot starts, the file system writes a modification of file3 to
the reserved space, while the snapshot mount still shows the old file3 instance
(its state at 8:00 A.M.).
[Diagram: Live Data (100Gb) at 8:12 A.M. in the Volume Group (120Gb) — the
snapshot view of file1, file2, file3 is Read-Only, Write-Protected (RD); the
live Logical Volume (100Gb) is RD/WR; the modified file3 sits in the space
reserved for changes on the Logical Volume for snapshot functioning (10Gb);
free space (10Gb).]
8. Description of Open-E Snapshot
FILE LEVEL DESCRIPTION
8:19 A.M. - Snapshot active. The snapshot (100Gb) remains frozen at the
snapshot start time, 8:00 A.M.
While the snapshot is active, the file system also forces the use of the
reserved space for the newly created file4.
[Diagram: Live Data (100Gb) at 8:19 A.M. — file1, file2, file3 on the Logical
Volume (100Gb); the modified file3 and the new file4 occupy the space reserved
for changes on the Logical Volume for snapshot functioning (10Gb); free space
(10Gb).]
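The file-level behaviour described on the last three pages can be sketched as a small simulation (a minimal illustration of the description in this document, not Open-E code; the names `volume`, `reserved`, `read_live` and `read_snapshot` are hypothetical):

```python
# Minimal file-level sketch of the snapshot behaviour described above.
# Real snapshots work on volume blocks, not whole files.
volume = {"file1": "v1", "file2": "v2", "file3": "v3"}  # state at 8:00 A.M.
snapshot = dict(volume)   # frozen point-in-time view (read-only)
reserved = {}             # space reserved for changes (10Gb in the example)

def write(name, data):
    """While the snapshot is active, writes land in the reserved space."""
    reserved[name] = data

def read_live(name):
    """The live mount sees reserved-space versions first, then the volume."""
    return reserved.get(name, volume.get(name))

def read_snapshot(name):
    """The snapshot mount always sees the 8:00 A.M. state."""
    return snapshot.get(name)

write("file3", "v3-modified")   # 8:12 A.M. - modification of file3
write("file4", "v4-new")        # 8:19 A.M. - new file4

print(read_live("file3"))       # v3-modified (live data)
print(read_snapshot("file3"))   # v3 (frozen state at 8:00 A.M.)
print(read_snapshot("file4"))   # None - file4 did not exist at 8:00 A.M.
```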
9. Description of Open-E Snapshot
BLOCK LEVEL DESCRIPTION
For an easier explanation of how the snapshot functions, we now move from whole
files down to the volume blocks a file occupies.
For example, a file uses 2 blocks and is being modified. If the changed part
resides on the first block only, then the write operation relates to the first
block only.
The affected block is copied into the reserved space while the snapshot is
active. On write, the (COW) logical volume loses its binding to the old block
and binds instead to the newly written block in the reserved space. The
copy-on-write operation is shown on the next page.
10. Description of Open-E Snapshot
BLOCK LEVEL DESCRIPTION
8:12 A.M. - Snapshot active - modification of file3. The snapshot (100Gb)
remains frozen at the snapshot start time, 8:00 A.M.
For example, the file system is going to modify file3, which resides on 2
blocks (block 1 and block 2), while the snapshot is active. The modification
affects only a small part of the second block. Block 2 is copied into the
reserved space as block 2', and then the file system posts the modification to
block 2'. Now file3 is located partially on the Logical Volume area (block 1)
and partially on block 2', which resides in the space reserved for changes.
On this write, the Logical Volume loses its connection to the old block 2 and
binds the new block 2' instead; however, the unchanged copy of the data in
block 2 (its state before the modification) is still available for the snapshot
mount.
[Diagram: Live Data (100Gb) at 8:12 A.M. in the Volume Group (120Gb) — file1,
file2 and block 0, block 1, block 2 on the Logical Volume (100Gb); block 2'
with the modified part of file3 in the space reserved for changes on the
Logical Volume for snapshot functioning (10Gb); free space (10Gb).]
11. Description of Open-E Snapshot
BLOCK LEVEL DESCRIPTION
File system on the volume after deactivation of the snapshot - state after
8:12 A.M.
After the snapshot is stopped, the file system stays bound to block 2', and
block 2 is unbound and declared free space.
The Logical Volume is fragmented now, but its size stays unchanged at 100GB.
The reserved space is fragmented as well: file3 no longer resides on
neighboring blocks but is split between the Logical Volume area and the
reserved space.
[Diagram: Live Data (100Gb) at 8:12 A.M. — file1, file2 and block 0, block 1 on
the Logical Volume (100Gb), mounted RD/WR; block 2' of file3 in the space
reserved for changes on the Logical Volume for snapshot functioning (10Gb);
free space (10Gb).]
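The block-level sequence on the last two pages can be sketched as a small simulation (an illustration of the description in this document, not Open-E code; the binding tables and the `cow_write`/`deactivate_snapshot` helpers are hypothetical):

```python
# Block-level sketch of the copy-on-write sequence described above.
# file3 occupies block 1 and block 2; only part of block 2 is modified.
blocks = {"block1": "old-1", "block2": "old-2"}       # physical storage
live_bind = {"block1": "block1", "block2": "block2"}  # Logical Volume bindings
snap_bind = dict(live_bind)                           # frozen 8:00 A.M. bindings

def cow_write(logical, new_data):
    """Copy the affected block into the reserved space (block 2') and apply
    the modification there; the live volume rebinds to the new block."""
    blocks[logical + "'"] = new_data    # block 2' lives in the reserved space
    live_bind[logical] = logical + "'"  # Logical Volume now binds block 2'
    # snap_bind still points at the unchanged block for the snapshot mount

cow_write("block2", "new-2")         # 8:12 A.M. modification
print(blocks[live_bind["block2"]])   # new-2  (live data, via block 2')
print(blocks[snap_bind["block2"]])   # old-2  (snapshot view, old block 2)

def deactivate_snapshot():
    """After the snapshot stops, old block 2 is unbound and freed."""
    for logical, phys in list(snap_bind.items()):
        if live_bind[logical] != phys:
            del blocks[phys]          # block 2 declared free space
    snap_bind.clear()

deactivate_snapshot()
print(sorted(blocks))                # ['block1', "block2'"]
```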
12. Description of Open-E Snapshot
BLOCK LEVEL DESCRIPTION
The user decides to remove (delete) the reserved space for snapshot
functioning. State after 8:12 A.M.
After the deletion of the reserved space for the snapshot function, this space
is declared back as free space. The free space is 20Gb now.
[Diagram: Live Data (100Gb) at 8:12 A.M. — file1, file2, file3 on the Logical
Volume (100Gb); the former reserved space and the remaining space together form
20Gb of free space on the storage.]
13. Description of Open-E Snapshot
ADVANTAGES
• If no modifications are made to the original data, no data copy is created.
• Snapshot is used for Backups and Data Replication: the snapshot provides
exclusive virtual access to the volume. This exclusive access for Backup or
Data Replication can work at any time, also during production hours.
• Starting or stopping a snapshot is very fast; it takes only a few seconds
even for large amounts of data.
14. Description of Open-E Snapshot
DISADVANTAGES
• Overflow of the space reserved for the snapshot causes the snapshot volume to become
inaccessible, losing access to the point-in-time data.
• Writing speed decreases with a growing number of active snapshots (because of copy-on-write).
• Copying data from the snapshot mount back to the live mount may itself overflow the space
reserved for the snapshot, resulting in an unfinished copy.
• In the case of iSCSI or FC snapshots, merely reading data can modify the last access time of a
file. This causes the whole block chunk to be moved into the space reserved for changes. The
disadvantage appears on file systems with the last access time attribute enabled, as a read
operation then changes that attribute. It does not occur for NAS volumes, as Open-E systems
do not store last access time.
• An iSCSI or FC target can be formatted with any file system, and most file systems support
the creation, modify and last access time attributes. If last access time is in use, any read
access changes this attribute and therefore writes to the volume. Snapshot works on the
volume level and uses a block size of 32 MB, so a single file read results in one block change
and consumes 32MB of the space reserved for snapshot functionality.
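The 32MB-per-read cost described in the last point can be illustrated with a small calculation (a sketch under the worst-case assumption that each file read dirties a distinct 32MB chunk; the 10GB reserve matches the earlier sizing example):

```python
# How quickly read-triggered last-access-time updates can consume the
# reserved space, assuming each file read dirties one distinct 32 MB chunk.
BLOCK_MB = 32                # snapshot copy-on-write chunk size (32 MB)
reserved_mb = 10 * 1024      # 10 GB reserved, as in the earlier example

files_read = 200             # files merely read while the snapshot is active
consumed_mb = files_read * BLOCK_MB
print(consumed_mb)                    # 6400 (MB)
print(consumed_mb / reserved_mb)      # 0.625 -> 62.5% of the reserve used
```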
15. Description of Open-E Snapshot
SUMMARY
The snapshot function within Open-E software is used for:
• Backups,
• Data replication,
• Access to accidentally deleted or modified files.