This document discusses Google's systems for handling large datasets, including their hardware infrastructure, distributed systems like GFS and BigTable, and future directions. It notes that Google uses many low-cost machines running Linux and in-house software to provide redundancy and scalability. The distributed file system GFS and the BigTable database are used to store and access petabytes of data across thousands of machines.
Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Sche... (peknap)
In dealing with a real-world problem of scheduling three classes of tasks – network packet forwarding, voice over IP, and application-level services – on a home gateway device, the author found that the mechanisms that come with the vanilla Linux kernel are not enough. This talk covers the unique real-time requirements of each task class, why moving softirq handling to process context with the RT_PREEMPT patch is an important step toward solving the problem, and how a deadline-based process scheduler would be a better solution than the regular real-time scheduling classes.
Linux PREEMPT_RT improves the preemptibility of the Linux kernel by allowing preemption everywhere except where preemption or interrupts are explicitly disabled. This reduces latency caused by preemption delays, critical sections, and interrupts. However, non-deterministic external interrupt events and timing, as well as interrupt collisions, can still cause unpredictable latency. Tracing tools can help analyze latency, but practical issues remain in fully guaranteeing hard real-time behavior.
Windows Server Virtualization - Hyper-V 2008 R2 (aralves)
The document discusses new features in Windows Server 2008 R2 including Hyper-V 2, Cluster Shared Volumes, Live Migration, and hosted desktops. It provides details on the architecture and capabilities of Hyper-V, including support for 32-bit and 64-bit VMs, large memory support, SMP, and integrated cluster support. It describes features like Live Migration, storage improvements with Cluster Shared Volumes, and new processor support for technologies like Second Level Address Translation.
Kernel Multiplexer (KMux) is a system call interposition framework that intercepts the communication between user and kernel space in order to extend, enhance, or replace kernel extensions. It has very low overhead and can be configured to achieve fine-grained control over individual processes in a system.
The document discusses the Linux kernel memory model (LKMM). It explains that the LKMM defines ordering rules for the Linux kernel, made necessary by weaknesses in the C language standard and by the need to support multiple hardware architectures. It describes the ordering primitives the LKMM provides, such as atomic operations and memory barriers, and how the LKMM was formalized into an executable model that can prove properties of parallel code.
GCMA: Guaranteed Contiguous Memory Allocator (SeongJae Park)
This document summarizes a presentation about GCMA, a Guaranteed Contiguous Memory Allocator for Linux kernels. GCMA aims to provide fast and guaranteed contiguous memory allocation while maintaining good memory utilization. It designates "discardable" pages used by frontswap and clean cache as secondary clients that can be quickly vacated from the reserved area. Evaluations show GCMA provides significantly faster allocation latency and system performance compared to CMA, especially under memory pressure or fragmentation. The presenter is working to upstream GCMA code and provide further evaluation results.
Dev Conf 2017 - Meeting nfv networking requirements (Flavio Leitner)
NFV networking is about delivering packets to virtual machines or containers. Can we provide high throughput, low latency and zero packet loss? This presentation will show the pros and cons of some technologies and then go deeper into DPDK accelerated OVS to uncover how it works, current challenges, and possible solutions.
An Introduction to the Formalised Memory Model for Linux Kernel (SeongJae Park)
The Linux kernel provides an executable, formalised memory model. These slides describe the nature of parallel programming in the Linux kernel, what a memory model is, and why it is necessary and important for kernel programmers. The slides were used at KOSSCON 2018 (https://kosscon.kr/).
"Controlling a laser with Linux is crazy, but everyone in this room is crazy in his own way. So if you want to use Linux to control an industrial welding laser, I have no problem with your using PREEMPT_RT." -- Linus Torvalds
During the last few months of 2011 the Xen community started an effort to port Xen to ARMv7 with virtualization extensions, using the Cortex-A15 processor as the reference platform. The new Xen port exploits this set of hardware capabilities to run guest VMs as efficiently as possible while keeping the ARM-specific changes to the hypervisor and the Linux kernel to a minimum.
Developing the new port, we took the chance to remove legacy concepts like PV or HVM guests and to support only a single kind of guest, comparable to "PV on HVM" in the Xen x86 world.
This talk will explain the reasons behind this and other design choices made during the early development process, and it will go through the main technical challenges we had to solve to accomplish our goal. Notable examples are the way Linux guests issue hypercalls and receive event channel notifications from Xen.
A Xen MIPS port implementation will be presented, covering the major techniques used for CPU and memory para-virtualization on top of Xen and Linux, the major changes in the Xen hypervisor, Xen tools, and Linux, and the challenges, main issues we faced, and solutions we applied. Overall porting status and next steps will also be discussed.
Network functions virtualization (NFV) has the potential to transform the way operators offer services. While it brings with it flexibility to enable operators to offer customizable services that can deliver great value to the end user - or as a leading carrier describes it, a "user-defined network" - it can also complicate network operations.
Some of the concerns over synchronization and NFV are already being addressed in the data center world. Take, for example, large financial trading houses, where synchronization is tightly coupled into the software architecture to provide microsecond-level time-stamping of trades. This presentation examines the new options for synchronization as it relates to NFV, and what it will take to enable accurate synchronization over a virtual network.
These slides introduce the definition of real time and of RTOSes, then present the RT Linux approaches and compare them with each other, and finally show a latency measurement test performed by Linutronix.
This document discusses the challenges of real-time computing on Linux and potential solutions. Real-time means a very low maximum latency, below 100 microseconds. While Linux was not designed for real-time use, it is now used in many embedded systems. Options for addressing real-time requirements include using separate hardware, a hypervisor with a real-time operating system (RTOS), asymmetric multiprocessing (AMP) with an RTOS, or solutions within Linux such as PREEMPT_RT, which adds preemption and CPU isolation techniques to reduce worst-case latency without changing applications. The document reviews these approaches and notes that real-time remains an important area as Linux is increasingly used in embedded systems.
Scaling web applications with cassandra presentation (Murat Çakal)
This document provides an introduction and overview of Cassandra, including:
- Cassandra is a distributed database modeled after Amazon Dynamo and Google Bigtable that is highly scalable and fault tolerant.
- It is used by many large companies for applications that require fast writes, high availability, and elastic scalability.
- Cassandra's data model uses a column-oriented design organized into keyspaces, column families, rows, and columns. It also supports super columns.
- The document discusses Cassandra's features like tunable consistency levels, replication, and its data distribution using consistent hashing.
- An overview of Cassandra's Thrift API and basic operations like get, batch mutate, and
Automated and Adaptive Infrastructure Monitoring using Chef, Nagios and Graphite (Pratima Singh)
This document describes configuration files for the Nagios Remote Plugin Executor (NRPE) monitoring system. It lists common configuration files like nrpe.cfg that define the core NRPE settings and files in the /etc/nrpe.d directory that configure checks, hosts, and services to monitor. Template files generate the actual host and service configuration files.
Kernel synchronization methods are required to prevent race conditions when shared resources are accessed concurrently by multiple threads. The Linux kernel supports various synchronization primitives including atomic operations, spin locks, semaphores, mutexes, completion variables, and the big kernel lock. Each method has advantages and limitations depending on whether the resource can be accessed in interrupt context or while sleeping. More complex schemes like reader-writer locks and sequential locks also provide alternatives to binary locks.
An Overview of [Linux] Kernel Lock Improvements -- Linuxcon NA 2014 (Davidlohr Bueso)
This document summarizes the overview of kernel lock improvements presented by Davidlohr Bueso and Scott Norton at LinuxCon North America in August 2014. It discusses the issues of cache line contention in large NUMA systems and how lock contention can cause performance degradation. It presents results of microbenchmarks demonstrating how performance is significantly impacted as more cores and sockets are involved in cache line contention. The document emphasizes that minimizing cache line contention within kernel locking primitives is important for applications to scale well on systems with many sockets and cores.
Cassandra is a distributed NoSQL database that was created by Facebook in 2007 and open-sourced in 2008. It became an Apache project in 2009 and graduated as a top-level project in 2010. Cassandra uses a distributed hash table architecture and supports eventual consistency. It provides high availability with no single points of failure and linear scalability. Data is replicated across multiple nodes for fault tolerance.
Agenda:
In this talk we will present the various locking mechanisms implemented in the Linux kernel, from System V locks to raw spinlocks and the RT patch.
Speaker:
Mark Veltzer - CTO of Hinbit and a senior instructor at John Bryce. Mark is also a member of the Free Software Foundation and contributes to many free projects.
https://github.com/veltzer
Pregel: A System for Large-Scale Graph Processing (Chris Bunch)
These are the slides for a presentation I recently gave at a seminar on Tools for High-Performance Computing with Big Graphs. It covers Google's Pregel system, in use for processing graph algorithms in a scalable manner.
This document summarizes optimizations for MySQL performance on Linux hardware. It covers SSD and memory performance impacts, file I/O, networking, and useful tools. The history of MySQL performance improvements is discussed from hardware upgrades like SSDs and more CPU cores to software optimizations like improved algorithms and concurrency. Optimizing per-server performance to reduce total servers needed is emphasized.
Writing and testing high frequency trading engines in java (Peter Lawrey)
JavaOne presentation of Writing and Testing High Frequency Trading Engines in Java. Talk looks at low latency trading, thread affinity, lock free code, ultra low garbage and low latency persistence and IPC.
The heavyweight "process model", historically used by Unix systems, including Linux, to split a large system into smaller, more tractable pieces, doesn't always lend itself to embedded environments owing to its substantial computational overhead. POSIX threads, also known as Pthreads, is a multithreading API that looks more like what embedded programmers are used to but runs in a Unix/Linux environment. This presentation introduces POSIX threads and shows you how to use threads to create more efficient, more responsive programs.
Understanding Data Consistency in Apache Cassandra (DataStax)
This document provides an overview of data consistency in Apache Cassandra. It discusses how Cassandra writes data to commit logs and memtables before flushing to SSTables. It also reviews the CAP theorem and how Cassandra offers tunable consistency levels for both reads and writes. Strategies for choosing consistency levels for writes, such as ANY, ONE, QUORUM, and ALL are presented. The document also covers read repair and hinted handoffs in Cassandra. Examples of CQL queries with different consistency levels are given and information on where to download Cassandra is provided at the end.
The document discusses improvements to the implementation of futexes (fast userspace mutexes) in the Linux kernel to improve scaling on multicore systems. Some key issues with the original futex implementation are a global hash table that does not scale well with NUMA, hash collisions, and contention on hash bucket locks. Improvements discussed include using per-process or per-thread hash tables to address NUMA issues, improving hashing to reduce collisions, releasing hash bucket locks before waking tasks to allow concurrent wakeups, and replacing spinlocks with queued/MCS locks to reduce cacheline bouncing under contention. These changes aim to improve futex performance and scalability as the number of cores in systems increases.
This document discusses memory barriers in the Linux kernel. It covers the need for memory barriers to prevent instruction reordering issues, the different types of barriers used in the kernel including implicit barriers in functions, atomic operations, and acquire/release semantics. It also provides examples of how barriers are used for common tasks like sleeping and waking between CPUs.
Dead Lock Analysis of spin_lock() in Linux Kernel (english) (Sneeker Yeh)
The document discusses spin locks and semaphores in the Linux kernel. It begins with an introduction to the difference between spin locks and semaphores. Spin locks cause threads to continuously loop trying to acquire the lock, while semaphores cause threads to sleep. An example is given of a deadlock scenario that can occur with spin locks. The document then discusses the concept of context in the kernel, including user context, interrupt context, and the control flow during procedure calls and interrupts. Log analysis and examples of double-acquire deadlocks involving spin locks are provided. The document concludes with recommendations for how to prevent deadlocks, such as using spin_lock_irqsave/restore and avoiding semaphores in interrupt context.
This document describes the setup and architecture of a Red Hat Storage Cluster using Global File System (GFS), Clustered Logical Volume Manager (CLVM), and Global Network Block Device (GNBD). GFS allows nodes to share block-level storage over the network as if it were locally attached. GNBD exports block devices over TCP/IP to GFS nodes. CLVM provides cluster-wide logical volume management on top of shared block devices. The cluster uses components like CMAN, DLM, and fencing for distributed coordination and locking across nodes.
The document describes Google File System (GFS), which is a distributed file system developed by Google to store huge files across inexpensive commodity servers. GFS is fault-tolerant, scalable, and optimized for large streaming reads and appends. It works by dividing files into fixed-size chunks that are replicated and stored across multiple chunkservers, while a master server manages metadata and chunk placement.
This document discusses infrastructure for cloud computing and Google's tools. It describes Google's MapReduce and BigTable frameworks, which were developed for large-scale data processing and storage. It also outlines Google's Academic Cloud Computing Initiative (ACCI) partnership with universities to provide cloud computing education and skills. ACCI has helped create cloud computing courses at schools like Tsinghua University in China.
%w(map reduce).first - A Tale About Rabbits, Latency, and Slim Crontabs (Paolo Negri)
Slide of the RailsConf 2009 session
Discover how it is possible to use parallel execution to batch-process large amounts of data, learn how to use queues to distribute workload and coordinate processes, and increase throughput on systems with high latency. Have fun with EventMachine, AMQP, and RabbitMQ, and get rid of that every-5-minutes cronjob.
Nuxeo World Session: Scaling Nuxeo Applications (Nuxeo)
This document discusses scaling Nuxeo applications by ensuring good performance. It emphasizes defining usage hypotheses and constraints, conducting performance testing, and ongoing tuning. Key points include spreading services across VMs, using a VCS cluster for redundancy and scaling out, and partitioning data across multiple repositories. Benchmark results show document retrieval and insertion operations can achieve throughput of hundreds per second depending on the scenario. Regular performance monitoring and tuning of software configurations and databases is important for scaling Nuxeo applications.
LXDE is a lightweight desktop environment for Linux that is less resource intensive than other desktop environments like GNOME and Xfce. It was started in 2005 and is developed by an international community. LXDE uses GTK+ and follows standards from freedesktop.org to ensure compatibility. It provides core components like a file manager, panel, task manager, and other applications while keeping low hardware requirements. The LXDE community is growing worldwide with contributors from Asia, Europe and other regions.
This is a presentation I presented at NVIDIA AI Conference in Korea. It's about building the largest GPU - DGX-2, the most powerful supercomputer in one node.
Network functions virtualization (NFV) has the potential to transform the way operators offer services. While it brings with it flexibility to enable operators to offer customizable services that can deliver great value to the end user - or as a leading carrier describes it, a "user-defined network" - it can also complicate network operations.
Some of the concerns over sync and NFV are already being addressed in the data center world. Take, for example, in
large financial trading houses where synchronization is
tightly coupled into the software architecture to provide microsecond-level time-stamping to trades. This presentation
examines the new options for synchronization as it relates to NFV - and what it will take to enable accurate synchronization over a virtual network.
Ryu is an open-source network operating system that provides a programmatic interface for network control and acts as a logically centralized controller for thousands of switches. It is fully written in Python and its goals include becoming the de facto open-source network OS and the standard network controller for OpenStack. Ryu brings flat L2 networking to OpenStack regardless of the underlying physical network and provides scalable multi-tenant isolation through tunneling.
Running Applications on the NetBSD Rump Kernel by Justin Cormack eurobsdcon
Abstract
The NetBSD rump kernel has been developed for some years now, allowing NetBSD kernel drivers to be used unmodified in many environments, for example as userspace code. However it is only since last year that it has become possible to easily run unmodified applications on the rump kernel, initially with the rump kernel on Xen port, and then with the rumprun tools to run them in userspace on Linux, FreeBSD and NetBSD. This talk will look at how this is achieved, and look at use cases, including kernel driver development, and lightweight process virtualization.
Speaker bio
Justin Cormack has been a Unix user, developer and sysadmin since the early 1990s. He is based in London and works on open source cloud applications, Lua, and the NetBSD rump kernel project. He has been a NetBSD developer since early 2014.
This document discusses real-time Linux programming. It defines real-time as systems that must guarantee response times within strict deadlines, from milliseconds to microseconds. Real-time hardware uses a hardware clock to guarantee timing. Real-time software can be written in any language but C and C++ are preferred. Linux supports real-time capabilities through patches that improve scheduling and reduce latency. The document discusses avoiding page faults, limiting interrupts, and measuring latency in real-time systems.
MapReduce: Simplified Data Processing on Large ClustersAshraf Uddin
This document summarizes the MapReduce programming model and its implementation for processing large datasets in parallel across clusters of computers. The key points are:
1) MapReduce expresses computations as two functions - Map and Reduce. Map processes input key-value pairs and generates intermediate output. Reduce combines these intermediate values to form the final output.
2) The implementation automatically parallelizes programs by partitioning work across nodes, scheduling tasks, and handling failures transparently. It optimizes data locality by scheduling tasks on machines containing input data.
3) The implementation provides fault tolerance by reexecuting failed tasks, guaranteeing the same output as non-faulty execution. Status information and counters help monitor progress and collect metrics.
This document presents a software-based technique for partitioning shared last-level caches (L2 caches) on multicore systems to improve performance. It implements page coloring to allocate physical pages for each process to distinct cache line colors. Experimental results on a Power5 system show this approach can control cache usage and improve performance for multiprogrammed workloads by up to 17% compared to an uncontrolled shared cache. The document also finds that cache stall rates provide a better performance analysis metric than miss rates for some workloads.
[Podman Special Event] Kubernetes in Rootless PodmanAkihiro Suda
- Kubernetes can run in rootless containers using techniques like Podman, Docker, and containerd which map the root user inside containers to a non-root user on the host for improved security.
- Popular ways to run rootless Kubernetes include kind, minikube wrapped in Podman containers, and Usernetes which supports real multi-node clusters across multiple hosts using networking like Flannel.
- Future work includes promoting the "KubeletInUserNamespace" feature flag and eliminating overhead of user-mode TCP/IP for containers to improve the rootless Kubernetes experience.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/luxoft/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Alexey Rybakov, Senior Director at LUXOFT, presents the "Making Computer Vision Software Run Fast on Your Embedded Platform" tutorial at the May 2016 Embedded Vision Summit.
Many computer vision algorithms perform well on desktop class systems, but struggle on resource constrained embedded platforms. This how-to talk provides a comprehensive overview of various optimization methods that make vision software run fast on low power, small footprint hardware that is widely used in automotive, surveillance, and mobile devices. The presentation explores practical aspects of deep algorithm and software optimization such as thinning of input data, using dynamic regions of interest, mastering data pipelines and memory access, overcoming compiler inefficiencies, and more.
GTS, Global Trigger and Synchronization systemJoelChavas
GTS synchronizes electronic cards at the sub-nanossecond level. GTS also triggers the data acquisition of the experiment.
Structurally, it is a tree of cards connected by gigabit optical fibers. All the trigger decisions are made in real-time in a central trigger processor.
The system is currently in use in the nuclear physics experiment AGATA. Moreover, following this presentation, it has been chosen (with some hardware integration planned) to trigger and synchronize the experiments of SPIRAL2 in GANIL.
Automation@Brainly - Polish Linux Autumn 2014vespian_256
This document describes how Brainly, a social network for homework help, automated their infrastructure using Ansible. It discusses how they migrated from custom scripts and packaging to using Ansible for configuration management. Key areas automated include Apache, DNS, user management, backups, monitoring with Icinga, clustering with Corosync/Pacemaker, firewalls, and scaling their infrastructure across multiple markets. While Ansible worked well overall, some challenges included complex templates, limitations of Jinja2, and lack of Python integration in roles.
This document discusses making the Android operating system real-time by applying the PREEMPT_RT patch to the Linux kernel. It describes real-time systems, proposed architectures for real-time Android, and challenges in applying the RT patch to the DragonBoard's kernel due to codebase and driver differences. It also covers preemption models from no forced preemption to fully preemptible real-time kernels and mentions performance testing of these models.
Salt is an open source configuration management and remote execution system. It allows users to remotely execute commands and manage configurations on multiple systems. Key features include a master-minion architecture with remote execution capabilities, a flexible and extensible design, and support for configuration management through states. States allow users to declaratively define the configuration of systems and ensure consistency across environments.
This document summarizes Farhan Mashraqi's presentation about scaling the MySQL database that powers the photo blogging website Fotolog. It describes how Fotolog has grown to host over 228 million photos and 2.47 billion comments. The MySQL infrastructure consists of 32 servers split across four clusters to handle the large volume of reads and writes. Key aspects discussed include table partitioning, improving performance through index changes and switching to InnoDB, and strategies for ongoing scalability.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
6 Dean Google
1. Handling Large Datasets at Google: Current Systems and Future Directions
Jeff Dean
Google Fellow
http://labs.google.com/people/jeff
2. Outline
• Hardware infrastructure
• Distributed systems infrastructure:
– Scheduling system
– GFS
– BigTable
– MapReduce
• Challenges and Future Directions
– Infrastructure that spans all datacenters
– More automation
3. Sample Problem Domains
• Offline batch jobs
– Large datasets (PBs), bulk reads/writes (MB chunks)
– Short outages acceptable
– Web indexing, log processing, satellite imagery, etc.
• Online applications
– Smaller datasets (TBs), small reads/writes (KBs)
– Outages immediately visible to users, low latency vital
– Web search, Orkut, GMail, Google Docs, etc.
• Many areas: IR, machine learning, image/video processing, NLP, machine translation, ...
4. Typical New Engineer
• Never seen a petabyte of data
• Never used a thousand machines
• Never really experienced machine failure
Our software has to make them successful.
5. Google’s Hardware Philosophy
Truckloads of low-cost machines
• Workloads are large and easily parallelized
• Care about perf/$, not absolute machine perf
• Even reliable hardware fails at our scale
• Many datacenters, all around the world
– Intra-DC bandwidth >> Inter-DC bandwidth
– Speed of light has remained fixed in last 10 yrs :)
6. Effects of Hardware Philosophy
• Software must tolerate failure
• Application’s particular machine should not matter
• No special machines - just 2 or 3 flavors
[Photo: Google - 1999]
7. Current Design
• In-house rack design
• PC-class motherboards
• Low-end storage and networking hardware
• Linux
• + in-house software
8. The Joys of Real Hardware
Typical first year for a new cluster:
~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packet loss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external vips for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for dns
~1000 individual machine failures
~thousands of hard drive failures
slow disks, bad memory, misconfigured machines, flaky machines, etc.
9.–14. Typical Cluster
[Diagram, built up across six slides: Machines 1…N each run Linux with a scheduler slave and a GFS chunkserver. Above them sit cluster-wide services: the cluster scheduling master, the Chubby lock service, and the GFS master. User applications (app1, app2) are scheduled onto the machines alongside BigTable tablet servers, and finally a BigTable master is added.]
15. File Storage: GFS
[Diagram: a GFS master and Chunkservers 1…N; chunks C0–C5 replicated across the chunkservers; clients talk to both.]
• Master: Manages file metadata
• Chunkserver: Manages 64MB file chunks
• Clients talk to master to open and find files
• Clients talk directly to chunkservers for data
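The read path above can be sketched in a few lines of Python. This is a toy illustration, not the real GFS protocol: the class and method names are hypothetical, and real GFS adds RPCs, leases, replica selection, and metadata caching.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS chunks are 64 MB

class Master:
    """Holds file metadata: filename -> list of (chunk handle, replica locations)."""
    def __init__(self):
        self.files = {}  # filename -> [(handle, [chunkserver names]), ...]

    def lookup(self, filename, chunk_index):
        return self.files[filename][chunk_index]

class Chunkserver:
    """Stores chunk data keyed by handle."""
    def __init__(self):
        self.chunks = {}  # handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

def gfs_read(master, chunkservers, filename, offset, length):
    # 1. Client asks the master which chunk covers the offset, and where it lives.
    chunk_index = offset // CHUNK_SIZE
    handle, replicas = master.lookup(filename, chunk_index)
    # 2. Client reads directly from a chunkserver replica;
    #    the master is never on the data path.
    server = chunkservers[replicas[0]]
    return server.read(handle, offset % CHUNK_SIZE, length)

# Example: one file with a single chunk on a single chunkserver.
master = Master()
cs = Chunkserver()
cs.chunks["h1"] = b"hello, gfs"
master.files["/logs/day1"] = [("h1", ["cs-a"])]
print(gfs_read(master, {"cs-a": cs}, "/logs/day1", 0, 5))  # b'hello'
```

Keeping the master off the data path is what lets one master serve thousands of clients: it hands out small metadata answers while the bulk bytes flow chunkserver-to-client.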
16. GFS Usage
• 200+ GFS clusters
• Managed by an internal service team
• Largest clusters
– 5000+ machines
– 5+ PB of disk usage
– 10000+ clients
17. Data Storage: BigTable
What is it, really?
• 10-ft view: Row & column abstraction for storing data
• Reality: Distributed, persistent, multi-level sorted map
18.–24. BigTable Data Model
• Multi-dimensional sparse sorted map
  (row, column, timestamp) => value
[Figure, built up across seven slides: row “www.cnn.com”, column “contents:”, timestamps t3, t11, t17, value “<html>…”.]
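The "sorted map" view can be made concrete with a toy in-memory version keyed by (row, column, timestamp). This is only a sketch of the data model: real BigTable persists this map as immutable SSTables in GFS, and the names here are illustrative.

```python
from bisect import insort

class TinyTable:
    """A toy (row, column, timestamp) -> value sorted map."""
    def __init__(self):
        self._cells = []  # sorted list of ((row, column, -timestamp), value)

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so the newest version of a cell sorts first.
        insort(self._cells, ((row, column, -timestamp), value))

    def get(self, row, column):
        # Return the newest value for (row, column), or None if absent.
        for (r, c, _), value in self._cells:
            if (r, c) == (row, column):
                return value
        return None

table = TinyTable()
table.put("www.cnn.com", "contents:", 3, "<html>v3")
table.put("www.cnn.com", "contents:", 17, "<html>v17")
print(table.get("www.cnn.com", "contents:"))  # <html>v17
```

Because the map is globally sorted by row, rows with nearby keys end up stored near each other, which is what makes range scans over rows cheap.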
36.–45. Bigtable System Structure
[Diagram, built up across ten slides:]
• Bigtable Cell: one Bigtable master (performs metadata ops + load balancing) and many Bigtable tablet servers (each serves data)
• Underneath: the cluster scheduling system (handles failover, monitoring), GFS (holds tablet data, logs), and the lock service (holds metadata, handles master-election)
• Bigtable clients use the Bigtable client library: Open() and metadata ops go to the master; read/write goes directly to the tablet servers
46. Some BigTable Features
• Single-row transactions: easy to do read/modify/write operations
• Locality groups: segregate columns into different files
• In-memory columns: random access to small items
• Suite of compression techniques: per-locality group
• Bloom filters: avoid seeks for non-existent data
• Replication: eventual-consistency replication across datacenters, between multiple BigTable serving setups (master/slave & multi-master)
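The single-row transaction feature can be sketched as an atomic read/modify/write guarded by a per-row lock. This is an illustration of the idea, not the BigTable API; note that, as in BigTable, nothing here spans more than one row.

```python
import threading
from collections import defaultdict

class RowStore:
    """Toy store offering atomic read/modify/write on a single row."""
    def __init__(self):
        self._rows = defaultdict(dict)             # row -> {column: value}
        self._locks = defaultdict(threading.Lock)  # row -> lock

    def read_modify_write(self, row, column, fn, default=0):
        # Atomic within one row; there are no cross-row transactions.
        with self._locks[row]:
            old = self._rows[row].get(column, default)
            self._rows[row][column] = fn(old)
            return self._rows[row][column]

    def get(self, row, column):
        return self._rows[row].get(column)

store = RowStore()
store.read_modify_write("www.cnn.com", "fetch_count:", lambda n: n + 1)
store.read_modify_write("www.cnn.com", "fetch_count:", lambda n: n + 1)
print(store.get("www.cnn.com", "fetch_count:"))  # 2
```

Restricting atomicity to a single row is the key design choice: it keeps each transaction on one tablet server, so no distributed commit protocol is needed.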
47. BigTable Usage
• 500+ BigTable cells
• Largest cells manage 6000TB+ of data, 3000+ machines
• Busiest cells sustain 500,000+ ops/second 24 hours/day, and peak much higher
48. Data Processing: MapReduce
• Google’s batch processing tool of choice
• Users write two functions:
– Map: Produces (key, value) pairs from input
– Reduce: Merges (key, value) pairs from Map
• Library handles data transfer and failures
• Used everywhere: Earth, News, Analytics, Search Quality, Indexing, …
49. Example: Document Indexing
• Input: Set of documents D1, …, DN
• Map
– Parse document D into terms T1, …, TN
– Produces (key, value) pairs
• (T1, D), …, (TN, D)
• Reduce
– Receives list of (key, value) pairs for term T
• (T, D1), …, (T, DN)
– Emits single (key, value) pair
• (T, (D1, …, DN))
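The indexing example above can be written out as plain Python. This is a sequential sketch of the programming model only: the real MapReduce library shards the input, runs map tasks in parallel across machines, shuffles intermediate pairs by key, and re-executes failed tasks.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # Map: parse the document into terms, emit (term, doc_id) pairs.
    for term in set(text.split()):
        yield term, doc_id

def reduce_fn(term, doc_ids):
    # Reduce: merge all doc ids for a term into a single (term, docs) pair.
    return term, sorted(doc_ids)

def map_reduce(documents):
    intermediate = defaultdict(list)
    for doc_id, text in documents.items():       # map phase
        for key, value in map_fn(doc_id, text):
            intermediate[key].append(value)      # shuffle: group by key
    return dict(reduce_fn(k, v) for k, v in intermediate.items())  # reduce phase

index = map_reduce({"D1": "big table", "D2": "big data"})
print(index["big"])  # ['D1', 'D2']
```

The user only writes map_fn and reduce_fn; everything in map_reduce (partitioning, grouping, scheduling, fault tolerance) is what the library hides.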
55. MapReduce in Google
Easy to use. Library hides complexity.

                            Mar '05   Mar '06   Sep '07
Number of jobs                  72K      171K    2,217K
Average time (seconds)          934       874       395
Machine years used              981     2,002    11,081
Input data read (TB)         12,571    52,254   403,152
Intermediate data (TB)        2,756     6,743    34,774
Output data written (TB)        941     2,970    14,018
Average worker machines         232       268       394
56. Current Work
Scheduling system + GFS + BigTable + MapReduce work well within single clusters
Many separate instances in different data centers
– Tools on top deal with cross-cluster issues
– Each tool solves relatively narrow problem
– Many tools => lots of complexity
Can next generation infrastructure do more?
57. Next Generation Infrastructure
Truly global systems to span all our datacenters
• Global namespace with many replicas of data worldwide
• Support both consistent and inconsistent operations
• Continued operation even with datacenter partitions
• Users specify high-level desires:
  “99%ile latency for accessing this data should be <50ms”
  “Store this data on at least 2 disks in EU, 2 in U.S. & 1 in Asia”
– Increased utilization through automation
– Automatic migration, growing and shrinking of services
– Lower end-user latency
– Provide high-level programming model for data-intensive interactive services
58. Questions?
Further info:
• The Google File System, Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, SOSP ’03.
• Web Search for a Planet: The Google Cluster Architecture, Luiz André Barroso, Jeffrey Dean, Urs Hölzle, IEEE Micro, 2003.
• Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber, OSDI ’06.
• MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, OSDI ’04.
• Failure Trends in a Large Disk Drive Population, Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso, FAST ’07.
http://labs.google.com/papers
http://labs.google.com/people/jeff or jeff@google.com