- MapReduce is a programming model for processing large datasets in a distributed manner across clusters of machines. It handles parallelization, load balancing, and hardware failures automatically.
- In MapReduce, the input data is mapped to intermediate key-value pairs, shuffled and sorted by the keys, then reduced to produce the final output. This pattern applies to many large-scale computing problems.
- Google uses MapReduce for tasks like generating map tiles, processing log files, and mining user data at massive scales across thousands of machines. The programming model hides complex distributed systems details from developers.
4. The Machinery
Servers
• CPUs
• DRAM
• Disks
Racks
• 40-80 servers
• Ethernet switch
Clusters
5-7. Architectural view of the storage hierarchy
(Figure, built up across three slides: each server has processors with L1/L2 caches, local DRAM, and local disk; servers share a rack switch; racks share a cluster switch. Rebuilt as a table of capacity, latency, and bandwidth per level.)

Level                     DRAM                   Disk
One server                16GB, 100ns, 20GB/s    2TB, 10ms, 200MB/s
Local rack (80 servers)   1TB, 300us, 100MB/s    160TB, 11ms, 100MB/s
Cluster (30+ racks)       30TB, 500us, 10MB/s    4.80PB, 12ms, 10MB/s
8. Storage hierarchy: a different view
A bumpy ride that has been getting bumpier over time
9. Reliability & Availability
• Things will crash. Deal with it!
– Assume you could start with super-reliable servers (MTBF of 30 years)
– Build a computing system with 10 thousand of those
– Watch one fail per day (see the arithmetic below)
• Fault-tolerant software is inevitable
• Typical yearly flakiness metrics:
– 1-5% of your disk drives will die
– Servers will crash at least twice (2-4% failure rate)
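The one-failure-per-day figure is just the MTBF arithmetic:

\[
\frac{10{,}000\ \text{machines}}{30\ \text{years} \times 365\ \text{days/year}} \approx 0.9\ \text{failures per day}
\]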
10. The Joys of Real Hardware
Typical first year for a new cluster:
~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packet loss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external vips for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for DNS
~1000 individual machine failures
~thousands of hard drive failures
slow disks, bad memory, misconfigured machines, flaky machines, etc.
Long distance links: wild dogs, sharks, dead horses, drunken hunters, etc.
15. Understanding fault statistics matters
Can we expect faults to be independent or correlated?
Are there common failure patterns we should program around?
16. Google Cluster Environment
• Cluster is 1000s of machines, typically one or a handful of configurations
• File system (GFS) + Cluster scheduling system are core services
• Typically 100s to 1000s of active jobs (some w/1 task, some w/1000s)
• mix of batch and low-latency, user-facing production jobs
(Diagram: each commodity machine runs Linux plus a scheduling slave and a GFS chunkserver, and hosts tasks from many jobs; a scheduling master and a GFS master coordinate across machines 1..N, with the Chubby lock service alongside.)
18. GFS Usage @ Google
• 200+ clusters
• Many clusters of 1000s of machines
• Pools of 1000s of clients
• 4+ PB Filesystems
• 40 GB/s read/write load
– (in the presence of frequent HW failures)
19. Google: Most Systems are Distributed Systems
• Distributed systems are a must:
– data, request volume, or both are too large for a single machine
• careful design about how to partition problems
• need high-capacity systems even within a single datacenter
– multiple datacenters, all around the world
• almost all products deployed in multiple locations
– services used heavily even internally
• a web search touches 50+ separate services, 1000s of machines
20. Many Internal Services
• Simpler from a software engineering standpoint
– few dependencies, clearly specified (Protocol Buffers)
– easy to test new versions of individual services
– ability to run lots of experiments
• Development cycles largely decoupled
– lots of benefits: small teams can work independently
– easier to have many engineering offices around the world
21. Protocol Buffers
• Good protocol description language is vital
• Desired attributes:
– self-describing, multiple language support
– efficient to encode/decode (200+ MB/s), compact serialized form
Our solution: Protocol Buffers (in active use since 2000)
message SearchResult {
  required int32 estimated_results = 1;   // (1 is the tag number)
  optional string error_message = 2;
  repeated group Result = 3 {
    required float score = 4;
    required fixed64 docid = 5;
    optional message<WebResultDetails> = 6;
    …
  }
};
22. Protocol Buffers (cont)
• Automatically generated language wrappers
• Graceful client and server upgrades
– systems ignore tags they don't understand, but pass the information through
(no need to upgrade intermediate servers)
• Serialization/deserialization
– high performance (200+ MB/s encode/decode)
– fairly compact (uses variable length encodings)
– format used to store data persistently (not just for RPCs)
• Also allow service specifications:
  service Search {
    rpc DoSearch(SearchRequest) returns (SearchResponse);
    rpc DoSnippets(SnippetRequest) returns (SnippetResponse);
    rpc Ping(EmptyMessage) returns (EmptyMessage) {
      { protocol=udp; };
    };
  };
• Open source version: http://code.google.com/p/protobuf/
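To make the wire-format mechanics concrete, here is a minimal Python sketch of a protobuf-style encoding. It is illustrative only, not Google's implementation, and the three-field message at the bottom is invented for the example; the skip-unknown-tags loop in the reader is exactly what enables the graceful upgrades described above.

def encode_varint(n):
    # Base-128 varint: 7 bits per byte, low-order groups first,
    # high bit set on every byte except the last.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos):
    # Return (value, position just past the varint).
    shift = value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

def encode_field(tag, value):
    # Field key is a varint: (tag << 3) | wire_type.
    if isinstance(value, int):
        return encode_varint(tag << 3 | 0) + encode_varint(value)
    data = value.encode()  # strings are length-delimited (wire type 2)
    return encode_varint(tag << 3 | 2) + encode_varint(len(data)) + data

def decode_known_fields(buf, known_tags):
    # Readers skip tags they don't understand, so old readers can
    # safely process messages written by newer writers.
    pos, fields = 0, {}
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        tag, wire_type = key >> 3, key & 0x7
        if wire_type == 0:                 # varint field
            value, pos = decode_varint(buf, pos)
        else:                              # wire type 2: length-delimited
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        if tag in known_tags:
            fields[tag] = value
    return fields

# A newer writer sets tags 1, 2 and 3; an older reader knows only 1 and 2.
msg = encode_field(1, 1500) + encode_field(2, "ok") + encode_field(3, 42)
print(decode_known_fields(msg, {1, 2}))    # {1: 1500, 2: b'ok'}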
23. Designing Efficient Systems
Given a basic problem definition, how do you choose the "best" solution?
• Best could be simplest, highest performance, easiest to extend, etc.
Important skill: ability to estimate performance of a system design
– without actually having to build it!
24. Numbers Everyone Should Know
L1 cache reference                            0.5 ns
Branch mispredict                               5 ns
L2 cache reference                              7 ns
Mutex lock/unlock                              25 ns
Main memory reference                         100 ns
Compress 1K bytes with Zippy                3,000 ns
Send 2K bytes over 1 Gbps network          20,000 ns
Read 1 MB sequentially from memory        250,000 ns
Round trip within same datacenter         500,000 ns
Disk seek                              10,000,000 ns
Read 1 MB sequentially from disk       20,000,000 ns
Send packet CA->Netherlands->CA       150,000,000 ns
25-27. Back of the Envelope Calculations
How long to generate image results page (30 thumbnails)?
Design 1: Read serially, thumbnail 256K images on the fly
  30 seeks * 10 ms/seek + 30 * 256K / 30 MB/s = 560 ms
Design 2: Issue reads in parallel:
  10 ms/seek + 256K read / 30 MB/s = 18 ms
  (Ignores variance, so really more like 30-60 ms, probably)
Lots of variations:
– caching (single images? whole sets of thumbnails?)
– pre-computing thumbnails
– …
Back of the envelope helps identify most promising…
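The same estimates in a few lines of Python (the constants are the slide's own; the 30 MB/s figure folds reading and thumbnailing together):

SEEK = 10e-3        # disk seek: 10 ms
RATE = 30e6         # 30 MB/s effective read + thumbnail rate
IMAGE = 256e3       # ~256K per image
N = 30              # thumbnails per results page

serial = N * SEEK + N * IMAGE / RATE     # every image: seek, then read
parallel = SEEK + IMAGE / RATE           # all 30 reads issued at once

print(f"serial:   {serial * 1e3:.0f} ms")     # ~556 ms, i.e. ~560 ms
print(f"parallel: {parallel * 1e3:.1f} ms")   # ~18.5 ms, i.e. ~18 ms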
28. Write Microbenchmarks!
• Great to understand performance
– Builds intuition for back-of-the-envelope calculations
• Reduces cycle time to test performance improvements
Benchmark                 Time(ns)  CPU(ns)  Iterations
BM_VarintLength32/0              2        2   291666666
BM_VarintLength32Old/0           5        5   124660869
BM_VarintLength64/0              8        8    89600000
BM_VarintLength64Old/0          25       24    42164705
BM_VarintEncode32/0              7        7    80000000
BM_VarintEncode64/0             18       16    39822222
BM_VarintEncode64Old/0          24       22    31165217
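A rough single-file analogue of such a harness, using only the Python standard library; varint_length is a hypothetical stand-in for the BM_VarintLength* subjects above, and absolute numbers will vary by machine:

import timeit

def varint_length(n):
    # Number of base-128 groups needed to encode n as a varint.
    return max(1, (n.bit_length() + 6) // 7)

def bench(name, fn, iterations=1_000_000):
    total = timeit.timeit(fn, number=iterations)
    print(f"{name:<16} {total / iterations * 1e9:8.1f} ns/op")

bench("VarintLength32", lambda: varint_length(1 << 28))
bench("VarintLength64", lambda: varint_length(1 << 60))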
29. Know Your Basic Building Blocks
Core language libraries, basic data structures, protocol buffers, GFS, BigTable,
indexing systems, MySQL, MapReduce, …
Not just their interfaces, but understand their implementations (at least at a high level)
If you don’t know what’s going on, you can’t do decent back-of-the-envelope calculations!
30. Encoding Your Data
• CPUs are fast, memory/bandwidth are precious, ergo…
– Variable-length encodings
– Compression
– Compact in-memory representations
• Compression/encoding very important for many systems
– inverted index posting list formats
– storage systems for persistent data
• We have lots of core libraries in this area
– Many tradeoffs: space, encoding/decoding speed, etc. E.g.:
• Zippy: encode @ 300 MB/s, decode @ 600 MB/s, 2-4X compression
• gzip: encode @ 25 MB/s, decode @ 200 MB/s, 4-6X compression
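The speed-versus-ratio tradeoff is easy to observe with the standard library; this sketch uses zlib (the DEFLATE algorithm behind gzip) as a stand-in, since Zippy itself was only open-sourced later, as Snappy. Exact numbers depend on the data and the machine:

import time
import zlib

data = b"the quick brown fox jumps over the lazy dog. " * 20000

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"level {level}: {ratio:4.1f}x compression "
          f"at {len(data) / elapsed / 1e6:6.0f} MB/s")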
31. Designing & Building Infrastructure
Identify common problems, and build software systems to address them in a general way
• Important not to try to be all things to all people
– Clients might be demanding 8 different things
– Doing 6 of them is easy
– …handling 7 of them requires real thought
– …dealing with all 8 usually results in a worse system
• more complex, compromises other clients in trying to satisfy everyone
Don't build infrastructure just for its own sake
• Identify common needs and address them
• Don't imagine unlikely potential needs that aren't really there
• Best approach: use your own infrastructure (especially at first!)
– (much more rapid feedback about what works, what doesn't)
32. Design for Growth
Try to anticipate how requirements will evolve
keep likely features in mind as you design base system
Ensure your design works if scale changes by 10X or 20X
but the right solution for X often not optimal for 100X
33. Interactive Apps: Design for Low Latency
• Aim for low avg. times (happy users!)
– 90%ile and 99%ile also very important
– Think about how much data you’re shuffling around
• e.g. dozens of 1 MB RPCs per user request -> latency will be lousy
• Worry about variance!
– Redundancy or timeouts can help bring in the latency tail (see the sketch after this list)
• Judicious use of caching can help
• Use higher priorities for interactive requests
• Parallelism helps!
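One concrete way redundancy plus timeouts can pull in the tail is a hedged ("backup") request: if the primary replica has not answered within a small latency budget, send the same request to a second replica and take whichever answer arrives first. A minimal sketch, with invented replica behavior:

import concurrent.futures as cf
import random
import time

def query_replica(replica_id):
    # Mostly fast, occasionally very slow -- the latency tail.
    time.sleep(random.choice([0.01] * 9 + [0.5]))
    return f"result from replica {replica_id}"

def hedged_request(pool, budget=0.05):
    primary = pool.submit(query_replica, 0)
    done, _ = cf.wait([primary], timeout=budget)
    # Primary missed its budget: issue a backup request to another replica.
    pending = [primary] if done else [primary, pool.submit(query_replica, 1)]
    done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
    return next(iter(done)).result()

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    print(hedged_request(pool))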
34. Making Applications Robust Against Failures
Canary requests
Failover to other replicas/datacenters
Bad backend detection:
stop using for live requests until behavior gets better
More aggressive load balancing when imbalance is more severe
Make your apps do something reasonable even if not all is right
– Better to give users limited functionality than an error page
35. Add Sufficient Monitoring/Status/Debugging Hooks
All our servers:
• Export HTML-based status pages for easy diagnosis
• Export a collection of key-value pairs via a standard interface (toy sketch below)
– monitoring systems periodically collect this from running servers
• RPC subsystem collects sample of all requests, all error requests, all requests >0.0s, >0.05s, >0.1s, >0.5s, >1s, etc.
• Support low-overhead online profiling
– cpu profiling
– memory profiling
– lock contention profiling
If your system is slow or misbehaving, can you figure out why?
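A toy rendition of the exported key-value idea: a counter map served as plain text over HTTP, which a monitoring system could poll. The counter names and port are invented for the example:

import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

counters = {"requests": 0, "errors": 0}   # updated by server code elsewhere
lock = threading.Lock()

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Render the counter map as "name value" lines of plain text,
        # which a monitoring system can scrape periodically.
        with lock:
            body = "".join(f"{k} {v}\n" for k, v in counters.items())
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    HTTPServer(("", 8080), StatusHandler).serve_forever()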
36. MapReduce
• A simple programming model that applies to many large-scale
computing problems
• Hide messy details in MapReduce runtime library:
– automatic parallelization
– load balancing
– network and disk transfer optimizations
– handling of machine failures
– robustness
– improvements to core library benefit all users of library!
37. Typical problem solved by MapReduce
• Read a lot of data
• Map: extract something you care about from each record
• Shuffle and Sort
• Reduce: aggregate, summarize, filter, or transform
• Write the results
Outline stays the same,
map and reduce change to fit the problem
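Word count is the canonical instance of this outline; the single-process Python sketch below keeps the same three phases so the shape is easy to see:

from itertools import groupby
from operator import itemgetter

records = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: extract (word, 1) pairs from each record
pairs = [(word, 1) for record in records for word in record.split()]

# Shuffle and Sort: order intermediate pairs by key
pairs.sort(key=itemgetter(0))

# Reduce: aggregate the counts for each key
counts = {word: sum(n for _, n in group)
          for word, group in groupby(pairs, key=itemgetter(0))}

print(counts)   # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}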
38. Example: Rendering Map Tiles
(Figure rebuilt as prose)
Input: geographic feature list (I-5, Lake Washington, WA-520, I-90, …)
Map: emit each feature to all overlapping latitude-longitude rectangles (key = rectangle id):
  (0, I-5), (1, I-5), (0, Lake Wash.), (1, Lake Wash.), (0, WA-520), (1, I-90), …
Shuffle: sort by key, grouping tile 0's features and tile 1's features together
Reduce: render each tile using the data for all enclosed features
Output: rendered tiles
40. Parallel MapReduce
(Diagram: input data fans out to many parallel Map tasks; a Master coordinates; the Shuffle routes intermediate data to parallel Reduce tasks, which write partitioned output.)
For large enough problems, it’s more about disk and network performance than CPU & DRAM
41. MapReduce Usage Statistics Over Time
                                 Aug '04   Mar '06    Sep '07    Sep '09
Number of jobs                       29K      171K     2,217K     3,467K
Average completion time (secs)       634       874        395        475
Machine years used                   217     2,002     11,081     25,562
Input data read (TB)               3,288    52,254    403,152    544,130
Intermediate data (TB)               758     6,743     34,774     90,120
Output data written (TB)             193     2,970     14,018     57,520
Average worker machines              157       268        394        488
42. MapReduce in Practice
• Abstract input and output interfaces
– lots of MR operations don’t just read/write simple files
• B-tree files
• memory-mapped key-value stores
• complex inverted index file formats
• BigTable tables
• SQL databases, etc.
• ...
• Low-level MR interfaces are in terms of byte arrays
– Hardly ever use textual formats, though: slow, hard to parse
– Most input & output is in encoded Protocol Buffer format
• See “MapReduce: A Flexible Data Processing Tool” (to appear in upcoming CACM)
43. BigTable: Motivation
• Lots of (semi-)structured data at Google
– URLs:
• Contents, crawl metadata, links, anchors, pagerank, …
– Per-user data:
• User preference settings, recent queries/search results, …
– Geographic locations:
• Physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …
• Scale is large
– billions of URLs, many versions/page (~20K/version)
– Hundreds of millions of users, thousands of q/sec
– 100TB+ of satellite image data
44-48. Basic Data Model
• Distributed multi-dimensional sparse map
  (row, column, timestamp) → cell contents
(Figure, built up across five slides: row “www.cnn.com”, column “contents:”, holding cell value “<html>…” at timestamps t3, t11 and t17.)
• Rows are ordered lexicographically
• Good match for most of our applications
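A toy in-memory rendition of this data model (illustrative only; the real system is distributed, but the keying, per-cell timestamps, and lexicographic row order are the same):

from collections import defaultdict

class ToyTable:
    def __init__(self):
        # (row, column) -> {timestamp: contents}; sparse by construction
        self.cells = defaultdict(dict)

    def put(self, row, column, timestamp, contents):
        self.cells[(row, column)][timestamp] = contents

    def get(self, row, column):
        # Return the most recent version of the cell, if any.
        versions = self.cells.get((row, column))
        return versions[max(versions)] if versions else None

    def scan(self, start_row, end_row):
        # Rows sort lexicographically, so a range scan is an ordered walk.
        for row, column in sorted(self.cells):
            if start_row <= row < end_row:
                yield row, column, self.get(row, column)

t = ToyTable()
t.put("www.cnn.com", "contents:", 3, "<html>v1")
t.put("www.cnn.com", "contents:", 17, "<html>v3")
print(t.get("www.cnn.com", "contents:"))   # <html>v3
print(list(t.scan("www.a", "www.z")))      # [('www.cnn.com', 'contents:', '<html>v3')]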
53-62. BigTable System Structure
(Figure, built up across these slides)
A Bigtable cell consists of:
• a Bigtable master: performs metadata ops + load balancing
• many Bigtable tablet servers: each serves data
layered on shared services:
• Cluster scheduling system: handles failover, monitoring
• GFS: holds tablet data, logs
• Lock service: holds metadata, handles master-election
Clients link a Bigtable client library: Open() goes through the lock service, metadata ops go to the master, and read/write traffic flows directly between clients and tablet servers.
63. BigTable Status
• Design/initial implementation started beginning of 2004
• Production use or active development for 100+ projects:
– Google Print
– My Search History
– Orkut
– Crawling/indexing pipeline
– Google Maps/Google Earth
– Blogger
–…
• Currently ~500 BigTable clusters
• Largest cluster:
– 70+ PB data; sustained: 10M ops/sec; 30+ GB/s I/O
64. BigTable: What’s New Since OSDI’06?
• Lots of work on scaling
• Service clusters, managed by dedicated team
• Improved performance isolation
– fair-share scheduler within each server, better accounting of memory used per user (caches, etc.)
– can partition servers within a cluster for different users or tables
• Improved protection against corruption
– many small changes
– e.g. immediately read results of every compaction, compare with CRC
• Catches ~1 corruption/5.4 PB of data compacted
65. BigTable Replication (New Since OSDI’06)
• Configured on a per-table basis
• Typically used to replicate data to multiple Bigtable clusters in different data centers
• Eventual consistency model: writes to table in one cluster eventually appear in all configured replicas
• Nearly all user-facing production uses of BigTable use replication
66. BigTable Coprocessors (New Since OSDI’06)
• Arbitrary code that runs next to each tablet in table
– as tablets split and move, coprocessor code automatically splits/moves too
• High-level call interface for clients
– Unlike RPC, calls addressed to rows or ranges of rows
• coprocessor client library resolves to actual locations
– Calls across multiple rows automatically split into multiple parallelized RPCs
• Very flexible model for building distributed services
– automatic scaling, load balancing, request routing for apps
67. Example Coprocessor Uses
• Scalable metadata management for Colossus (next-gen GFS-like file system)
• Distributed language model serving for machine translation system
• Distributed query processing for full-text indexing support
• Regular expression search support for code repository
68. Current Work: Spanner
• Storage & computation system that spans all our datacenters
– single global namespace
• Names are independent of location(s) of data
• Similarities to Bigtable: tables, families, locality groups, coprocessors, ...
• Differences: hierarchical directories instead of rows, fine-grained replication
• Fine-grained ACLs, replication configuration at the per-directory level
– support mix of strong and weak consistency across datacenters
• Strong consistency implemented with Paxos across tablet replicas
• Full support for distributed transactions across directories/machines
– much more automated operation
• system automatically moves and adds replicas of data and computation based on constraints and usage patterns
• automated allocation of resources across entire fleet of machines
69. Design Goals for Spanner
• Future scale: ~10^6 to 10^7 machines, ~10^13 directories, ~10^18 bytes of storage, spread at 100s to 1000s of locations around the world, ~10^9 client machines
– zones of semi-autonomous control
– consistency after disconnected operation
– users specify high-level desires:
“99%ile latency for accessing this data should be <50ms”
“Store this data on at least 2 disks in EU, 2 in U.S. & 1 in Asia”
70. Adaptivity in World-Wide Systems
• Challenge: automatic, dynamic world-wide placement of data & computation to minimize latency and/or cost, given constraints on:
– bandwidth
– packet loss
– power
– resource usage
– failure modes
– ...
• Users specify high-level desires:
“99%ile latency for accessing this data should be <50ms”
“Store this data on at least 2 disks in EU, 2 in U.S. & 1 in Asia”
71. Building Applications on top of Weakly Consistent Storage Systems
• Many applications need state replicated across a wide area
– For reliability and availability
• Two main choices:
– consistent operations (e.g. use Paxos)
• often imposes additional latency for common case
– inconsistent operations
• better performance/availability, but apps harder to write and reason about in this model
• Many apps need to use a mix of both of these:
– e.g. Gmail: marking a message as read is asynchronous, sending a message is a heavier-weight consistent operation
72. Building Applications on top of Weakly Consistent Storage Systems
• Challenge: General model of consistency choices, explained and codified
– ideally would have one or more “knobs” controlling performance vs. consistency
– “knob” would provide easy-to-understand tradeoffs
• Challenge: Easy-to-use abstractions for resolving conflicting updates to multiple versions of a piece of state
– Useful for reconciling client state with servers after disconnected operation
– Also useful for reconciling replicated state in different data centers after repairing a network partition
73. Thanks! Questions...?
Further reading:
• Ghemawat, Gobioff, & Leung. The Google File System. SOSP 2003.
• Barroso, Dean, & Hölzle. Web Search for a Planet: The Google Cluster Architecture. IEEE Micro, 2003.
• Dean & Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
• Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, & Gruber. Bigtable: A Distributed Storage System for Structured Data. OSDI 2006.
• Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. OSDI 2006.
• Pinheiro, Weber, & Barroso. Failure Trends in a Large Disk Drive Population. FAST 2007.
• Brants, Popat, Xu, Och, & Dean. Large Language Models in Machine Translation. EMNLP 2007.
• Barroso & Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan & Claypool Synthesis Series on Computer Architecture, 2009.
• Malewicz et al. Pregel: A System for Large-Scale Graph Processing. PODC 2009.
• Schroeder, Pinheiro, & Weber. DRAM Errors in the Wild: A Large-Scale Field Study. SIGMETRICS 2009.
• Protocol Buffers. http://code.google.com/p/protobuf/
These and many more available at: http://labs.google.com/papers.html