The document offers three tips for optimizing C++ code: measure performance to guide optimizations; reduce computational strength by replacing expensive operations such as division with comparisons and additions; and minimize writes to arrays, which can disable compiler optimizations. Counter-intuitively, introducing an extra pass over the data, or copying it, can improve performance by reducing writes and enabling optimizations. It illustrates these tips with digit-counting functions optimized by reducing divisions, writing to arrays backwards, and making an extra pass to compute the length first.
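The strength-reduction tip above can be sketched as follows. This is an illustrative reconstruction, not the document's actual code; the function names `digits_div` and `digits_cmp` are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Baseline: one division per digit counted.
int digits_div(uint64_t n) {
    int d = 1;
    while (n >= 10) { n /= 10; ++d; }
    return d;
}

// Strength-reduced: comparisons handle up to four digits per
// iteration, so only one division is needed per four digits.
int digits_cmp(uint64_t n) {
    int d = 1;
    for (;;) {
        if (n < 10)    return d;
        if (n < 100)   return d + 1;
        if (n < 1000)  return d + 2;
        if (n < 10000) return d + 3;
        n /= 10000;  // one division skips past four digits
        d += 4;
    }
}
```

For typical values the comparison-heavy version issues far fewer divisions, which are among the slowest integer operations on most CPUs.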
Optimizing the Graphics Pipeline with Compute, GDC 2016 - Graham Wihlidal
With further advancement in the current console cycle, new tricks are being learned to squeeze the maximum performance out of the hardware. This talk will present how the compute power of the console and PC GPUs can be used to improve the triangle throughput beyond the limits of the fixed function hardware. The discussed method shows a way to perform efficient "just-in-time" optimization of geometry, and opens the way for per-primitive filtering kernels and procedural geometry processing.
Takeaway:
Attendees will learn how to preprocess geometry on-the-fly per frame to improve rendering performance and efficiency.
Intended Audience:
This presentation targets seasoned graphics developers. Experience with DirectX 12 and GCN is recommended, but not required.
This document discusses building a model serving pipeline. It covers conceptualizing the pipeline, containerizing models, serving models via APIs, managing workflows such as retraining, and designing architectures. An example deep knowledge tracing model is presented that predicts question answers based on user interaction data. The pipeline uses BentoML to package models, Docker for containers, Airflow for scheduling, and Docker Swarm for container management.
1. DPDK achieves high throughput packet processing on commodity hardware by reducing kernel overhead through techniques like polling, huge pages, and userspace drivers.
2. In Linux, packet processing involves expensive operations like system calls, interrupts, and data copying between kernel and userspace. DPDK avoids these by doing all packet processing in userspace.
3. DPDK uses techniques like isolating cores for packet I/O threads, lockless ring buffers, and NUMA awareness to further optimize performance. It can achieve throughput of over 14 million packets per second on 10GbE interfaces.
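The lockless ring buffers mentioned in point 3 can be illustrated with a minimal single-producer/single-consumer queue. This is a sketch in the spirit of (but much simpler than) DPDK's rte_ring, not its actual API; the class and member names are invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Minimal single-producer/single-consumer lockless ring. N must be
// a power of two so index wrapping reduces to a bitmask.
template <typename T, std::size_t N>
class SpscRing {
    T buf_[N];
    std::atomic<std::size_t> head_{0};  // written only by producer
    std::atomic<std::size_t> tail_{0};  // written only by consumer
public:
    bool push(const T& v) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h - tail_.load(std::memory_order_acquire) == N)
            return false;               // full
        buf_[h & (N - 1)] = v;
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (head_.load(std::memory_order_acquire) == t)
            return false;               // empty
        out = buf_[t & (N - 1)];
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }
};
```

Because each index is written by exactly one thread, acquire/release atomics are enough: no locks, no system calls, no contention on the fast path.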
This document provides recommendations to improve performance of Red Hat Global File System (GFS) and GFS2 file systems. It discusses common problems that can cause performance issues or processes hanging, such as file/directory and resource group contention. The document also provides tips on how to determine the root cause of a problem, such as identifying contended inodes or tasks. Designing the file system layout and using appropriate mount options, block sizes, and I/O patterns can optimize performance.
Linux Performance Analysis: New Tools and Old Secrets - Brendan Gregg
Talk for USENIX/LISA 2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many tools, from high level to low level, to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit and which are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these solve issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
New Ways to Find Latency in Linux Using Tracing - ScyllaDB
Ftrace is the official tracer of the Linux kernel. It originated from the real-time patch (now known as PREEMPT_RT), as developing an operating system for real-time use requires deep insight into and transparency of the happenings of the kernel. Not only was tracing useful for debugging, but it was critical for finding areas in the kernel that were causing unbounded latency. It's no wonder the ftrace infrastructure has a lot of tooling for seeking out latency. Ftrace was introduced into mainline Linux in 2008, and several talks have been done on how to utilize its tracing features. But a lot has happened in the past few years that makes the tooling for finding latency much simpler. Other talks at P99 will discuss the new ftrace tracers "osnoise" and "timerlat", but this talk will focus more on the new flexible and dynamic aspects of ftrace that facilitate finding latency issues that are more specific to your needs. Some of this work may still be in a proof-of-concept stage, but this talk will give you the advantage of knowing what tools will be available to you in the coming year.
The document discusses various data structures and functions related to network packet processing in the Linux kernel socket layer. It describes the sk_buff structure that is used to pass packets between layers. It also explains the net_device structure that represents a network interface in the kernel. When a packet is received, the interrupt handler will raise a soft IRQ for processing. The packet will then traverse various protocol layers like IP and TCP to be eventually delivered to a socket and read by a userspace application.
This document discusses using the D programming language for an event sourcing system at a financial firm. It summarizes key requirements, how D addresses them through features like fast iterations, built-in unit testing and a good standard library. It describes using an event sourcing architecture with a state machine approach and implementing streams as circular arrays. It also discusses using D for market data consumption and electronic trading systems through optimized data structures and concurrency features.
The document discusses Sociomantic Labs' process for porting their codebase from D1 to D2. It describes the challenges of porting large codebases while maintaining daily development. Their process involves 3 stages - first ensuring the code compiles with D1 and fixes, then porting to D2 and fixing issues, and finally adding regression testing. It also discusses specific challenges around const correctness, templates, and runtime differences. The document provides examples of helpers and wrappers developed to facilitate the porting process.
Sometimes you see code that is perfectly OK according to the definition of the language, but which is flawed because it breaks too many established idioms and conventions. On the other hand, a solid piece of code is something that looks like it is written by an experienced person who cares about professionalism in programming.
A presentation at Norwegian Developer Conference 2010
What every C++ programmer should know about modern compilers (w/o comments, A... - Sławomir Zborowski
The document discusses modern C++ compilers and what every C++ programmer should know about them. It covers compiler architecture, inputs and targets, the C++ standard versus compiler realities, undefined behavior, optimizations, outsmarting compilers, the compiler ecosystem, and tools for further optimization like sanitizers, clang tools, Templight, and Stoke.
The document discusses various techniques for optimizing memory usage and cache performance in code. It begins by justifying the need for memory optimization given trends in CPU and memory speeds. It then provides an overview of memory hierarchies and caches. The rest of the document discusses specific techniques for optimizing data structures, prefetching, layout of data in memory, reducing aliasing, and other strategies to improve cache utilization and performance.
The document describes how memory layout affects program performance evaluation and how STABILIZER addresses this issue. STABILIZER eliminates the effect of memory layout on performance so that the true effect of a code change can be measured in isolation, enabling statistically sound performance evaluation. It discusses how traditional methods of repeated runs and error bars are not sufficient and presents case studies showing how STABILIZER enables evaluating the performance of LLVM optimizations accurately.
The document is a presentation about optimizing algorithms using sentinel values. It discusses using sentinels to optimize linear searches, partition algorithms, and other algorithms. Sentinels allow modifying algorithms to avoid indexing and boundary checks, improving performance by up to 30% in some cases. The presentation emphasizes that introspection techniques like static if are needed to implement sentinels generically.
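A minimal sketch of the sentinel idea applied to linear search (illustrative, not taken from the presentation; the function name is invented): appending the key as a sentinel guarantees the scan terminates, so the per-iteration bounds check disappears.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear search with a sentinel: the appended key guarantees the
// loop stops, leaving only the element comparison per iteration.
int find_with_sentinel(std::vector<int>& v, int key) {
    v.push_back(key);                 // sentinel
    std::size_t i = 0;
    while (v[i] != key) ++i;          // no bounds check needed
    v.pop_back();                     // restore the original contents
    return i < v.size() ? static_cast<int>(i) : -1;
}
```

The trade-off is that the container must have room for one extra element (or, as the talk's generic treatment suggests, the algorithm must be able to detect at compile time whether a sentinel can be placed).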
The document discusses generic programming in D and related concepts. It defines generic programming as finding the most general algorithm representation with the narrowest requirements and widest guarantees. This allows algorithms to be reused for different data types. It provides examples of implementing generic minimum and argminimum functions in D. The document also discusses how generic programming relates to generative programming, with code generation at compile-time using D's capabilities for compile-time evaluation and code injection via mixins.
This document discusses four cool things about the D programming language: turtles all the way down scoping, a safe subset for memory safety, an approach to purity with pure functions, and generative programming using mixin for code generation. The speaker will cover each of these topics in the presentation.
This document discusses the timeline of iOS security from 2012-2013. It begins with Stefan Esser's talk at CanSecWest 2012 about iOS 5 exploitation. It then discusses the release of iOS 6 in September 2012 and the subsequent jailbreak within a day. The document outlines various talks and research related to analyzing the security of iOS 6. It concludes by discussing new security features introduced in iOS 6 like KASLR and kernel stack and heap cookies.
Informatica (1) - CARLOS VIERA
Computers and technology have become an integral part of modern society. They are used for communication, education, business, and entertainment. As technology continues to advance, computers will likely play an even greater role in our daily lives.
Muslim rule lect 4 - khair20
The Mughal Empire reached its height under Shah Jahan and Aurangzeb, but Aurangzeb's harsh religious persecution of Hindus and Sikhs led to many rebellions. After his death, civil war broke out over rival claims to the throne, weakening the empire. Invaders then poured in from the north as Mughal power and territory declined.
The document discusses a code snippet in C that is used to test a candidate's understanding of C programming. The snippet declares and initializes an integer variable and prints its value. A candidate with a basic understanding would note that the code is missing #include <stdio.h> and a return statement. A candidate demonstrating a deeper understanding would provide more details, such as how C allows implicit function declarations that can cause undefined behavior, differences between C and C++ compilers, and standard conformance issues. The document suggests using this snippet to differentiate candidates or engineers based on their depth of knowledge of C.
C++11 introduced several new features for functions and lambdas including:
1. Lambda expressions that allow the definition of anonymous inline functions.
2. The std::function wrapper that allows functions and lambdas to be used interchangeably.
3. std::bind that binds arguments to function parameters for creating function objects.
These features improved support for functional programming patterns in C++.
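The three features listed above can be shown together in a few lines (an illustrative sketch, not code from the document):

```cpp
#include <cassert>
#include <functional>

// 1. A lambda expression: an anonymous inline function, stored in
// 2. std::function, a type-erased wrapper that holds lambdas, free
//    functions, or function objects interchangeably.
const std::function<int(int, int)> add = [](int a, int b) { return a + b; };

// 3. std::bind: fixes the first argument to 10, producing a
//    one-argument callable object.
const std::function<int(int)> add10 =
    std::bind(add, 10, std::placeholders::_1);
```

In modern C++ a capturing lambda (`[](int x) { return 10 + x; }` with a capture, or here `[=](int x) { return add(10, x); }`) is usually preferred over std::bind, but the bound form shows the C++11-era idiom.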
The document introduces some basic C++ idioms, rules, guidelines and best practices. The author notes that some items are based on their personal style preferences and the styles of groups they work with. The intention is to use the presentation to spark discussion, as it includes some controversial issues. Readers are encouraged to provide criticism of the example code presented.
Let's turn the table. Suppose your goal is to deliberately create buggy programs in C and C++ with serious security vulnerabilities that can be "easily" exploited. Then you need to know about things like stack smashing, shellcode, arc injection, return-oriented programming. You also need to know about annoying protection mechanisms such as address space layout randomization, stack canaries, data execution prevention, and more. These slides will teach you the basics of how to deliberately write insecure programs in C and C++.
A PDF version of the slides can be downloaded from my homepage: http://olvemaudal.com/talks
Here is a video recording of me presenting these slides at NDC 2014: http://vimeo.com/channels/ndc2014/97505677
Enjoy!
This document discusses various techniques for code optimization at the compiler level. It begins by defining code optimization and explaining that it aims to make a program more efficient by reducing resources like time and memory usage. Several common optimization techniques are then described, including common subexpression elimination, dead code elimination, and loop optimization. Common subexpression elimination removes redundant computations. Dead code elimination removes code that does not affect program output. Loop optimization techniques like removing loop invariants and induction variables can improve loop performance. The document provides examples to illustrate how each technique works.
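Common subexpression elimination, for example, can be shown as a source-level before/after (an illustrative sketch; compilers perform the same transformation automatically on their intermediate representation):

```cpp
#include <cassert>

// Before: the subexpression (a + b) is computed twice.
int cse_before(int a, int b, int c) {
    return (a + b) * c + (a + b);
}

// After common subexpression elimination: computed once, reused.
int cse_after(int a, int b, int c) {
    int t = a + b;
    return t * c + t;
}
```

Dead code elimination is the complementary pass: any computation whose result (like an unused `t`) never reaches an output is simply removed.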
The document discusses code generation in compilers. It describes the main tasks of the code generator as instruction selection, register allocation and assignment, and instruction ordering. It then discusses various issues in designing a code generator such as the input and output formats, memory management, different instruction selection and register allocation approaches, and choice of evaluation order. The target machine used is a hypothetical machine with general purpose registers, different addressing modes, and fixed instruction costs. Examples of instruction selection and utilization of addressing modes are provided.
The document discusses data-oriented design principles for game engine development in C++. It emphasizes understanding how data is represented and used to solve problems, rather than focusing on writing code. It provides examples of how restructuring code to better utilize data locality and cache lines can significantly improve performance by reducing cache misses. Booleans packed into structures are identified as having extremely low information density, wasting cache space.
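The point about boolean information density can be illustrated by packing flags into a bitmask (a sketch with invented names; not the document's code). One byte then carries eight flags instead of eight one-byte `bool` members occupying eight bytes of a cache line.

```cpp
#include <cassert>
#include <cstdint>

// Eight boolean flags packed into a single byte. Eight separate
// bool members would spend eight bytes of cache to carry eight
// bits of information.
struct EntityFlags {
    uint8_t bits = 0;
    void set(int i, bool v) {
        if (v) bits = static_cast<uint8_t>(bits | (1u << i));
        else   bits = static_cast<uint8_t>(bits & ~(1u << i));
    }
    bool get(int i) const { return (bits >> i) & 1u; }
};
```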
The document discusses key aspects of query processing in distributed database management systems (DDBMS). It covers query decomposition and localization, distributed query optimization objectives like minimizing costs, and issues around optimization techniques, granularity, timing, statistics, decision sites, and network topology. The goal is to optimize query execution plans for distributed queries across multiple sites.
It has been said that one should code as if the person maintaining the code is a violent psychopath who knows where you live. But why do we work with psychopaths? That question unfortunately cannot be answered in this presentation. However, we can shed some light on how to code for readability hopefully avoiding the problem altogether.
Readable code is about a lot more than producing beautiful code. In fact, it has nothing really to do with the beauty of the code and everything to do with the ability to quickly understand what the code does.
In this presentation we will discuss why readable code is so important. We will cover six commonly reoccurring patterns that made code hard to read, why they occur, and how they can easily be avoided:
* Deep Nesting
* Unnecessary Generalization
* Ambiguous Naming
* Hard to Follow Flow of Execution
* Code Style vs. Individualism
* Code Comments
These concepts may be applied to any programming language.
The document discusses query processing in distributed database management systems (DDBMS). It covers topics such as query decomposition, localization, and optimization where the goal is to minimize costs like I/O, CPU usage, and network communication. Effective query optimization requires considering factors like operation costs, statistics on relation sizes, network topology, and whether to use centralized or distributed decision making.
How To Handle Your Tech Debt Better - Sean Moir - Mike Harris
Technical Debt, Legacy Code, Legacy Systems … whatever you want to call it, is a common problem. It needs to be surfaced to the people whose lives it affects the most: the users; stakeholders; purse holders; and any other recipients of a poor experience. Through awareness of these issues, these stakeholders can understand and endorse a need to change.
Human-made systems which once met a need and now cannot change to meet our emerging needs become like a tail wagging the dog. Who is in control here? The system? We humans created these systems - we ought to be able to change them. Note: this doesn't just apply to technology.
In this session, Sean introduces a simple yet effective way to identify and prioritise what to work on, in a way which makes most sense for stakeholders and custodians.
- Given by Sean Moir for Ox:Agile Conference 2019.
Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid! - Anne Nicolas
The traditional KISS principle says that you are stupid if you can’t keep it simple. However, keeping it simple is actually very, very hard. But my lasting impression after reading a lot of code (linux kernel and otherwise) over the years is that there is no excuse for not keeping your code short. And usually, keeping it short is a very good first step towards keeping it simple. This presentation will give some simple tricks and pointers to keep your code short and I will also give some guidelines how to do design and implementation from a high-level point of view. These simple rules should make it easier for you to get your code accepted in open source projects such as the linux kernel.
Hans Verkuil
- The document discusses serialization and deserialization of objects for transfer between systems. It compares JSON and optimized JSON formats.
- JSON is more human-readable but has greater memory overhead and reduced compressibility compared to optimized formats like protocol buffers which can improve performance.
- The document recommends designing data transfer objects (DTOs) to optimize for smaller size and better compression when communicating with servers.
4 colin walls - self-testing in embedded systems - Ievgenii Katsan
This document discusses self-testing in embedded systems to detect and handle failures. It describes various types of failures that can occur in CPUs, peripherals, memory, and software. It provides examples of tests that can be run during startup and continuously in the background to check for issues like stuck bits. Upon detecting failures, the system can take actions like resetting, flashing LEDs to indicate status, or sending alerts over a network. The key is to anticipate possible failures and have mechanisms to recover from them.
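A stuck-bit check of the kind described can be sketched as a walking-ones test (illustrative; the function name is invented, and a real embedded self-test would also exercise address lines and larger memory regions):

```cpp
#include <cassert>
#include <cstdint>

// Walking-ones self-test for one RAM word: write each single-bit
// pattern and read it back; a mismatch reveals a stuck-at bit.
// The original contents are restored afterwards.
bool walking_ones_test(volatile uint32_t* word) {
    const uint32_t saved = *word;
    for (int i = 0; i < 32; ++i) {
        const uint32_t pattern = 1u << i;
        *word = pattern;
        if (*word != pattern) { *word = saved; return false; }
    }
    *word = saved;
    return true;
}
```

The `volatile` qualifier matters here: it prevents the compiler from optimizing away the write followed by an immediate read-back, which is the whole point of the test.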
This presentation discusses optimizing browser DOM rendering. It begins by explaining that performance has both subjective and objective aspects. The goal is to understand browser internals in order to know why certain optimizations work. The presentation then covers how a browser loads and parses a page, builds the DOM tree and render tree, applies CSS styles, and performs layout and painting. It provides examples of optimizations like specifying element sizes up front, preferring visibility over display:none, batching DOM changes, and loading CSS asynchronously. The overall message is to do less work, such as avoiding unnecessary reflows and repaints.
Disaster Recovery with MySQL and Tungsten - Jeff Mace
This document discusses using Tungsten to improve disaster recovery with MySQL databases. Key points include reducing replication lag between master and slave databases, enabling automatic failover if a database crashes, and hosting sites from multiple locations with copies of data everywhere. Tungsten can replicate databases in parallel to reduce lag, trigger automatic failovers, and build global replication networks across sites.
This document summarizes three stories from a MongoDB presentation about lessons learned from real-world deployments. The first story describes how a system using random updates across many entities was improved by vertically scaling the database instead of horizontally scaling. The second story explains how insufficient testing of backup processes under load led to an outage for a game launch. The third story outlines how changing a product catalog schema from embedded documents to normalized collections improved performance and resource usage.
This document discusses various places where CPU cycles are wasted in production systems based on insights from continuous profiling of large compute clusters. Some examples of wasted cycles include misconfiguration of the underlying OS, suboptimal choices of dependencies, and excessive serialization/deserialization. Specific issues highlighted include gettimeofday calls on older AWS instances, kubelet directory walking, exception throwing in xgboost, and high CPU usage from zlib. Continuous fleet-wide profiling is becoming an important tool for identifying such performance issues.
The document discusses several object-oriented design principles including DRY (Don't Repeat Yourself), SLAP (Single Layer of Abstraction Principle), and OCP (Open-Closed Principle). It provides examples of code that violates these principles by duplicating logic, having methods that operate at different abstraction levels, and being closed for extension. The examples demonstrate how to refactor the code to follow the principles through techniques like extracting duplicate logic, breaking methods into smaller focused pieces, and designing classes to be open for extension without modifying existing code.
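The Open-Closed Principle mentioned above can be sketched as follows (illustrative names, not the document's code): new shape types extend the hierarchy without modifying the existing summing function.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// OCP: total_area is closed for modification, while the Shape
// hierarchy stays open for extension - adding a Triangle later
// requires no change to the code below.
struct Shape {
    virtual double area() const = 0;
    virtual ~Shape() = default;
};
struct Square : Shape {
    double s;
    explicit Square(double s) : s(s) {}
    double area() const override { return s * s; }
};
struct Circle : Shape {
    double r;
    explicit Circle(double r) : r(r) {}
    double area() const override { return 3.14159265358979 * r * r; }
};

double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();
    return sum;
}
```

The anti-pattern this replaces is a switch over a shape-type enum inside `total_area`, which would need editing every time a shape is added.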
This document provides an introduction and overview of Cassandra including:
- Cassandra's history as a NoSQL database created at Facebook and open sourced in 2008.
- Key features of Cassandra including linear scalability, continuous availability, ability to span multiple data centers, and operational simplicity.
- A high-level overview of Cassandra's architecture including its use of Dynamo and BigTable papers for the cluster and data storage layers.
- Concepts related to Cassandra's data model including data distribution, token ranges, replication, write path, and "last write wins" consistency.
This document discusses the principles of clean code based on Robert C. Martin's book "Clean Code". It covers topics like why clean code is important, guidelines for writing clean code like keeping methods small and using descriptive names, design principles like DRY and YAGNI, and code smells to avoid like long methods and duplicated code. The overall goal is to create code that is easy to read, understand and maintain over time.
Pragmatic Performance from NDC Oslo 2019 - David Wengier
This document contains the transcript of a presentation by David Wengier of Microsoft on optimization and performance. It discusses premature optimization and profiling code to measure performance, and argues that best practices often depend on the specific situation rather than always applying general rules. Key quotes on optimization are attributed to Donald Knuth, with the message that recommendations should not be followed blindly without considering the context.
The goal of this presentation is to help novice metadevelopers with development of domain specific languages (DSLs). To this end, a number of paradigms, techniques and guidelines are introduced.
This presentation was given during DSML Design Workshop at the PPL2010 conference that took place on November 17 & 18 at Océ R&D, Venlo, NL.
This document outlines an agenda for a CTO summit on machine learning and deep learning topics. It includes discussions on CNN and RNN architectures, word embeddings, entity embeddings, reinforcement learning, and tips for training deep neural networks. Specific applications mentioned include self-driving cars, image captioning, language modeling, and modeling store sales. It also includes summaries of papers and links to code examples.
This document discusses best practices for running Cassandra on Amazon EC2. It recommends instance sizes like m1.xlarge for most use cases. It emphasizes configuring data and commit logs on ephemeral drives for better performance than EBS volumes. It also stresses the importance of distributing nodes across availability zones and regions for high availability. Overall, the document provides guidance on optimizing Cassandra deployments on EC2 through choices of hardware, data storage, networking and operational practices.