1) The document outlines the steps in peak calling and annotation from sequencing data, including mapping reads, determining coverage, identifying enriched regions compared to controls, and annotating peaks by genomic location.
2) It reviews common file formats used at different steps like FASTQ, SAM/BAM, BED, WIG, and GFF and the information they contain.
3) Popular peak calling programs are discussed and compared based on their statistical models and techniques for assigning peaks while controlling for biases from controls, duplicates, and genomic features.
Encrypted message transmitter on public network - Rowshina Nikzad
To transmit messages in critical situations where the value of the information is high, for example in the military, messages need to be encrypted.
Many ways to encrypt messages have been invented. This project encodes the message at the sender and decodes it at the receiver using the Playfair method over a network. The project was programmed in C#.
LZ77 and LZ78 are two lossless data compression algorithms that achieve compression by replacing repeated data with references to a single earlier copy of that data (LZ77) or to an entry in a dictionary built during compression (LZ78). LZ77 uses length-distance pairs to encode matches, while LZ78 outputs dictionary indices and new characters. Both algorithms underlie later compression schemes (DEFLATE, for example, builds on LZ77) and were important milestones in data compression.
The document provides an overview of Huffman coding, a lossless data compression algorithm. It begins with a simple example to illustrate the basic idea of assigning shorter codes to more frequent symbols. It then defines key terms like entropy and describes the Huffman coding algorithm, which constructs an optimal prefix code from the frequency of symbols in the data. The document discusses how the algorithm works, its advantages in achieving compression close to the source entropy, and some limitations. It also covers applications of Huffman coding like image compression.
The document discusses Huffman coding, a lossless data compression algorithm that uses variable-length codewords to encode symbols based on their frequency of occurrence. It works by building a binary tree from the frequency of symbols, where more frequent symbols are encoded by shorter codewords. This allows for more efficient representation of frequent symbols and achieves compression close to the theoretical minimum possible given the frequencies. The algorithm and encoding/decoding process are explained step-by-step with an example.
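The tree-building step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code from the summarized slides: the names `huffman_codes` and `encode` are hypothetical, and ties between equal frequencies are broken by insertion order, which is an arbitrary choice.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each symbol in `text` to its Huffman codeword."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial_codeword}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codewords with 0 / 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the codewords of every symbol in `text`."""
    return "".join(codes[ch] for ch in text)
```

For the input `"aaaabbc"` the most frequent symbol `a` receives a one-bit codeword and the other two symbols receive two bits each, so the seven characters encode to ten bits.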
This document summarizes Huffman code decoding. It takes a sequence to be decoded and a character probability table as input. A binary tree is built from the probabilities, with left branches representing 1 and right branches representing 0. The time complexity is given as O(n^2), where n is the sequence length, because the resulting tree can be highly unbalanced. Sample run times are provided for sequences of different lengths over 10 possible characters. Pseudocode provides an algorithm to traverse the tree and decode the sequence.
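The tree traversal used for decoding can be sketched as follows. This is an illustrative assumption rather than the pseudocode from the slides: it starts from a codeword table instead of a probability table, represents the tree as nested dicts keyed by bit, and the names `build_tree` and `decode` are hypothetical.

```python
def build_tree(codes):
    """Build a nested-dict binary tree from a {symbol: codeword} table."""
    root = {}
    for symbol, word in codes.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol  # the leaf stores the decoded symbol
    return root


def decode(bits, codes):
    """Walk the tree one bit at a time, emitting a symbol at each leaf."""
    root = build_tree(codes)
    node = root
    out = []
    for bit in bits:
        node = node[bit]
        if not isinstance(node, dict):  # reached a leaf
            out.append(node)
            node = root  # restart from the root for the next codeword
    return "".join(out)
```

With the table `{"a": "1", "b": "01", "c": "00"}`, the bit string `"1010011"` decodes to `"abcaa"`. This sketch performs one tree step per input bit.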
The document discusses various lossless compression techniques including entropy coding methods like Huffman coding and arithmetic coding. It also covers dictionary-based coding like LZW, as well as spatial compression techniques like run-length coding, quadtrees for images, and lossless JPEG.
This document summarizes a technique for automatically segmenting source code identifiers into meaningful words. It presents a search-based approach inspired by how developers compose identifiers using terms and applying word transformations. The approach uses a dictionary of terms, calculates the distance between an identifier and dictionary words using Dynamic Time Warping, and applies word transformation rules. An evaluation on two systems found it outperformed a simple CamelCase splitter, correctly splitting over 90% of identifiers. Future work is planned to expand the evaluation and enhance heuristics for term selection and transformations.
Cryptography for Developers provides an overview of cryptography concepts for developers. It defines cryptography as the encryption of plaintext into ciphertext and back again. It discusses symmetric and asymmetric cryptography, including examples like the Caesar cipher. It covers hashing of passwords for storage and discusses popular algorithms like MD5 and SHA-2. The document also summarizes public key cryptography techniques like RSA and references materials for further learning.
The document discusses attention mechanisms and their implementation in TensorFlow. It begins with an overview of attention mechanisms and their use in neural machine translation. It then reviews the code implementation of an attention mechanism for neural machine translation from English to French using TensorFlow. Finally, it briefly discusses pointer networks, an attention mechanism variant, and code implementation of pointer networks for solving sorting problems.
Huffman coding is a lossless data compression technique that converts fixed length codes to variable length codes. It assigns shorter codes to more frequent characters and longer codes to less frequent characters. This allows for more efficient data storage and transmission. The key steps are to create a frequency table of characters, construct a binary tree based on frequencies, and extract the Huffman codes from the tree. Huffman coding can significantly reduce file sizes by achieving better compression than fixed length codes. It is used widely in file formats like ZIP, JPEG, and MPEG.
The document describes the LZ77 data compression algorithm. It uses a sliding window approach where the text is searched for matches within a search buffer. When a match is found, it is encoded as an offset and length rather than encoding the full text. This allows duplication or repetition within the text to be replaced with smaller and more efficient offset/length codes. An example is provided to demonstrate how the text "sir sid eastman easily teases sea sick seals" would be encoded using this approach.
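The sliding-window search described above can be sketched in Python. This is a simplified illustration, not the encoding from the slides: the triple format `(offset, length, next_char)`, the window and lookahead sizes, and the names `lz77_encode` and `lz77_decode` are all assumptions made for this example.

```python
def lz77_encode(data, window=32, lookahead=16):
    """Encode `data` as a list of (offset, length, next_char) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Search the window behind position i for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one character early so a literal next_char always remains.
            while (length < lookahead
                   and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples):
    """Rebuild the text by copying `length` characters from `offset` back."""
    buf = []
    for off, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-off])  # character-by-character copy allows overlap
        buf.append(ch)
    return "".join(buf)
```

Running `lz77_decode(lz77_encode(text))` on the example sentence returns the original text, and the encoder emits fewer triples than there are characters because repeated fragments such as "si" and "ea" become back-references.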
The document provides an overview of Huffman coding, a lossless data compression algorithm. It begins with a simple example to illustrate the basic idea of assigning shorter codes to more frequent symbols. It then defines key terms like entropy and describes the Huffman coding algorithm, which constructs an optimal prefix code from the frequency of symbols in the data. The document discusses how Huffman coding can be applied to image compression by first predicting pixel values and then encoding the residuals. It notes some disadvantages of Huffman coding and describes variations like adaptive Huffman coding.
Birds of a feather session on Nagios Plugins New Threshold Specification Syntax.
The presentation was given during the Nagios World Conference North America held Sept 20-Oct 2nd, 2013 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
- LZ77 and LZ78 are two lossless data compression algorithms published by Abraham Lempel and Jacob Ziv in the 1970s that form the basis for many other compression schemes. LZ77 uses a sliding window approach to find repeated patterns in data and encode them with references to previous matches, while LZ78 constructs an explicit dictionary during compression.
- LZW compression is a table-based algorithm derived from LZ78 that assigns codes to input sequences and builds its translation table on the fly during compression, so the decoder can rebuild the same table from the compressed data. It was commonly used in formats like GIF, but its use was controversial because of a patent held by Unisys, which has since expired.
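A minimal Python sketch of the table-building idea behind LZW follows. It is an illustration under assumptions, not a production codec: it emits plain integer codes rather than packed variable-width bit codes, assumes every input character has a code point below 256, and the names `lzw_compress` and `lzw_decompress` are hypothetical.

```python
def lzw_compress(data):
    """Return the list of integer codes LZW emits for the string `data`."""
    table = {chr(i): i for i in range(256)}  # start with all single bytes
    current, codes = "", []
    for ch in data:
        candidate = current + ch
        if candidate in table:
            current = candidate  # keep extending the current match
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn the new sequence
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the text, reconstructing the same table the encoder built."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The code refers to the entry being defined right now.
            entry = previous + previous[0]
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)
```

Note that the decoder never receives the table itself; it derives each new entry from the codes it has already seen.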
Huffman and Arithmetic coding - Performance analysis - Ramakant Soni
Huffman coding and arithmetic coding are analyzed for complexity.
According to the analysis, Huffman coding assigns variable-length codes to symbols based on probability and has O(N^2) complexity. Arithmetic coding encodes the entire message as a fraction between 0 and 1 by subdividing intervals according to symbol probability and has better O(N log N) complexity. The analysis concludes that arithmetic coding compresses data more efficiently, with fewer bits per symbol, and has asymptotically lower complexity than Huffman coding.
Huffman coding is an algorithm that uses variable-length binary codes to compress data. It assigns shorter codes to more frequent symbols and longer codes to less frequent symbols. The algorithm constructs a binary tree from the frequency of symbols and extracts the Huffman codes from the tree. Huffman coding is widely used in applications like ZIP files, JPEG images, and MPEG videos to reduce file sizes for efficient transmission or storage.
The document summarizes classical encryption techniques, including:
1) Monoalphabetic ciphers which encrypt one letter to another but can be broken through frequency analysis of letters.
2) The Playfair cipher which encrypts digrams and provides more security than monoalphabetic ciphers.
3) Polyalphabetic ciphers like the Vigenère cipher which use multiple cipher alphabets to provide even stronger security.
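To make the polyalphabetic idea in the list above concrete, here is a minimal Python sketch of the Vigenère cipher. It is an illustration only: the function name `vigenere` is hypothetical, the key is assumed to consist of ASCII letters, and non-letter characters are passed through unchanged without consuming a key letter.

```python
def vigenere(text, key, decrypt=False):
    """Shift each letter of `text` by the matching key letter (A=0 ... Z=25)."""
    out, k = [], 0
    for ch in text:
        if ch.isascii() and ch.isalpha():
            # Each key letter selects a different shifted alphabet.
            shift = ord(key[k % len(key)].upper()) - ord("A")
            if decrypt:
                shift = -shift
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
            k += 1
        else:
            out.append(ch)
    return "".join(out)
```

Because the same plaintext letter is shifted by different amounts depending on its position, single-letter frequency analysis is less effective than against a monoalphabetic cipher. For example, `vigenere("ATTACKATDAWN", "LEMON")` returns `"LXFOPVEFRNHR"`, where the two leading `T`s map to different ciphertext letters.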
This document provides an overview of classical encryption techniques, including symmetric encryption and cryptanalysis. It discusses the basic components of encryption (plaintext, ciphertext, cipher, key) and encryption mappings. Specifically, it examines the requirements for secure symmetric encryption using a strong algorithm and secret key known only to the sender and receiver. It also covers cryptanalytic attacks, the strength of encryption algorithms, and basic techniques like brute force search and classical substitution ciphers.
An introduction to Rust: the modern programming language to develop safe and ... - Claudio Capobianco
Rust is a young programming language developed by Mozilla with the support of the open source community. According to a Stack Overflow survey, in 2016 it was the language most loved by developers! The goal of Rust is to combine control and performance, that is, to operate at a low level with high-level constructs. Its applications range from operating systems to web development. Rust natively includes tools for Agile development, such as dependency management, testing and much more. The gap with other popular languages is closing quickly thanks to the community, which is very active and fantastic :)
In this introductory presentation we will discuss the characteristics that make Rust unique, including the concepts of Ownership, Borrowing, and Lifetimes.
These slides were presented at a talk at BIC Lazio Casilina, which was also the first meetup of Rust Rome!
This document discusses upcoming features in C# based on presentations by Christian Nagel. It covers features already implemented in C# 7.x like tuples, deconstruction, and pattern matching as well as features still in progress or being prototyped for C# 8 like records, caller expression attributes, async streams, indexes and ranges, extended patterns, and nullable reference types. The goal of new features is to make C# code safer, more efficient, give more freedom, and require less code.
C++ CoreHard Autumn 2018. Text Formatting For a Future Range-Based Standard L... - corehard_by
This document discusses range-based text formatting and proposes replacing existing approaches with a range-based solution. It suggests representing text as ranges and using range algorithms and functions for concatenation and formatting. This would allow treating different string types uniformly and flexibly while avoiding issues with current formatting methods like iostream manipulation and format strings. The document provides examples of formatting numbers and dates as ranges and constructing containers like std::string from multiple ranges.
[Ruxcon Monthly Sydney 2011] Proprietary Protocols Reverse Engineering : Rese... - Moabi.com
This presentation given in 2011 during the first Ruxcon Monthly (Ruxmon) Sydney focuses on proprietary protocols reverse engineering and vulnerability audits.
Huffman coding is a lossless data compression algorithm that assigns variable-length codes to symbols based on their frequencies. It involves arranging symbols by probability, assigning shorter codes like 0 and 1 to more probable symbols, and combining less probable symbols into a new symbol. This process repeats until only two symbols remain, which are then assigned codes moving backward. The algorithm aims to achieve maximum efficiency by keeping code lengths short for frequent symbols and long for rare symbols. An example encodes letters A-D with probabilities 1/2, 1/4, 1/8, 1/8 into codes with lengths 1, 2, 3, and 3, achieving maximum efficiency with zero redundancy.
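The "zero redundancy" claim in that example can be checked by hand. The following worked calculation uses only the probabilities and code lengths quoted above; the symbols $p_i$, $\ell_i$, $\bar{L}$, $H$ and $\eta$ are notation introduced here for illustration, not taken from the slides.

```latex
\begin{aligned}
\bar{L} &= \sum_i p_i \,\ell_i
         = \tfrac12\cdot 1 + \tfrac14\cdot 2 + \tfrac18\cdot 3 + \tfrac18\cdot 3
         = 1.75 \text{ bits/symbol} \\
H &= -\sum_i p_i \log_2 p_i
   = \tfrac12\cdot 1 + \tfrac14\cdot 2 + \tfrac18\cdot 3 + \tfrac18\cdot 3
   = 1.75 \text{ bits/symbol} \\
\eta &= \frac{H}{\bar{L}} = 1, \qquad 1 - \eta = 0
\end{aligned}
```

The average code length equals the entropy because every probability is a power of two, so each code length $\ell_i$ is exactly $-\log_2 p_i$.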
This document introduces several design patterns including abstract factory, singleton, prototype, adapter, composite, and decorator patterns. It provides examples of how each pattern works and why it would be used, with accompanying PHP code samples. Design patterns are general reusable solutions to common programming problems and help show the relationship and interaction between objects.
Connect.Tech - Enhancing Your Workflow With Xcode Source Editor Extensions - stable|kernel
Developers are constantly refining their workflow in order to master their craft. There is a plethora of tools available that can help bootstrap a project, increase efficiency, or simply make developers happy. Let’s explore the newly introduced Xcode Source Editor Extensions, an Application Extension that gives developers the power to create custom actions in Xcode’s Editor menu.
This document reports the results of unit root tests on several time series variables: GM1, GM2, GMB, GM1ISL, GM2ISL, GMBISL, GCPI, GCREDIT, GLIKUID, GCREDIT ISL, and GLIKUID ISL. It provides the ADF test statistic for each variable and its lags, and compares these values to critical values at the 1%, 5%, and 10% levels to determine whether the null hypothesis of a unit root can be rejected.
Purchasing power parity a unit root, cointegration and var analysis in emergi... - Giwrgos Loukopoulos
The document analyzes the validity of the absolute purchasing power parity (PPP) hypothesis for 4 advanced and 4 emerging countries from 1993 to 2014. It applies unit root tests, cointegration tests, and vector autoregression (VAR) models including impulse response functions and variance decomposition. The main findings are: 1) Unit root tests show PPP may hold for some countries and methods but not others. 2) Cointegration tests do not support PPP for any country. 3) VAR models show real exchange rate shocks take 9.76-77.39 months to halve and half-life estimates vary widely by country.
A Hypervisor IPS based on Hardware Assisted Virtualization Technology - FFRI, Inc.
This document describes Viton, a hypervisor-based intrusion prevention system (IPS) developed by Fourteenforty Research Institute. Viton runs as a hypervisor using hardware-assisted virtualization technology to monitor the guest operating system for malicious activity. It protects persistent system resources by blocking all VMX instructions, monitoring registers like IDTR and MSR, and protecting read-only code sections of the kernel from modification. Viton aims to enforce immutability of critical system structures to detect rootkits and other malware running inside the guest OS.
This presentation describes how to create a jQuery Modal Window using Likno Web Modal Windows Builder.
Likno Web Modal Windows Builder is a powerful application for creating any type of jQuery Modal Windows (popup boxes, dialog boxes, etc.).
Likno Web Modal Windows Builder info: http://www.likno.com/jquery-modal-windows/index.php
Examples: http://www.likno.com/jquery-builders/examples.php?p=lwmw&e=n
Download: http://www.likno.com/jquery-modal-windows/download.php
Representational State Transfer (REST) is an architectural style for distributed hypermedia systems like the World Wide Web. The key goals of REST include scalability, generality of interfaces, independent deployment of components, and use of intermediary components. REST uses a stateless, client-server architecture and relies on standard HTTP methods like GET, POST, PUT, and DELETE to allow clients to access and modify resources identified by URIs. Resources are representations that can be acted on and transferred between components.
Creational patterns deal with object creation and aim to create objects in a suitable manner. There are three main creational patterns: the factory pattern, which provides a consistent interface for creating properly configured objects; the abstract factory pattern, which is a factory for factories and handles more complex situations; and the singleton pattern, which ensures a single instance of an object and consistency of data by restricting object instantiation.
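Two of the patterns named above can be sketched briefly in Python. This is an illustrative assumption, not code from the summarized document: the classes `Config`, `JsonExporter`, `CsvExporter` and the function `make_exporter` are hypothetical names chosen for the example.

```python
class Config:
    """Singleton: every call to Config() returns the same shared instance."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.values = {}  # shared state, created only once
        return cls._instance


class JsonExporter:
    extension = ".json"


class CsvExporter:
    extension = ".csv"


def make_exporter(kind):
    """Factory: one consistent entry point that hides the concrete class."""
    exporters = {"json": JsonExporter, "csv": CsvExporter}
    return exporters[kind]()
```

Callers ask `make_exporter("csv")` for an object without naming its class, and any code that calls `Config()` sees the same `values` dictionary, which is how the singleton keeps data consistent.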
This document discusses Cognitive Load Theory and how it relates to the usability of electronic health records (EHRs). It explains that cognitive load is the mental effort required for problem-solving and learning. Working memory is limited, so cognitive load must not exceed working memory capacity. The cognitive load imposed by EHR use comes from the intrinsic load of clinical tasks as well as the extraneous load of poor interface design. The document provides examples of how EHR interfaces can be improved by reducing extraneous load through simpler layouts, predictable structures, and workflows that match clinical thinking processes. It also discusses managing cognitive load by breaking large tasks into smaller chunks and limiting new learning to times outside of patient visits.
Can't wait for 2010? Here are a number of web design trends to help you prepare. Presentation for Boondoggle & Rabobank Corporate Communications.
In this presentation by Lucid Smart Pill we discuss the limitations of working memory. Cognitive Load refers to the amount of brain power required to learn new information, solve a problem or complete a task. Reducing or minimising cognitive load will help people solve problems or perform tasks with more ease and less strain, usually resulting in more desirable outcomes.
Presented by Amie Weller Colbert.
REST (REpresentational State Transfer) is an architectural style for building web APIs that relies on HTTP verbs like GET, POST, PUT, and DELETE to manipulate resources identified by URIs. Resources can have multiple representations like JSON or XML, and the appropriate representation is selected via content negotiation. REST aims to provide a simple, lightweight interface and takes advantage of existing web protocols to transfer representations of resources between clients and servers. Key REST constraints include using nouns for resources and HTTP methods for actions, making the interface lightweight, client-server, cacheable, and layered.
This document discusses traffic and congestion control in ATM networks. It covers key issues like congestion problems, frameworks adopted, requirements for ATM traffic and congestion control, problems with ATM congestion control, key performance issues related to latency and speed effects, and cell delay variation. It also summarizes traffic management frameworks, traffic control and congestion functions, algorithms like explicit rate feedback schemes, and enhanced proportional rate control algorithm.
Buffer overflows occur when a program writes more data to a buffer than it is configured to hold. This can overwrite adjacent memory and compromise the program. Common types of buffer overflows include stack overflows, heap overflows, and format string vulnerabilities. Buffer overflows have been exploited by major computer worms to spread, including the Morris worm in 1988 and the SQL Slammer worm in 2003. Techniques like canaries can help detect buffer overflows by placing check values between buffers and control data. Programming best practices like bounds checking and safe string functions can prevent buffer overflows.
Hardware support for efficient virtualizationLennox Wu
The document discusses hardware support for efficient virtualization. It begins by classifying virtualization techniques as full virtualization, paravirtualization, or hardware-assisted virtualization. It then covers the challenges of software-only virtualization on Intel x86 processors and describes hardware virtualization extensions like Intel VT-x and VT-d, as well as AMD-V. These extensions address issues like ring compression and address space compression. The document also discusses I/O virtualization techniques like Intel VT-c and AMD IOMMU, as well as the performance of different virtualization platforms like KVM, Xen, and VirtualBox on Linux.
This document provides an overview of Xcode and highlights some tips for using it efficiently:
- Xcode is Apple's integrated development environment for developing Mac and iOS apps. It includes tools for building, debugging, and optimizing code.
- Some tips for efficient navigation in Xcode include using keyboard shortcuts, customizing shortcuts, and navigating with the mouse.
- Code reuse can be achieved by importing one project into another to create dependencies between targets and share build settings.
- Automating tasks through scripts can help deploy apps faster by streamlining processes like removing headers and uploading builds.
How to Run a 1,000,000 VU Load Test using Apache JMeter and BlazeMeterAlon Girmonsky
This document discusses how to conduct load testing to simulate 1 million concurrent users. It recommends thoroughly preparing the test script, using a dedicated performance lab with sufficient resources, running the full test within minutes, analyzing results like load sensitivity points, and performing many iterative tests while optimizing performance. Conducting such large-scale load testing can find issues and help improve a system's performance before a major release or event.
The document provides a summary of common mistakes made in C programming. It discusses issues like memory padding in structs, new line characters differences between Windows and Linux, binary mode in fopen(), potential crashes with strncpy(), only using memset() to initialize to zero, reading limits with fgets(), non-null terminated strings, include guards, getting thread IDs, and buffer overflows from writing outside array bounds. The purpose is to introduce common mistakes programmers make in C code and provide experiences to write better code.
This document discusses format string vulnerabilities in programming. It begins by explaining what a format string is in C and how it can be exploited if the format string is controlled by an attacker. It then provides examples of format string vulnerabilities, how to define them, and their importance. The document analyzes a specific vulnerability in the cfingerd 1.4.3 program and discusses how to prevent format string vulnerabilities through safe programming practices.
Pepe Vila - Cache and Syphilis [rooted2019]RootedCON
The document discusses cache latencies and coherence, microarchitectural attacks like Rowhammer and cache attacks, and memory deduplication vulnerabilities like Meltdown. It then provides more details on Rowhammer, describing how bit flips can be induced in memory through repeated activation of aggressor rows, and how this can be exploited. It explains the structure of a DIMM and concepts like bank, row, and row buffer. Tools for finding eviction sets on real hardware are also summarized.
Finding similar items in high dimensional spaces locality sensitive hashingDmitriy Selivanov
This document discusses using locality sensitive hashing (LSH) to efficiently find similar items in high-dimensional spaces. It describes how LSH works by first representing items as sets of shingles/n-grams, then using minhashing to map these sets to compact signatures while preserving similarity. It explains that LSH further hashes the signatures into "bands" to generate candidate pairs that are likely similar and need direct comparison. The number of bands can be tuned to tradeoff between finding most similar pairs vs few dissimilar pairs.
Dmitriy Selivanov, OK.RU. Finding Similar Items in high-dimensional spaces: L...Mail.ru Group
Dmitriy talked about a method for reducing the dimensionality of high-dimensional data: Locality Sensitive Hashing. The Minhash algorithm was examined in detail, using the problem of finding similar text documents as an example.
This document provides an outline for a lecture on software security. It introduces the lecturer, Roman Oliynykov, and covers various topics related to software vulnerabilities like buffer overflows, heap overflows, integer overflows, and format string vulnerabilities. It provides examples of vulnerable code and exploits, and recommendations for writing more secure code to avoid these vulnerabilities.
This document discusses applying a generic programming approach to string management in C++. It finds that STL-based sequences like std::vector and std::basic_string, combined with STL algorithms, provide a simple and effective way to represent and manipulate strings. Common string operations like searching, modifying, and converting between types can be implemented independently of the underlying string representation using function templates and iterators. This approach is more flexible than solutions tied to a single string type.
The document discusses various topics in C programming language including strings, user-defined data types, scope rules, typedef, modularity, header files, storage classes, and the preprocessor. It provides examples and explanations of each topic. Specifically, it covers how to define and use strings, structures, and arrays in C. It also discusses passing structures by value versus reference.
The document contains summaries of code snippets and explanations of technical concepts. It discusses:
1) How a code snippet with post-increment operator i++ would output a garbage value.
2) Why a code snippet multiplying two ints and storing in a long int variable would not give the desired output.
3) Why a code snippet attempting to concatenate a character to a string would not work.
4) How to determine the maximum number of elements an array can hold based on its data type and memory model.
5) How to read data from specific memory locations using the peekb() function in C.
I sometimes feel quite embarrassed when examining bugs in software projects. Many of these bugs live in the code for years, and you can't help wondering how the program still manages to run at all with a hundred mistakes and defects. And it does work somehow. And people do manage to use it. This holds true not only for code that draws a video game Pokémon, but for math libraries too. Your guess is right: this article is about the math library Scilab and the results of analyzing it.
I am Moffat K. I am a C++ Programming Homework Expert at cpphomeworkhelp.com. I hold a Masters in Programming from London, UK. I have been helping students with their homework for the past 6 years. I solve homework related to C++ Programming.
Visit cpphomeworkhelp.com or email info@cpphomeworkhelp.com. You can also call on +1 678 648 4277 for any assistance with C++ Programming Homework.
This article demonstrates capabilities of the static code analysis methodology. The readers are offered to study the samples of one hundred errors found in open-source projects in C/C++. All the errors have been found with the PVS-Studio static code analyzer.
The vulnerability allows remote code execution via a buffer overflow in the __nss_hostname_digits_dots() function of glibc versions before 2.18. The overflow occurs when this function is called by the gethostbyname*() family of functions with a specially crafted hostname argument meeting certain requirements. While serious, the impact is reduced as gethostbyname*() is obsolete, many programs add validation, and a patch was released in 2013.
Heap-based buffer overflow in the __nss_hostname_digits_dots function in glibc 2.2, and other 2.x versions before 2.18, allows context-dependent attackers to execute arbitrary code via vectors related to the (1) gethostbyname or (2) gethostbyname2 function, aka "GHOST."
The GHOST vulnerability is a serious weakness in the Linux glibc library. It allows attackers to remotely take complete control of the victim system without having any prior knowledge of system credentials. CVE-2015-0235 has been assigned to this issue.
Qualys security researchers discovered this bug and worked closely with Linux distribution vendors. And as a result of that we are releasing this advisory today as a coordinated effort, and patches for all distribution are available January 27, 2015.
The document provides an overview of the C and C++ programming languages. It discusses the history and evolution of C and C++. It describes key features of C like procedural programming, manual memory management, and lack of object orientation. It also describes features of C++ like classes, inheritance, and templates which provide object orientation. The document lists many widely used software written in C/C++ and discusses advantages like speed and compact memory usage and disadvantages like difficulty of manual memory management. It provides examples of basic C code structures and data types.
The document discusses strings in C and C++. It explains that strings are not a built-in data type in C/C++ and describes C-style strings as character arrays terminated by a null character. It also discusses C++ string classes like std::string. The document provides examples of using C-style strings and C++ strings. It describes common string functions in C++ for manipulating and comparing strings.
Learning spark ch01 - Introduction to Data Analysis with Sparkphanleson
Learning spark ch01 - Introduction to Data Analysis with Spark
References to Spark Course
Course : Introduction to Big Data with Apache Spark : http://ouo.io/Mqc8L5
Course : Spark Fundamentals I : http://ouo.io/eiuoV
Course : Functional Programming Principles in Scala : http://ouo.io/rh4vv
Firewall - Network Defense in Depth Firewallsphanleson
This document discusses key concepts related to network defense in depth. It defines common terms like firewalls, DMZs, IDS, and VPNs. It also covers techniques for packet filtering, application inspection, network address translation, and virtual private networks. The goal of defense in depth is to implement multiple layers of security and not rely on any single mechanism.
This document discusses wireless security and protocols such as WEP, WPA, and 802.11i. It describes weaknesses in WEP such as vulnerabilities in the RC4 encryption algorithm that allow attacks like dictionary attacks. It introduces WPA as an improvement over WEP that uses stronger encryption keys, protocols like TKIP that change keys dynamically, and AES encryption in 802.11i as stronger alternatives. It also discusses authentication methods like 802.1X that distribute unique keys to each user to address issues with shared keys in WEP.
Authentication in wireless - Security in Wireless Protocolsphanleson
The document discusses authentication protocols for wireless devices. It begins by describing the authentication problem and some basic client-server protocols. It then introduces the challenge-response protocol which aims to prevent replay attacks by including a random number in the response. However, this protocol is still vulnerable to man-in-the-middle and reflection attacks. The document proposes improvements like including an identifier in the hashed response to prevent message manipulation attacks. Overall, the document provides an overview of authentication challenges for wireless devices and the development of challenge-response protocols to address these issues.
HBase In Action - Chapter 04: HBase table designphanleson
HBase In Action - Chapter 04: HBase table design
Learning HBase, Real-time Access to Your Big Data, Data Manipulation at Scale, Big Data, Text Mining, HBase, Deploying HBase
HBase In Action - Chapter 10 - Operationsphanleson
HBase In Action - Chapter 10: Operations
Learning HBase, Real-time Access to Your Big Data, Data Manipulation at Scale, Big Data, Text Mining, HBase, Deploying HBase
Hbase in action - Chapter 09: Deploying HBasephanleson
Hbase in action - Chapter 09: Deploying HBase
Learning HBase, Real-time Access to Your Big Data, Data Manipulation at Scale, Big Data, Text Mining, HBase, Deploying HBase
This chapter discusses Spark Streaming and provides an overview of its key concepts. It describes the architecture and abstractions in Spark Streaming including transformations on data streams. It also covers input sources, output operations, fault tolerance mechanisms, and performance considerations for Spark Streaming applications. The chapter concludes by noting how knowledge from Spark can be applied to streaming and real-time applications.
This chapter discusses Spark SQL, which allows querying Spark data with SQL. It covers initializing Spark SQL, loading data from sources like Hive, Parquet, JSON and RDDs, caching data, writing UDFs, and performance tuning. The JDBC server allows sharing cached tables and queries between programs. SchemaRDDs returned by queries or loaded from data represent the data structure that SQL queries operate on.
Learning spark ch07 - Running on a Clusterphanleson
This chapter discusses running Spark applications on a cluster. It describes Spark's runtime architecture with a driver program and executor processes. It also covers options for deploying Spark, including the standalone cluster manager, Hadoop YARN, Apache Mesos, and Amazon EC2. The chapter provides guidance on configuring resources, packaging code, and choosing a cluster manager based on needs.
This chapter introduces advanced Spark programming features such as accumulators, broadcast variables, working on a per-partition basis, piping to external programs, and numeric RDD operations. It discusses how accumulators aggregate information across partitions, broadcast variables efficiently distribute large read-only values, and how to optimize these processes. It also covers running custom code on each partition, interfacing with other programs, and built-in numeric RDD functionality. The chapter aims to expand on core Spark concepts and functionality.
Learning spark ch05 - Loading and Saving Your Dataphanleson
The document discusses various file formats and methods for loading and saving data in Spark, including text files, JSON, CSV, SequenceFiles, object files, and Hadoop input/output formats. It provides examples of loading and saving each of these file types in Python, Scala, and Java code. The examples demonstrate how to read data from files into RDDs and DataFrames and how to write RDD data out to files in the various formats.
Learning spark ch04 - Working with Key/Value Pairsphanleson
Learning spark ch04 - Working with Key/Value Pairs
Course : Introduction to Big Data with Apache Spark : http://ouo.io/Mqc8L5
Course : Spark Fundamentals I : http://ouo.io/eiuoV
Course : Functional Programming Principles in Scala : http://ouo.io/rh4vv
XML FOR DUMMIES
The document is a chapter from the book "XML for Dummies" that introduces XML. It discusses what XML is, including that it is a markup language and is flexible for exchanging data. It also examines common uses of XML such as classifying information, enforcing rules on data, and outputting information in different ways. Additionally, it clarifies what XML is not, namely that it is not just for web pages, not a database, and not a programming language. The chapter concludes by discussing how to build an XML document using editors that facilitate markup and enforce document rules.
This document discusses the differences between HTML, XML, and XHTML. It covers how XHTML combines the structure of XML with the familiar tags of HTML. Key points include:
- HTML was designed for displaying web pages, XML for data exchange, and XHTML uses HTML tags with XML syntax.
- XML allows custom tags, separates content from presentation, and is self-describing, while HTML focuses on display.
- Converting to XHTML requires following XML syntax rules like closing all tags, using empty element syntax, proper nesting, and lowercase tags and attribute quotes.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. It covers every productivity app included in Office 365, describes common Office 365 migration scenarios, and explains how we can help.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
In this talk we will discuss DDoS protection tools and best practices, network architectures, and what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022. We'll see which techniques helped keep web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukrainian experience.
1.Buffer Overflows
1. Course 2: Programming Issues,
Section 1
Pascal Meunier, Ph.D., M.Sc., CISSP
Updated September 28, 2004
Developed thanks to the support of Symantec Corporation,
NSF SFS Capacity Building Program (Award Number 0113725)
and the Purdue e-Enterprise Center
Copyright (2004) Purdue Research Foundation. All rights reserved.
2. Course 2 Learning Plan
Buffer Overflows
Format String Vulnerabilities
Code Injection and Input Validation
Cross-site Scripting Vulnerabilities
Links and Race Conditions
Temporary Files and Randomness
Canonicalization and Directory Traversal
3. Learning objectives
Understand the definition of a buffer overflow
Learn the importance of buffer overflows
Know how buffer overflows happen
Know how to handle strings safely with regular "C"
functions
Learn safer ways to manipulate strings and buffers
4. "C" Programming Issues: Outline
Definition
Importance
Examples
Fundamental "C" problems
Survey of unsafe functions
Related Issues: Truncation, Character Encoding
Safe string libraries
Preventing buffer overflows without programming
Lab: Write string manipulation functions
5. Buffer Overflows
a.k.a. "Buffer Overrun"
A buffer overflow happens when a program
attempts to read or write data outside of the
memory allocated for that data
– Usually affects buffers of fixed size
Special case of memory management and input
validation
6. An Important Vulnerability Type
Most Common (over 60% of CERT advisories)
Well understood
Easy to avoid in principle
– Don't use "C" family languages, or be thorough
– Can be tricky (off-by-one errors)
– Tedious to do all the checks properly
Temptation: "I don't need to because I control this data and
I *know* that it will never be larger than this"
– Until a hacker figures out how to change it
7. Example Overflow
char B[10];
B[10] = x;
Array starts at index zero
So [10] is 11th element
One byte outside buffer was referenced
Off-by-one errors are common and can be
exploitable!
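A minimal sketch of the index arithmetic above: a 10-element array has valid indices 0 through 9, so a loop over it must test with "<" and not "<=". The names `BUF_LEN` and `fill_buffer` are illustrative and do not come from the slides.

```c
#include <stddef.h>

#define BUF_LEN 10  /* illustrative size, matching char B[10] above */

/* Fills every element of buf with x and returns the last index written.
 * The loop condition uses '<', not '<=', so index BUF_LEN is never touched. */
size_t fill_buffer(char buf[BUF_LEN], char x)
{
    size_t i;
    for (i = 0; i < BUF_LEN; i++) {
        buf[i] = x;
    }
    return i - 1;  /* BUF_LEN - 1 is the last valid index */
}
```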
8. Other Example
void do_stuff(char * a) {
char b[100];
...
strcpy(b, a); // (dest, source)
...
}
What is the size of the string located at “a”?
Is it even a null-terminated string?
What if it was "strcpy(a, b);" instead?
– What is the size of the buffer pointed to by "a"?
9. What happens when memory outside a buffer
is accessed?
If memory doesn't exist:
– Bus error
If memory protection denies access:
– Segmentation fault
– General protection fault
If access is allowed, memory next to the buffer can
be accessed
– Heap
– Stack
– Etc...
10. Real Life Example: efingerd.c, v. 1.6.2
int get_request (int d, char buffer[], u_short len) {
    u_short i;
    for (i=0; i< len; i++) {
        ...
    }
    buffer[i] = '\0';
    return i;
}
What is the value of "i" at the end of the loop?
Which byte just got zeroed?
It's tricky even if you try to get things right...
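One possible way to avoid the write at index "len" is to stop the loop one byte early so the terminator always fits. The sketch below is a hypothetical stand-in, not the efingerd fix: the loop body of the original is elided in the slide, so this version simply copies from an in-memory source (`src`, `src_len`), and the name `get_request_bounded` is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for get_request: copies bytes from src (src_len bytes
 * available) into buffer, which has room for len bytes in total.
 * Stops at len - 1 so that the terminating NUL always fits.
 * Returns the number of bytes stored, not counting the NUL. */
size_t get_request_bounded(const char *src, size_t src_len,
                           char buffer[], size_t len)
{
    size_t i;
    if (len == 0) return 0;          /* no room even for the NUL */
    for (i = 0; i < len - 1 && i < src_len; i++) {
        buffer[i] = src[i];
    }
    buffer[i] = '\0';                /* i <= len - 1: always in bounds */
    return i;
}
```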
11. Real Life Example: efingerd.c, v. 1.5
CAN-2002-0423
static char *lookup_addr(struct in_addr in) {
    static char addr[100];
    struct hostent *he;
    he = gethostbyaddr(...);
    strcpy (addr, he->h_name);
    return addr;
}
How big is he->h_name?
Who controls the results of gethostbyaddr?
How secure is DNS? Can you be tricked into
looking up a maliciously engineered value?
12. A Typical Stack Exploit
The stack contains:
– Parameters (arguments) to function
– Return Address
– Local variables
– Anything pushed on the stack
addr[100+] overwrites the return address
addr[0] typically contains exploit code
Return address is chosen to point at exploit code!
Stack layout (the stack grows from high addresses toward low addresses):
High Addresses
Arguments
Return Address
addr[99]
...
addr[0]
Low Addresses
13. Fundamental "C" Problems
You can't know the length of buffers just from a
pointer
– Partial solution: pass the length as a separate argument
"C" string functions aren't safe
– No guarantees that the new string will be null-terminated!
– Doing all checks completely and properly is tedious and
tricky
14. Strlen
What happens when you call strlen on an
improperly terminated string?
Strlen scans until a null character is found
– Can scan outside buffer if string is not null-terminated
– Can result in a segmentation fault or bus error
Strlen is not safe to call!
– Unless you positively know that the string is null-
terminated...
Are all the functions you use guaranteed to return a null-
terminated string?
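A bounded length check can be sketched with memchr, which never reads more than the number of bytes it is given. The function name `bounded_strlen` is illustrative; POSIX strnlen offers the same behavior where it is available.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of the string in s, scanning at most s_size bytes.
 * Returns s_size if no NUL was found, which signals an unterminated buffer. */
size_t bounded_strlen(const char *s, size_t s_size)
{
    const char *nul = memchr(s, '\0', s_size);
    return nul ? (size_t)(nul - s) : s_size;
}
```

A caller can treat a result equal to the buffer size as an error instead of passing the buffer on to functions that expect a terminated string.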
15. Strcpy
char * strcpy(char * dst, const char * src);
How can you use strcpy safely?
– Set the last character of src to NUL
According to the size of the buffer pointed to by src or a size
parameter passed to you
Not according to strlen(src)!
Wide char array: sizeof(src)/sizeof(src[0]) -1 is the index of
the last element
– Check that the size of the src buffer is smaller than or
equal to that of the dst buffer
– Or allocate dst to be at least equal to the size of src
16. Strncpy
char * strncpy(char * dst, const char * src, size_t len);
"len" is maximum number of characters to copy
– What is the correct value for len?
Initial answer by most people: size of dst
– If dst is an array, sizeof(dst)
What if src is not NUL-terminated?
– Don't want to read outside of src buffer
– What is the correct value for "len" given that?
Minimum buffer size of dst and src, -1 for NUL byte
If arrays,
– MIN(sizeof(dst), sizeof(src)) - 1
17. Strncpy (Cont.)
Other issue: "dst" is NUL-terminated only if fewer
than "len" characters were copied!
– All calls to strncpy must be followed by a NUL-termination
operation
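The "strncpy, then terminate" pattern can be wrapped once so it is not forgotten at each call site. This is a sketch under the assumption that src is either NUL-terminated or at least dst_size - 1 bytes long; the helper name `copy_terminated` is illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst (dst_size bytes) and always NUL-terminates dst.
 * Assumes src is NUL-terminated or at least dst_size - 1 bytes long. */
void copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) return;        /* nothing can be stored */
    strncpy(dst, src, dst_size - 1);  /* leave room for the terminator */
    dst[dst_size - 1] = '\0';         /* strncpy may not have written one */
}
```

Note that this silently truncates; detecting truncation needs a separate check.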
18. Question
What’s wrong with this?
void do_stuff(char * a) {
char b[100];
...
strncpy(b, a, strlen(a));
...
}
19. Question Answer
What’s wrong with this?
void do_stuff(char * a) {
char b[100];
...
strncpy(b, a, strlen(a));
...
}
The string pointed to by "a" could be larger than
the size of "b"!
20. Question
What’s wrong with this?
void do_stuff(char * a) {
char *b;
...
b = malloc(strlen(a)+1);
strncpy(b, a, strlen(a));
...
}
21. Question Answer
What’s wrong with this?
void do_stuff(char * a) {
char *b;
...
b = malloc(strlen(a)+1);
strncpy(b, a, strlen(a));
...
}
Are you absolutely certain that the string pointed to
by "a" is NUL-terminated?
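If the size of the source buffer is passed along with the pointer, the allocation no longer depends on "a" being NUL-terminated. The sketch below assumes such a size parameter (`a_size`) is available; the name `dup_bounded` is illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, NUL-terminated copy of at most a_size bytes of a,
 * or NULL on allocation failure. The caller frees the result. */
char *dup_bounded(const char *a, size_t a_size)
{
    const char *nul = memchr(a, '\0', a_size);   /* never reads past a_size */
    size_t n = nul ? (size_t)(nul - a) : a_size;
    char *b = malloc(n + 1);
    if (b == NULL) return NULL;
    memcpy(b, a, n);
    b[n] = '\0';                                 /* terminate explicitly */
    return b;
}
```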
22. Strlcpy
size_t strlcpy(char *dst, const char *src, size_t size);
Guarantees to null-terminate string pointed to by "dst"
if "size">0
The rest of the destination buffer is not zeroed as for
strncpy, so better performance is obtained
"size" can simply be size of dst (sizeof if an array)
– If all functions are guaranteed to null-terminate strings, then it
is safe to assume src is null-terminated
– Not safe if src is not null-terminated!
See http://www.courtesan.com/todd/papers/strlcpy.html for
benchmarks and more info
– Used in MacOS X, OpenBSD and more (but not Linux)
23. Note on Strlcpy
As the remainder of the buffer is not zeroed, there
could be information leakage
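On platforms without strlcpy, its documented behavior can be approximated as follows. This is a sketch under a different name (`my_strlcpy`) to avoid clashing with a system version, not the OpenBSD source, and it requires src to be NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of strlcpy semantics. src must be NUL-terminated.
 * Returns strlen(src); a result >= size means the copy was truncated. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);
    if (size > 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';   /* bytes after the NUL are left untouched */
    }
    return src_len;
}
```

Because the bytes after the terminator keep their old contents, a caller that later sends the whole buffer somewhere should clear it first.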
24. Corrected Efinger.c (v.1.6)
sizeof is your friend, when you can use it (if an
array)
static char addr[100];
he = gethostbyaddr(...);
if (he == NULL)
    strncpy(addr, inet_ntoa(in), sizeof(addr));
else
    strncpy(addr, he->h_name, sizeof(addr));
What is still wrong?
25. Corrected Efinger.c (v.1.6)
Notice that the last byte of addr is not zeroed, so
this code can produce non-NUL-terminated strings!
static char addr[100];
he = gethostbyaddr(...);
if (he == NULL)
    strncpy(addr, inet_ntoa(in), sizeof(addr));
else
    strncpy(addr, he->h_name, sizeof(addr));
26. Strcat
char * strcat(char * s, const char * append);
String pointed to by "append" is added at the end
of the string contained in buffer "s"
No check for size!
– Need to do all checks beforehand
– Example with arrays:
if (sizeof(s)-strlen(s)-1 >= strlen(append))
strcat(s, append);
Need to trust that "s" and "append" are NUL-
terminated
– Or set their last byte to NUL before the checks and call
27. Strncat
char * strncat(char * s, const char * append, size_t count);
No more than "count" characters are added, and
then a NUL is added
Correct call is complex:
– strncat(s, append, sizeof(s)-strlen(s)-1)
Not a great improvement on strcat, because you still need
to calculate correctly the count
– And then figure out if the string was truncated
Need to trust that "s" and "append" are NUL-
terminated
– Or set their last byte to NUL before the checks and call
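The count calculation and the truncation check can be combined in one helper. This is a sketch that assumes both strings are NUL-terminated and that the string in s ends within s_size bytes; the name `append_bounded` is illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Appends append to the string in s (s_size bytes in total).
 * Both strings must be NUL-terminated. Returns 1 if append was truncated. */
int append_bounded(char *s, size_t s_size, const char *append)
{
    size_t used = strlen(s);
    size_t room = s_size - used - 1;   /* space left, excluding the NUL */
    strncat(s, append, room);          /* writes at most room bytes + NUL */
    return strlen(append) > room;
}
```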
28. Strlcat
size_t strlcat(char *dst, const char *src, size_t size);
Call semantics are simple:
– strlcat(dst, src, dst_len);
– If an array:
strlcat(dst, src, sizeof(dst));
Safety: safe even if dst is not properly terminated
– Won't read more than size characters from dst when
looking for the append location
Not safe if src is not properly terminated!
– If dst is large and the buffer for src is small, then it could
cause a segmentation fault or bus error, or copy
confidential values
29. Issues with Truncating Strings
Subsequent operations may fail or open up
vulnerabilities
– If string is a path, then it may not refer to the same thing,
or be an invalid path
Truncation means you weren't able to do what you
wanted
– You should handle that error instead of letting it go silently
30. Truncation Detection
Truncation detection was simplified by strlcpy and
strlcat, by changing the return value
– The returned value is the size of what would have been
copied if the destination had an infinite size
if this is larger than the destination size, truncation occurred
Source still needs to be NUL-terminated
Inspired by snprintf and vsnprintf, which do the same
However, it still takes some consideration to make
sure the test is correct:
– if (strlcpy(dest, src, sizeof(dest)) >=
sizeof(dest)) goto toolong;
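Where strlcpy is not available, the same ">=" test can be sketched with snprintf. This assumes a C99-conforming snprintf, which returns the length the full output would have had and always terminates a non-empty buffer; older platforms may behave differently. The name `copy_checked` is illustrative.

```c
#include <stdio.h>
#include <stddef.h>

/* Copies src into dst and reports truncation, using C99 snprintf semantics.
 * Returns 0 on success, 1 if truncated, -1 on an output error. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);
    if (n < 0) return -1;
    return (size_t)n >= dst_size;   /* same ">=" test as with strlcpy */
}
```

A source that exactly fills the destination counts as truncated, because the terminator needs one byte too.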
31. Multi-Byte Character Encodings
Handling of strings using variable-width encodings
or multi-byte encodings is a problem
– e.g., a UTF-8 character is 1 to 4 bytes long
How long is the string?
– In bytes
– In characters
Overflows are possible if size checks do not
properly account for character encoding!
.NET: System.String supports UTF-16
– Strings are immutable - no overflow possible there!
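The difference between length in bytes and length in characters can be seen with a small UTF-8 counter. This is a sketch that assumes the input is valid, NUL-terminated UTF-8; the name `utf8_char_count` is illustrative.

```c
#include <stddef.h>

/* Counts code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (those of the form 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) count++;
    }
    return count;
}
```

Buffer sizes must be computed from the byte count, not from this character count.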
32. Safestr
Free library available at: http://zork.org
Features:
– Works on UNIX and Windows
– Buffer overflow protection
– String format protection
– "Taint-tracking" (à la Perl's taint mode)
Limitations and differences:
– Does not handle multi-byte characters
– License: binaries must reproduce a copyright notice
– NUL characters have no special meaning
– Must use their library functions all the time (but
conversion to regular "C" strings is easy)
33. Microsoft Strsafe
Null-termination guaranteed
Option for using either number of characters or
bytes (for Unicode character encoding), and
disallowing the other
Option to treat truncation as a fatal error
Define behavior upon error
– Output buffer set to "" or filled
Option to prevent information leaks
– Pad rest of buffer
However, correct calculations still needed
– e.g., wcsncat requires calculating the remaining space in
the destination string...
34. Future Microsoft
Visual Studio 2005 will have a new series of safe
string manipulation functions
– strcpy_s()
– strncpy_s()
– strncat_s()
– strlen_s()
– etc...
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncode/html/secure05202002.asp
Visual Studio 2005 (as of Beta 1) by default issues
deprecation warnings on strcpy, strncpy, etc… Say
goodbye to your old friends, they're too dangerous!
35. Other Unsafe Functions: sprintf family
int sprintf(char *s, const char *format, /* args*/ ...);
– Buffer "s" can be overflowed
int snprintf(char *s, size_t n, const char *format, /* args*/ ...);
– Does not guarantee NUL-termination of s on some
platforms (Microsoft, Sun)
– MacOS X: NUL-termination guaranteed
– Which is it on the class server? Check with "man
sprintf"
int vsprintf(char * str, const char * format, va_list ap);
– Buffer "str" can be overflowed
36. Gets, fgets
char * gets(char *str);
– Buffer "str" can be overflowed
char * fgets(char * str, int size, FILE * stream);
– Buffer "str" is not NUL-terminated if an I/O error occurs
– If an error occurs, returns NULL
– If end-of-file occurs before any characters are read,
returns NULL also (and buffer is unchanged)
– Callers must use feof(3) and ferror(3) to determine which
occurred.
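The return-value handling for fgets described above might be wrapped like this. It is a sketch; the name `read_line` and its return codes are illustrative choices.

```c
#include <stdio.h>

/* Reads one line into buf. Returns 1 if a line (or final partial line) was
 * read, 0 on end-of-file with nothing read, -1 on an I/O error. */
int read_line(FILE *stream, char *buf, int size)
{
    if (fgets(buf, size, stream) != NULL) return 1;
    if (size > 0) buf[0] = '\0';          /* do not trust buf after a failure */
    return ferror(stream) ? -1 : 0;
}
```

Emptying the buffer on failure means a caller that forgets to check the result still sees a terminated string.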
37. Conclusion
Buffer sizes should be passed as a parameter with
every pointer
– Applies to other buffer manipulations besides strings
Need simple truncation detection
38. Preventing Buffer Overflows Without
Programming
Idea: make the heap and stack non-executable
– Because many buffer overflow attacks aim at executing
code in the data that overflowed the buffer
Doesn't prevent "return into libc" overflow attacks
– Because the return address of the function on the stack
points to a standard "C" function (e.g., "system"), this
attack doesn't execute code on the stack
e.g., ExecShield for Fedora Linux (used to be
RedHat Linux)
39. Canaries on a Stack
Add a few bytes containing special values between
variables on the stack and the return address.
Before the function returns, check that the values
are intact.
– If not, there's been a buffer overflow!
Terminate program
If the goal was a Denial-of-Service then it still
happens
– At least the machine is not compromised
If the canary can be read by an attacker, then a
buffer overflow exploit can be made to rewrite it with the correct value
– e.g., see string format vulnerabilities
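The canary idea can be simulated in portable C by placing a guard word right after a buffer inside one struct. This is only an illustration of the check, not how StackGuard or SSP are implemented: the struct, the fixed `CANARY_VALUE`, and the function name are all made up, and real canaries are chosen at run time and placed by the compiler.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_VALUE 0xDEADC0DEu  /* illustrative; real canaries are random */

/* Simulated frame: a guard word sits right after the buffer. */
struct frame {
    char buf[8];
    unsigned int canary;
};

/* Writes n bytes of data starting at the buffer, deliberately without a
 * bounds check on buf, then checks the guard. n is clamped to the size of
 * the struct, so the write stays inside the object.
 * Returns 1 if the canary is intact, 0 if the write ran past buf. */
int write_and_check(const char *data, size_t n)
{
    struct frame f;
    f.canary = CANARY_VALUE;
    if (n > sizeof(f)) n = sizeof(f);
    memcpy(&f, data, n);               /* may spill from buf into canary */
    return f.canary == CANARY_VALUE;
}
```

A real implementation would terminate the program when the check fails instead of returning a status.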
40. Canary Implementations
StackGuard
Stack-Smashing Protector (SSP)
– Formerly ProPolice
– gcc modification
– Used in OpenBSD
– http://www.trl.ibm.com/projects/security/ssp/
Windows: /GS option for Visual C++ .NET
These can be useful when testing too!
41. Protection Using Virtual Memory Pages
Page: A chunk (unit) of virtual memory
POSIX systems have three permissions for each
page.
– PROT_READ
– PROT_WRITE
– PROT_EXEC
Idea: manipulate and enforce these permissions
correctly to defend against buffer overflows
– Make injected code non-executable
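On a POSIX system these permissions can be set per page with mmap and mprotect. The sketch below maps one writable page and then drops write permission, never granting PROT_EXEC; the function name `make_page_read_only` is illustrative, and the feature-test macro and the MAP_ANON fallback are assumptions about the build environment.

```c
#define _DEFAULT_SOURCE          /* asks glibc to expose MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD-style name */
#endif

/* Maps one read/write page, then drops write permission.
 * Returns 0 if both steps succeed, -1 otherwise. */
int make_page_read_only(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    ((char *)p)[0] = 'x';                           /* allowed: page is writable */
    int rc = mprotect(p, (size_t)page, PROT_READ);  /* PROT_EXEC never granted */
    /* a write through p at this point would be denied by the system */
    munmap(p, (size_t)page);
    return rc == 0 ? 0 : -1;
}
```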
42. OpenBSD
PROT_* purity (correct enforcement of permissions
for pages in VM system; i386 and PowerPC
architectures limit this)
W^X (write permission is exclusive of execute)
– Data shouldn't need to be executed!
– Limits also self-modifying (machine) code
.rodata (read-only data is enforced to be read-only
on a separate page)
43. Windows Execution Protection
"NX" (No Execute)
Windows XP service pack 2 feature
– Somewhat similar to POSIX permissions
Requires processor support
– AMD64
– Intel Itanium
44. Question
A buffer overflow is not exploitable if:
a) It happens in the heap
b) It happens in the stack
c) It happens where memory is not executable
d) None of the above
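45. Question Answer
A buffer overflow is not exploitable if:
a) It happens in the heap
b) It happens in the stack
c) It happens where memory is not executable
d) None of the above
Answer: d) None of the above. Overflows in the heap and in the stack can both
be exploited, and non-executable memory does not stop "return into libc"
attacks (see slide 38).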
46. Question
Making some memory locations non-executable
protects against which variants of buffer overflow
attacks?
a) Code in the overwritten data
b) Return into libc
c) Variable corruption
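47. Question Answer
Making some memory locations non-executable
protects against which variants of buffer overflow
attacks?
a) Code in the overwritten data
b) Return into libc
c) Variable corruption
Answer: a) Code in the overwritten data. "Return into libc" and variable
corruption do not need to execute injected code (see slide 38).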
48. Buffer Overflow Lab
Create your own safe version of the strlen, strcpy,
strcat
– Name them mystrlen, mystrcpy and mystrcat
– Pass buffer sizes for each pointer argument
– Return 0 if successful, and 1 if truncation occurred
Other error codes if you wish
– Make your implementation pass all test cases
int mystrlen(const char *s, size_t s_len);
– In this case, return the string length, not zero or one.
int mystrcpy(char * dst, const char * src, size_t dst_len, size_t src_len);
int mystrcat(char * s, const char * append, size_t s_len, size_t a_len);
49. Things to Ponder
What about 0 as source size? Error or not?
What if “s” is NULL?
What about overlapping buffers? Undefined every time,
or only in certain cases?
What if you reach the end of the buffer in mystrlen?
How efficient to make it -- how many passes at source
string are made?
What to check first?
Reuse mystrlen within mystrcpy or mystrcat?
Compare your implementations to strl*, strsafe, safestr,
str*_s.
51. About These Slides
You are free to copy, distribute, display, and perform the work; and
to make derivative works, under the following conditions.
– You must give the original author and other contributors credit
– The work will be used for personal or non-commercial educational uses
only, and not for commercial activities and purposes
– For any reuse or distribution, you must make clear to others the terms of
use for this work
– Derivative works must retain and be subject to the same conditions, and
contain a note identifying the new contributor(s) and date of modification
– For other uses please contact the Purdue Office of Technology
Commercialization.