Constraint programming techniques were applied to compiler optimization problems like instruction selection, instruction scheduling, and register allocation. The techniques were able to find optimal solutions to some problems that were previously only solved heuristically. Constraint models were improved over time by adding implied constraints, dominance constraints, and preprocessing. Solvers were improved through techniques like restarts, portfolios, and machine learning of heuristics. The approach led to identifying and solving interesting subproblems with general applicability, like improved consistency algorithms for global constraints.
MLPerf an industry standard benchmark suite for machine learning performance (jemin lee)
MLPerf is an industry standard benchmark suite for measuring machine learning performance. It was created in 2018 to combine the best aspects of prior benchmark efforts and has the support of major tech companies and universities. MLPerf defines benchmarks for both training and inference and provides guidelines for fair comparisons, including rules around hyperparameters, model definitions, and variance. The goal is to drive development of specialized hardware and software through objective performance evaluations.
Deterministic Galois: On-demand, Portable and Parameterless (Donald Nguyen)
Non-determinism in program execution can make program development and debugging difficult. In this paper, we argue that solutions to this problem should be on-demand, portable and parameterless. On-demand means that the programming model should permit the writing of non-deterministic programs, since these programs often perform better than deterministic programs for the same problem. Portable means that the program should produce the same answer even if it is run on different machines. Parameterless means that if there are machine-dependent scheduling parameters that must be tuned for good performance, they must not affect the output.
Although many solutions for deterministic program execution have been proposed in the literature, they fall short along one or more of these dimensions. To remedy this, we propose a new approach, based on the Galois programming model, in which (i) the programming model permits the writing of non-deterministic programs and (ii) the runtime system executes these programs deterministically if needed. Evaluation of this approach on a collection of benchmarks from the PARSEC, PBBS, and Lonestar suites shows that it delivers deterministic execution with substantially less overhead than other systems in the literature.
The document describes a workshop on Universal Verification Methodology (UVM) that will cover UVM concepts and techniques for verifying blocks, IP, SoCs, and systems. The workshop agenda includes presentations on UVM concepts and architecture, sequences and phasing, TLM2 and register packages, and putting together UVM testbenches. The workshop is organized by Dennis Brophy, Stan Krolikoski, and Yatin Trivedi and will take place on June 5, 2011 in San Diego, CA.
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015) (Benoit Combemale)
Talk given at the 8th ACM SIGPLAN Int'l Conf. on Software Language Engineering (SLE 2015), Pittsburgh, PA, USA on October 27, 2015. Preprint available at https://hal.inria.fr/hal-01182517
This document discusses performance analysis and parallel computing. It defines performance metrics like speedup, efficiency, and scalability that are used to evaluate parallel programs. Sources of parallel overhead like synchronization, load imbalance, and communication are described. The document also discusses benchmarks used to evaluate parallel systems like PARSEC and Rodinia. It emphasizes that overall execution time captures a system's real performance and depends on factors like CPU time, I/O, memory access, and interactions between programs.
SystemVerilog Assertions verification with SVAUnit - DVCon US 2016 Tutorial (Amiq Consulting)
This document provides an overview of SystemVerilog Assertions (SVAs) and the SVAUnit framework for verifying SVAs. It begins with an introduction to SVAs, including types of assertions and properties. It then discusses planning SVA development, such as identifying design characteristics and coding guidelines. The document outlines implementing SVAs and using the SVAUnit framework, which allows decoupling SVA definition from validation code. It provides an example demonstrating generating stimuli to validate an AMBA APB protocol SVA using SVAUnit. Finally, it summarizes SVAUnit's test API and features for error reporting and test coverage.
The document discusses loop level parallelism in OpenMP. It describes how to parallelize loops using directives like #pragma omp parallel for in C/C++ and !$omp parallel do in Fortran. It discusses restrictions on loop parallelism and issues like shared/private variables, reductions, scheduling and nested loops. It also introduces coarse-grained parallelism using parallel regions in OpenMP.
The document discusses cyclomatic complexity, a software metric used to measure the number of linearly independent paths through a program's source code. It provides definitions and formulas for calculating complexity, describes desired properties of complexity metrics, and discusses applications of complexity metrics like testing, design validation, and security. The key points covered are the definition of cyclomatic complexity, methods for computing it, its relationship to testing efforts, and how it can impact reliability.
This Virtual User Group session, held on 2014-01-22, presents some of the techniques and algorithms used to improve the CPLEX MIP solver in versions 12.5.1 and 12.6.
Bug deBug Chennai 2012 Talk - V3 analysis an approach for estimating software... (RIA RUI Society)
Dr. Vu Nguyen is a Director of Software Engineering at QASymphony and a Lecturer at the University of Science, Vietnam National University. At both places, he is involved in developing software tools and performing research in software estimation, testing, maintenance, and process.
Quality assurance management is an essential component of the software development lifecycle. To ensure the quality, applicability, and usefulness of a product, development teams must spend considerable time and resources on testing, which makes estimating the software testing effort a critical activity. In this talk, we present an approach, namely V3 Analysis, to estimating the size of software testing work. The approach measures the size of a software test case based on its checkpoints, preconditions and test data, as well as the types of testing. We also introduce a supporting toolkit that you can use to estimate testing effort quickly for your projects.
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M... (Filip Krikava)
This document discusses integrating adaptation mechanisms in self-adaptive software systems using control theory models. It presents a case study of using feedback control loops and control theory models to optimize a web server's performance by self-adjusting tuning parameters. The challenges of engineering such self-adaptive systems include control challenges for control engineers and integration challenges for software engineers. The study models the web server as a multi-input multi-output system and designs a linear quadratic regulator controller to optimize performance based on CPU utilization and memory usage.
The document discusses parallel programming concepts and models. It defines speedup and efficiency as measures of parallelization. Amdahl's Law states that the serial fraction of a program limits its scalability. Fine-grained parallelism operates at the loop level while coarse-grained parallelism involves larger sections of code. Shared memory and message passing are two common parallel programming models, with shared memory used for shared memory machines and message passing for distributed memory clusters. The programming model choice depends on the application and hardware, but does not determine scalability, which is limited by a program's inherent parallelism.
ACTRESS: Domain-Specific Modeling of Self-Adaptive Software Architectures (Filip Krikava)
Presentation given at 29th Symposium On Applied Computing (SAC'14) - Dependable and Adaptive Distributed Systems track.
It is mainly based on the work done during my Ph.D.
This tutorial is intended for verification engineers who must validate algorithmic designs. It presents the detailed steps for implementing a SystemVerilog verification environment that interfaces with a GNU Octave mathematical model. It describes the SystemVerilog – C++ communication layer with its challenges, like proper creation and activation or piped algorithm synchronization handling. The implementation is illustrated for Ncsim, VCS and Questa.
Lec0 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech ECE -- Introdu... (Hsien-Hsin Sean Lee, Ph.D.)
This document provides information about the ECE 4100/6100 Advanced Computer Architecture course taught by Professor Hsien-Hsin Sean Lee at Georgia Tech. It outlines course details like prerequisites, textbook, scope, and grading. The scope covers modern microprocessor architecture concepts like instruction-level parallelism, memory hierarchy, and multiprocessors. It also discusses trends in technology, Moore's law, and the job of a computer architect in understanding application requirements and tradeoffs.
This document discusses key concepts related to variables in programming languages including names, bindings, scopes, and lifetimes. It covers different types of variables such as static, stack dynamic, and heap dynamic variables. It also compares static and dynamic scoping models and how they determine variable visibility. The referencing environment is defined as the collection of all visible variables for a given statement.
1. The document discusses process synchronization and solving the critical section problem. It covers solutions like Peterson's algorithm, mutex locks, semaphores, monitors and condition variables.
2. Classical synchronization problems like the bounded buffer, readers-writers, and dining philosophers problems are used to test new synchronization schemes.
3. The document also discusses topics like deadlock, starvation, serializability and concurrency control algorithms in transaction processing.
Finding Bugs Faster with Assertion Based Verification (ABV) (DVClub)
1) Assertion-based verification introduces assertions into a design to improve observability and controllability during simulation and formal analysis.
2) Assertions define expected behavior and can detect errors by monitoring signals within a design.
3) An assertion-based verification methodology leverages assertions throughout the verification flow from module to system level using various tools like simulation, formal analysis, and acceleration for improved productivity, quality, and reduced verification time.
Contribution of recurrent connectionist language models in improving lstm bas... (anna8885)
This paper proposes using recurrent connectionist language models to improve LSTM-based Arabic text recognition in videos. It trains RNN and RNNME language models on a large Arabic text corpus and integrates them into an LSTM-CTC optical character recognition system using a modified beam search decoding scheme. Experimental results show the connectionist language models outperform n-gram models, improving word recognition rate by over 16% compared to the baseline model without a language model. The full system also outperforms a commercial OCR engine by over 35% word recognition rate.
Implementing subprograms requires saving execution context, allocating activation records, and maintaining dynamic or static chains. Activation records contain parameters, local variables, return addresses, and dynamic/static links. Nested subprograms are supported through static chains that connect activation records. Dynamic scoping searches the dynamic chain for non-local variables, while shallow access uses a central variable table. Blocks are implemented as parameterless subprograms to allocate separate activation records for block variables.
A microprocessor is the electronic component a computer uses to do its work: a central processing unit on a single integrated circuit chip, containing millions of very small components, including transistors, resistors, and diodes, that work together.
System verilog verification building blocks (Nirav Desai)
SystemVerilog introduces key concepts like program blocks, interfaces, and clocking blocks to help with verification. Program blocks separate the testbench code from the design code to avoid race conditions. Interfaces encapsulate communication between blocks and help prevent errors from manual port connections. Clocking blocks synchronize signal drivers and allow specifying timing for sampled signals. Together these features help manage complexity when verifying designs.
LAS16-400: Mini Conference 3 AOSP (Session 1) (Linaro)
LAS16-400: Mini Conference 3 AOSP (Session 1)
Speakers: Thomas Gall, Bernhard Rosenkränzer
Date: September 29, 2016
★ Session Description ★
The Android Open Source Project is one community which is strategic to Linaro and its members. The purpose of this mini conference is to gather fellow Android engineers together from the community, member companies, and Linaro to discuss engineering activities and improve collaboration across different groups.
Within this mini conference we encourage discussion and presentations to advance engineering topics, forge consensus and educate each other.
The tentative agenda for this mini conference includes:
- Quick introduction
- Filesystems - Between requirements for encryption and standing concerns about degrading performance as an Android file system ages, let's discuss current data, known issues, and possible improvements in this area for Android.
- HAL consolidation - Review current status and discuss next steps to work on.
One build for many devices: device/build configuration. Next features and platforms to add. Gaps in HiKey support vs. AOSP build.
- Graphics - YUV support in mesa and hwc.
- WiFi and sensor HAL status and next steps
- New developments with AOSP + the Kernel - With regards to the Google Common Kernel tree and upstream Linux kernel activities related to Android, there are a few topics up for discussion:
- - Updates on HiKey in AOSP
- - EAS in common.git & integration with AOSP userspace
- - New Sync API in 4.6+ kernels, and how it will affect graphics drivers
- AOSP transition to clang - As everyone knows GCC in AOSP has been deprecated. Let’s cover current status, issues and next steps. Let’s also discuss the elephant in the room, building the kernel with clang.
- Out of tree AOSP User space Patches - This is a discussion with the goal of organized action to see forward progress on AOSP user space patches that aren’t in AOSP for whatever reason.
- Android is used in some environments where booting can be frequent and affect the product experience. Do you want to wait for a minute while your car boots? We’ll spend time brainstorming on improving Android boot time.
★ Resources ★
Etherpad: pad.linaro.org/p/las16-400
Presentations & Videos: http://connect.linaro.org/resource/las16/las16-400/
★ Event Details ★
Linaro Connect Las Vegas 2016 – #LAS16
September 26-30, 2016
http://www.linaro.org
http://connect.linaro.org
Next Generation MPICH: What to Expect - Lightweight Communication and More (Intel® Software)
MPICH is a widely used, open-source implementation of the message passing interface (MPI) standard. It has been ported to many platforms and used by several vendors and research groups as the basis for their own MPI implementations. This session discusses the current development activity with MPICH, including a close collaboration with teams at Intel. We showcase preparing MPICH-derived implementations for deployment on upcoming supercomputers like Aurora (from the Argonne Leadership Computing Facility), which is based on the Intel® Xeon Phi™ processor and Intel® Omni-Path Architecture (Intel® OPA).
This document discusses various techniques for optimizing computer code, including:
1. Local optimizations that improve performance within basic blocks, such as constant folding, propagation, and elimination of redundant computations.
2. Global optimizations that analyze control flow across basic blocks, such as common subexpression elimination.
3. Loop optimizations that improve performance of loops by removing invariant data and induction variables.
4. Machine-dependent optimizations like peephole optimizations that replace instructions with more efficient alternatives.
The goal of optimizations is to improve speed and efficiency while preserving program meaning and correctness. Optimizations can occur at multiple stages of development and compilation.
This document discusses instruction pipelining as a technique to improve computer performance. It explains that pipelining allows multiple instructions to be processed simultaneously by splitting instruction execution into stages like fetch, decode, execute, and write. While pipelining does not reduce the time to complete individual instructions, it improves throughput by allowing new instructions to begin processing before previous instructions have finished. The document outlines some challenges to achieving peak performance from pipelining, such as pipeline stalls from hazards like data dependencies between instructions. It provides examples of how data hazards can occur if the results of one instruction are needed by a subsequent instruction before they are available.
This document discusses subprograms and parameter passing in programming languages. It covers fundamental concepts of subprograms like definitions, calls, headers, and parameters. It then describes different parameter passing methods like pass-by-value, pass-by-reference, and pass-by-name. It also discusses how major languages like C, C++, Java, Ada, C#, and PHP implement parameter passing and type checking.
The document discusses validation and design in small teams with limited resources. It proposes constraining designs to a single clock rate, using FIFO interfaces between blocks, and separating algorithm from IO verification to simplify validation. This approach allows designs to be completed more quickly with fewer verification engineers through standardized, repeatable validation methods at the cost of optimal performance.
Validation and Design in a Small Team Environment (DVClub)
The document discusses validation and design in small teams with limited resources. It proposes constraining designs to a single clock rate, standardized interfaces, and automated test cases to streamline verification. This reduces complexity and verification costs, allowing designs to be completed more quickly despite limited experience. Standardizing interfaces and separating algorithm from implementation verification improves efficiency enough to overcome typical verification to design ratios.
The document discusses cyclomatic complexity, a software metric used to measure the number of linearly independent paths through a program's source code. It provides definitions and formulas for calculating complexity, describes desired properties of complexity metrics, and discusses applications of complexity metrics like testing, design validation, and security. The key points covered are the definition of cyclomatic complexity, methods for computing it, its relationship to testing efforts, and how it can impact reliability.
This Virtual User Group session, held on 2014-01-22, presents some of the techniques and algorithms used to improve the CPLEX MIP solver in versions 12.5.1 and 12.6.
Bug deBug Chennai 2012 Talk - V3 analysis an approach for estimating software...RIA RUI Society
Dr. Vu Nguyen is a Director of Software Engineering at QASymphony and a Lecturer at the University of Science, Vietnam National University. At both places, he is involved in developing software tools and performing research in software estimation, testing, maintenance, and process.
Quality assurance management is an essential component of the software development lifecycle. To ensure quality, applicability, and usefulness of a product, development teams must spend considerable time and resources testing, which makes the estimation of the software testing effort, a critical activity. In this talk, we present an approach, namely V3 Analysis, to estimating the size of software testing work. The approach measures the size of a software test case based on its checkpoints, preconditions and test data, as well as the types of testing. We also introduce a supporting toolkit that you can use to estimate testing effort quickly for your projects.
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...Filip Krikava
This document discusses integrating adaptation mechanisms in self-adaptive software systems using control theory models. It presents a case study of using feedback control loops and control theory models to optimize a web server's performance by self-adjusting tuning parameters. The challenges of engineering such self-adaptive systems include control challenges for control engineers and integration challenges for software engineers. The study models the web server as a multi-input multi-output system and designs a linear quadratic regulator controller to optimize performance based on CPU utilization and memory usage.
The document discusses parallel programming concepts and models. It defines speedup and efficiency as measures of parallelization. Amdahl's Law states that the serial fraction of a program limits its scalability. Fine-grained parallelism operates at the loop level while coarse-grained parallelism involves larger sections of code. Shared memory and message passing are two common parallel programming models, with shared memory used for shared memory machines and message passing for distributed memory clusters. The programming model choice depends on the application and hardware, but does not determine scalability, which is limited by a program's inherent parallelism.
ACTRESS: Domain-Specific Modeling of Self-Adaptive Software ArchitecturesFilip Krikava
Presentation given at 29th Symposium On Applied Computing (SAC'14) - Dependable and Adaptive Distributed Systems track.
It is mainly based on the work done during my Ph.D.
This tutorial is intended for verification engineers that must validate algorithmic designs. It presents the detailed steps for implementing a SystemVerilog verification environment that interfaces with a GNU Octave mathematical model. It describes the SystemVerilog – C++ communication layer with its challenges, like proper creation and activation or piped algorithm synchronization handling. The implementation is illustrated for Ncsim, VCS and Questa.
Lec0 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech ECE -- Introdu...Hsien-Hsin Sean Lee, Ph.D.
This document provides information about the ECE 4100/6100 Advanced Computer Architecture course taught by Professor Hsien-Hsin Sean Lee at Georgia Tech. It outlines course details like prerequisites, textbook, scope, and grading. The scope covers modern microprocessor architecture concepts like instruction-level parallelism, memory hierarchy, and multiprocessors. It also discusses trends in technology, Moore's law, and the job of a computer architect in understanding application requirements and tradeoffs.
This document discusses key concepts related to variables in programming languages including names, bindings, scopes, and lifetimes. It covers different types of variables such as static, stack dynamic, and heap dynamic variables. It also compares static and dynamic scoping models and how they determine variable visibility. The referencing environment is defined as the collection of all visible variables for a given statement.
1. The document discusses process synchronization and solving the critical section problem. It covers solutions like Peterson's algorithm, mutex locks, semaphores, monitors and condition variables.
2. Classical synchronization problems like the bounded buffer, readers-writers, and dining philosophers problems are used to test new synchronization schemes.
3. The document also discusses topics like deadlock, starvation, serializability and concurrency control algorithms in transaction processing.
Finding Bugs Faster with Assertion Based Verification (ABV)DVClub
1) Assertion-based verification introduces assertions into a design to improve observability and controllability during simulation and formal analysis.
2) Assertions define expected behavior and can detect errors by monitoring signals within a design.
3) An assertion-based verification methodology leverages assertions throughout the verification flow from module to system level using various tools like simulation, formal analysis, and acceleration for improved productivity, quality, and reduced verification time.
Contribution of recurrent connectionist language models in improving lstm bas...anna8885
This paper proposes using recurrent connectionist language models to improve LSTM-based Arabic text recognition in videos. It trains RNN and RNNME language models on a large Arabic text corpus and integrates them into an LSTM-CTC optical character recognition system using a modified beam search decoding scheme. Experimental results show the connectionist language models outperform n-gram models, improving word recognition rate by over 16% compared to the baseline model without a language model. The full system also outperforms a commercial OCR engine by over 35% word recognition rate.
Implementing subprograms requires saving execution context, allocating activation records, and maintaining dynamic or static chains. Activation records contain parameters, local variables, return addresses, and dynamic/static links. Nested subprograms are supported through static chains that connect activation records. Dynamic scoping searches the dynamic chain for non-local variables, while shallow access uses a central variable table. Blocks are implemented as parameterless subprograms to allocate separate activation records for block variables.
A microprocessor is an electronic component that is used by a computer to do its work. It is a central processing unit on a single integrated circuit chip containing millions of very small components including transistors, resistors, and diodes that work together.
System verilog verification building blocksNirav Desai
SystemVerilog introduces key concepts like program blocks, interfaces, and clocking blocks to help with verification. Program blocks separate the testbench code from the design code to avoid race conditions. Interfaces encapsulate communication between blocks and help prevent errors from manual port connections. Clocking blocks synchronize signal drivers and allow specifying timing for sampled signals. Together these features help manage complexity when verifying designs.
LAS16-400: Mini Conference 3 AOSP (Session 1)Linaro
LAS16-400: Mini Conference 3 AOSP (Session 1)
Speakers: Thomas Gall, Bernhard Rosenkränzer
Date: September 29, 2016
★ Session Description ★
The Android Open Source Project is one community which is strategic to Linaro and it’s members. The purpose of this mini conference is to gather fellow Android engineers together from the community, member companies, and Linaro to discuss engineering activities and improve collaboration across different groups.
Within this mini conference we encourage discussion and presentations to advance engineering topics, forge consensus and educate each other.
The tentative agenda for this mini conference includes :
- Quick introduction
- Filesystems - Between requirements for encryption and standing concerns about degrading performance as an Android file system ages, let’s have some discussion involving current data, known issues and possible improvements in this area for Android.
- HAL consolidation - Review current status and discuss next steps to work on.
One build for many devices: device/build configuration. Next features and platforms to add. Gaps in HiKey support vs. AOSP build.
- Graphics - YUV support in mesa and hwc.
- WiFi and sensor HAL status and next steps
- New developments with AOSP + the Kernel - With regards to the Google Common Kernel tree and upstream Linux kernel activities related to Android, there are a few topics up for discussion:
- - Updates on HiKey in AOSP
- - EAS in common.git & integration with AOSP userspace
- - New Sync API in 4.6+ kernels, and how it will affect graphics drivers
- AOSP transition to clang - As everyone knows GCC in AOSP has been deprecated. Let’s cover current status, issues and next steps. Let’s also discuss the elephant in the room, building the kernel with clang.
- Out of tree AOSP User space Patches - This is a discussion with the goal of organized action to see forward progress on AOSP user space patches that aren’t in AOSP for whatever reason.
- Android is used in some environments where booting can be frequent and affect the product experience. Do you want to wait for a minute while your car boots? We’ll spend time brainstorming on improving Android boot time.
★ Resources ★
Etherpad: pad.linaro.org/p/las16-400
Presentations & Videos: http://connect.linaro.org/resource/las16/las16-400/
★ Event Details ★
Linaro Connect Las Vegas 2016 – #LAS16
September 26-30, 2016
http://www.linaro.org
http://connect.linaro.org
Next Generation MPICH: What to Expect - Lightweight Communication and More - Intel® Software
MPICH is a widely used, open-source implementation of the message passing interface (MPI) standard. It has been ported to many platforms and used by several vendors and research groups as the basis for their own MPI implementations. This session discusses the current development activity with MPICH, including a close collaboration with teams at Intel. We showcase preparing MPICH-derived implementations for deployment on upcoming supercomputers like Aurora (from the Argonne Leadership Computing Facility), which is based on the Intel® Xeon Phi™ processor and Intel® Omni-Path Architecture (Intel® OPA).
This document discusses various techniques for optimizing computer code, including:
1. Local optimizations that improve performance within basic blocks, such as constant folding, propagation, and elimination of redundant computations.
2. Global optimizations that analyze control flow across basic blocks, such as common subexpression elimination.
3. Loop optimizations that improve performance of loops by removing invariant data and induction variables.
4. Machine-dependent optimizations like peephole optimizations that replace instructions with more efficient alternatives.
The goal of optimizations is to improve speed and efficiency while preserving program meaning and correctness. Optimizations can occur at multiple stages of development and compilation.
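A toy illustration of two of the local optimizations listed above — constant folding/propagation and common-subexpression elimination — on a made-up three-address-code representation (the instruction format and function names here are hypothetical, not from the document):

```python
# Each instruction is (dest, op, arg1, arg2); args are ints (constants) or names.
OPS = {'add': lambda x, y: x + y, 'mul': lambda x, y: x * y}

def fold_and_cse(block):
    consts = {}   # names known to hold compile-time constants
    seen = {}     # (op, arg1, arg2) -> name that already computed that value
    out = []
    for dest, op, a, b in block:
        # Constant propagation: substitute known constants for the arguments.
        a = consts.get(a, a)
        b = consts.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            # Constant folding: evaluate now and drop the instruction.
            consts[dest] = OPS[op](a, b)
            continue
        key = (op, a, b)
        if key in seen:
            # Common-subexpression elimination: reuse the earlier result.
            out.append((dest, 'copy', seen[key], None))
        else:
            seen[key] = dest
            out.append((dest, op, a, b))
    return out, consts

block = [
    ('t1', 'add', 2, 3),        # folded: t1 = 5
    ('t2', 'mul', 't1', 'x'),   # becomes t2 = 5 * x
    ('t3', 'mul', 't1', 'x'),   # redundant: becomes t3 = copy t2
]
optimized, consts = fold_and_cse(block)
```

This is a basic-block (local) pass only; it preserves meaning as long as operations have no side effects, which is exactly the correctness requirement the summary mentions.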
This document discusses instruction pipelining as a technique to improve computer performance. It explains that pipelining allows multiple instructions to be processed simultaneously by splitting instruction execution into stages like fetch, decode, execute, and write. While pipelining does not reduce the time to complete individual instructions, it improves throughput by allowing new instructions to begin processing before previous instructions have finished. The document outlines some challenges to achieving peak performance from pipelining, such as pipeline stalls from hazards like data dependencies between instructions. It provides examples of how data hazards can occur if the results of one instruction are needed by a subsequent instruction before they are available.
This document discusses subprograms and parameter passing in programming languages. It covers fundamental concepts of subprograms like definitions, calls, headers, and parameters. It then describes different parameter passing methods like pass-by-value, pass-by-reference, and pass-by-name. It also discusses how major languages like C, C++, Java, Ada, C#, and PHP implement parameter passing and type checking.
The document discusses validation and design in small teams with limited resources. It proposes constraining designs to a single clock rate, using FIFO interfaces between blocks, and separating algorithm from IO verification to simplify validation. This approach allows designs to be completed more quickly with fewer verification engineers through standardized, repeatable validation methods at the cost of optimal performance.
Validation and Design in a Small Team Environment - DVClub
The document discusses validation and design in small teams with limited resources. It proposes constraining designs to a single clock rate, standardized interfaces, and automated test cases to streamline verification. This reduces complexity and verification costs, allowing designs to be completed more quickly despite limited experience. Standardizing interfaces and separating algorithm from implementation verification improves efficiency enough to overcome typical verification to design ratios.
This document discusses process synchronization and solutions to the classic critical section problem. It covers Peterson's solution, which uses shared variables and atomic instructions to ensure only one process can be in its critical section at a time. It also discusses solutions using mutex locks and semaphores implemented via hardware instructions like compare-and-swap to synchronize processes. Memory barriers are introduced to address issues with instruction reordering on modern architectures.
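Peterson's solution is small enough to check exhaustively. The sketch below (an illustration, not from the document) models each process as a tiny state machine and explores every interleaving, verifying that the two processes are never in their critical sections simultaneously:

```python
# Model-check Peterson's algorithm for two processes by exhaustive search.
from collections import deque

# Program counters per process:
# 0: flag[i] = True   1: turn = other   2: busy-wait   3: critical section   4: exit
def step(state, i):
    pcs, flags, turn = state
    pc, j = pcs[i], 1 - i
    pcs, flags = list(pcs), list(flags)
    if pc == 0:
        flags[i] = True; pcs[i] = 1
    elif pc == 1:
        turn = j; pcs[i] = 2          # politely give priority to the other
    elif pc == 2:
        if not (flags[j] and turn == j):
            pcs[i] = 3                # guard passed: enter critical section
    elif pc == 3:
        pcs[i] = 4                    # leave the critical section
    else:
        flags[i] = False; pcs[i] = 0
    return (tuple(pcs), tuple(flags), turn)

def mutual_exclusion_holds():
    init = ((0, 0), (False, False), 0)
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if s[0] == (3, 3):            # both in the critical section: violation
            return False
        for i in (0, 1):
            t = step(s, i)
            if t not in seen:
                seen.add(t); frontier.append(t)
    return True
```

The search visits every reachable global state, so a `True` result means no interleaving of these atomic steps violates mutual exclusion.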
1) Callgraph analysis of ATLAS software identified clusters of heavily called functions that could benefit from inlining to reduce instruction counts. Inlining requires changes to code and use of link-time optimization with profile guidance.
2) Avoiding position independent code may improve performance but reduce code sharing. Static libraries could allow link-time optimization.
3) Tools like IgProf, SystemTap and perf events can profile memory and performance, but a visualizer is needed to analyze object-oriented software. Sampling branch records may improve basic block counts.
This document discusses computer architecture performance, including metrics like execution time, throughput, and instructions per cycle (IPC). It provides examples of calculating the cycles per instruction (CPI) for different instruction types and evaluating potential design changes based on their impact on CPI and overall performance. The principles of locality and Amdahl's Law, which states that speedups from parallelism are limited by the serial fraction of a program, are also covered.
This document provides information about the ME 190M Introduction to Model Predictive Control course taught in fall 2009 at UC Berkeley. The class will be taught on Fridays from 11am to 12pm in room 1165 of Etcheverry Hall. Homework assignments will be given weekly and selected assignments will be graded. Students will need to use MATLAB for assignments, which they can access in room 2109 of Etcheverry Hall. The course will cover modeling, optimization fundamentals, constrained optimal control, predictive control fundamentals and properties, and examples implemented in MATLAB. The goals are for students to design, implement, and tune simple MPC controllers in MATLAB for linear and nonlinear systems.
Have you ever wondered how to speed up your code in Python? This presentation will show you how to start. I will begin with a guide on how to locate performance bottlenecks and then give you some tips on how to speed up your code. I would also like to discuss how to avoid premature optimization, as it may be ‘the root of all evil’ (at least according to D. Knuth).
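Locating a bottleneck, as the talk suggests, can start with the standard library's cProfile. A minimal sketch (the two functions are made-up examples of a slow and a fast idiom):

```python
# Profile two string-building approaches and inspect where time is spent.
import cProfile
import io
import pstats

def slow_concat(n):
    s = ""
    for i in range(n):
        s += str(i)          # repeated concatenation: repeated copying
    return s

def fast_concat(n):
    return "".join(str(i) for i in range(n))  # single join at the end

profiler = cProfile.Profile()
profiler.enable()
slow_concat(10_000)
fast_concat(10_000)
profiler.disable()

buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(10)   # top entries by cumulative time
report = buf.getvalue()
```

Reading `report` shows per-function call counts and cumulative times, which is usually enough to decide where optimization effort is worthwhile — and where it would be premature.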
Competition-Level Code Generation with AlphaCode.pptx - San Kim
AlphaCode is a system for competitive code generation that achieves top 54.3% performance on average in competitions with over 5,000 participants. It uses a large transformer model pre-trained on GitHub code and fine-tuned on a competitive programming dataset. During fine-tuning, it employs techniques like tempering and GOLD to focus on precision over recall. At test time, it generates a large number of samples, filters them based on example tests, and clusters similar programs to select submissions. Extensive evaluations on CodeContests and APPS benchmarks show AlphaCode's performance scales log-linearly with more samples and compute.
This document discusses computer performance and defines key terms like response time and throughput. It explains that performance is measured based on time and discusses factors like clock rate, clock cycles, instructions per cycle (CPI), and benchmarks. Maximizing single-thread performance is challenging due to the power wall, so improvements now focus on parallelism through multiple cores rather than just increasing frequency.
This document discusses the evolution of computer architecture from CISC to RISC designs. It provides details on major advances like microprogramming, cache memory, and microprocessors that influenced computer designs. Reduced Instruction Set Computers (RISC) aimed to improve performance by using simpler instructions optimized for pipelining. RISC relies on register-based operations while CISC uses more complex instructions mapped to microcode. The tradeoffs between the two approaches are still debated as most modern designs incorporate elements of both.
Mixing DPs: Building Architecture on the Cross-Cutting Example - corehard_by
In this talk we discuss the importance of architectural decisions, in particular for achieving high software quality at minimal effort. A cross-cutting example from the data-backup domain helps clarify the technical, QA, and overall process aspects of the approach. Enough time has passed to reveal the technical details without violating the NDA; the proposed metrics-based solution (we will certainly mention the metrics) was recognized as the best architectural solution within the company, one of the industry leaders, received a Microsoft award, and was replicated in adjacent areas. To the point: Builder, Decorator, Composite, Iterator and Visitor - how these patterns helped solve a non-trivial C++ problem.
The document discusses ONNC, a compiler for deep learning formats like ONNX. It aims to connect ONNX to various deep learning accelerator (DLA) chips to help vendors bring products to market faster. Key features include supporting DLA features, optimizing memory usage and execution time, and being released as open source before the end of July 2018.
Algorithm and C Code Related to Data Structure - Self-Employed
In the world of coding, everything rests on algorithms: algorithm design is the basis of data structures and their manipulation in computer science and information technology, and it is ultimately used to find the solution to a particular problem.
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca... - Unai Lopez-Novoa
The document outlines Unai Lopez Novoa's PhD dissertation on efficiently using general purpose coprocessors, with kernel density estimation as a case study. It introduces the motivation and challenges of porting applications to accelerators. It then describes the contributions of a novel efficient kernel density estimation algorithm called S-KDE and its implementation for multi-core and many-core processors and general purpose coprocessors. Finally, it proposes a methodology for environmental model evaluation based on S-KDE.
This document discusses various metrics for evaluating computer performance and discusses latency. It defines latency as the time it takes a computer to perform a single task and discusses how latency is measured. Latency is important for application responsiveness, real-time applications, and other situations where waiting time matters. The document also introduces the performance equation that models latency in terms of architectural parameters like instructions, clock cycles, and clock frequency.
Evaluating computers involves considering metrics like latency, throughput, bandwidth, cost, power, and reliability. Latency refers to how long a single task takes and is usually measured in seconds or clock cycles. Performance is defined as the inverse of latency, so a system with lower latency is considered higher performing. Amdahl's Law states that the overall speedup from optimizing a portion of a system is limited by the percentage of time spent in that portion. It is important for determining whether optimizations are worthwhile.
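Amdahl's Law as stated in the summary can be made concrete with the standard formula, where p is the fraction of execution time that benefits from an optimization and s is the speedup of that fraction:

```python
# speedup_overall = 1 / ((1 - p) + p / s)
def amdahl_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# Doubling the speed of half the program yields only ~1.33x overall:
half_by_two = amdahl_speedup(0.5, 2)

# Even an unbounded speedup of 95% of the program is capped by the
# remaining serial 5%, for a limit of 1 / (1 - 0.95) = 20x:
limit = 1.0 / (1.0 - 0.95)
```

This is why the law is useful for deciding whether an optimization is worthwhile: the serial fraction sets a hard ceiling regardless of how fast the optimized portion becomes.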
This document provides information about a course on programming for problem solving in C. It includes the course objectives, which are to give exposure to procedural programming principles in C, introduce computational thinking and develop C programs using basic constructs, and enable students to apply C fundamentals to solve engineering problems. The course outcomes cover various C programming concepts like data types, control structures, arrays, strings, pointers, structures, unions and files. The syllabus is divided into 5 units covering these topics. Textbooks for the course are also listed.
A separately excited DC motor is driven from a 240 V, 50 Hz supply via an HC SCR bridge with a flywheel diode. The motor has an armature resistance of 1 Ω and an armature voltage constant Kv of 0.8 V·s/rad. The field current is constant. Assuming steady armature current, determine the armature current and torque at 1600 rpm for a firing-angle delay of (a) 30° and (b) 60°.
This document discusses the evolution of computer architecture from CISC to RISC designs. It provides details on major advances like microprogramming, cache memory, and microprocessors that influenced computer design. It then describes the key features of RISC systems like large register files, limited instruction sets, and emphasis on instruction pipelining. The document discusses optimizations for RISC designs like delayed branching and register allocation techniques. It also examines the debate around the performance of RISC versus CISC and notes most modern systems incorporate aspects of both.
Constraint Programming in Compiler Optimization: Lessons Learned
1. Constraint Programming in Compiler
Optimization: Lessons Learned
Peter van Beek
University of Waterloo
2. Acknowledgements
• Joint work with: Omer Beg, Alejandro López-Ortiz, Abid Malik, Jim McInnes, Wayne Oldford, Claude-Guy Quimper, John Tromp, Kent Wilken, Huayue Wu
• Funding: NSERC, IBM Canada
3. Application-driven research
• Idea:
• pick an application—a real-world problem—where, if you solve it, there would be a
significant impact
• Along the way, if all goes well, you will also:
• identify and fill gaps in theory
• identify and solve interesting sub-problems whose solutions will have general
applicability
6. Production compilers
“At the outset, note that basic-block scheduling is an NP-hard
problem, even with a very simple formulation of the
problem, so we must seek an effective heuristic, rather than an
exact approach.”
Steven Muchnick,
Advanced Compiler Design
& Implementation, 1997
8. Computer architecture:
Performing instructions in parallel
• Multiple-issue
• multiple functional units;
e.g., ALUs, FPUs, load/store units, branch
units
• multiple instructions can be issued (begin
execution) each clock cycle
• issue width: max number of instructions that
can be issued each clock cycle
• on most architectures issue width less than
number of functional units
9. Computer architecture:
Performing instructions in parallel
• Pipelining
• overlap execution of instructions on a single
functional unit
• latency of an instruction
number of cycles before result is available
• execution time of an instruction
number of cycles before next instruction
can be issued on same functional unit
• serializing instruction
instruction that requires exclusive use of
entire processor in cycle in which it is issued
Analogy: vehicle assembly line
10. Superblock instruction scheduling
• Instruction scheduling
• assignment of a clock cycle to each instruction
• needed to take advantage of complex features of
architecture
• sometimes necessary for correctness (VLIW)
• Basic block
• straight-line sequence of code with single entry, single exit
• Superblock
• collection of basic blocks with a unique entrance but multiple exits
• Given a target architecture, find schedule with minimum expected
completion time
11. Example superblock
[Figure: dependency DAG of instructions A:1, B:3, C:1, D:1, E:1, F:1, G:1, with arc latencies between 0 and 5; exits F (40%) and G (60%).]
dependency DAG
• nodes
• one for each instruction
• labeled with execution time
• nodes F and G are branch instructions, labeled with probability the exit is taken
• arcs
• represent precedence
• labeled with latencies
12. Example superblock
[Figure: optimal cost schedule for the example superblock on a 2-issue processor with one ALU and one FPU; instructions A–G are placed in cycles 1–10, with exits F (40%) and G (60%).]
15. Computer architecture:
Clustered architectures
• Current: digital signal processing
• multimedia, audio processing, image processing
• wireless, ADSL modems, …
• Future trend: general purpose multi-core processors
• large numbers of cores
• fast inter-processor communication
16. Spatial and temporal scheduling
[Figure: dependency DAG of instructions A–H with arc latencies 1 and 2 and exits G (20%) and H (80%), shown with two schedules. Scheduled on a single cluster (c0), the instructions occupy cycles 1–10 and cost = 9.8; split across two clusters (c0 and c1), cost = 7.6.]
17. Spatial and temporal scheduling
[Figure: the same DAG with the two-cluster schedule (c0 and c1) over cycles 1–10; cost = 7.6.]
18. Approaches
• Superblock instruction scheduling is NP-complete
• Heuristic approaches in all commercial and open-source research compilers
• greedy list scheduling algorithm coupled with a priority heuristic
• Here: Optimal approach
• useful when longer compile times are tolerable
• e.g., compiling for software libraries, digital signal processing, embedded
applications, final production build
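The heuristic baseline mentioned above — greedy list scheduling with a priority function — can be sketched as follows, using critical-path distance as the priority on a made-up dependency DAG (the DAG, names, and issue width here are hypothetical examples, not the paper's benchmarks):

```python
# Greedy list scheduling: each cycle, issue up to issue_width ready
# instructions, preferring those with the longest path to a sink.

def critical_path_priority(succs, latency):
    prio = {}
    def p(v):
        if v not in prio:
            prio[v] = max((latency[(v, w)] + p(w) for w in succs[v]), default=0)
        return prio[v]
    for v in succs:
        p(v)
    return prio

def list_schedule(succs, latency, issue_width):
    preds = {v: [] for v in succs}
    for (u, w) in latency:
        preds[w].append(u)
    prio = critical_path_priority(succs, latency)
    cycle_of, cycle = {}, 0
    while len(cycle_of) < len(succs):
        cycle += 1
        # An instruction is ready once every predecessor's latency has elapsed.
        ready = [v for v in succs if v not in cycle_of
                 and all(u in cycle_of and cycle_of[u] + latency[(u, v)] <= cycle
                         for u in preds[v])]
        ready.sort(key=lambda v: (-prio[v], v))   # highest priority first
        for v in ready[:issue_width]:
            cycle_of[v] = cycle
    return cycle_of

# Hypothetical DAG: A feeds B (latency 2) and C (latency 1); both feed D.
succs = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
latency = {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'D'): 1, ('C', 'D'): 1}
schedule = list_schedule(succs, latency, issue_width=2)
```

This is fast and usually good, but — as the slides argue — it offers no optimality guarantee, which is what motivates the constraint-programming approach.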
20. Temporal scheduler:
Basic constraint model
[Figure: the example dependency DAG of instructions A–G with arc latencies and exits F (40%) and G (60%).]
variables
A, B, C, D, E, F, G
domains
{1, …, m}
constraints
B ≥ A + 1, C ≥ A + 1, D ≥ B + 5, … (one distance constraint per latency arc)
gcc(A, B, C, F, G, nALU)
gcc(D, E, nFPU)
gcc(A, …, G, issue width)
cost function
40 F + 60 G
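The structure of this model — cycle variables, latency constraints, gcc-style capacity constraints, and an expected-completion cost — can be illustrated by brute force on a tiny hypothetical superblock (the DAG, latencies, and issue width below are invented for illustration; the paper's model is solved with a real constraint solver, not enumeration):

```python
# Minimize 40*F + 60*G over cycle assignments subject to latency and
# issue-width constraints, on a made-up 5-instruction superblock.
from itertools import product
from collections import Counter

INSNS = ['A', 'B', 'C', 'F', 'G']
M = 8                 # upper bound on schedule length
ISSUE_WIDTH = 2
LATENCIES = [('A', 'B', 1), ('A', 'C', 1), ('B', 'F', 2), ('F', 'G', 1)]

def cost(sched):
    # F and G are the exits, weighted by their (scaled) exit probabilities.
    return 40 * sched['F'] + 60 * sched['G']

best = None
for cycles in product(range(1, M + 1), repeat=len(INSNS)):
    sched = dict(zip(INSNS, cycles))
    if any(sched[v] < sched[u] + lat for u, v, lat in LATENCIES):
        continue                                  # latency constraint violated
    if any(n > ISSUE_WIDTH for n in Counter(cycles).values()):
        continue                                  # gcc-style capacity violated
    if best is None or cost(sched) < cost(best):
        best = sched
```

The capacity check plays the role of the gcc (global cardinality) constraints: no cycle may issue more instructions than the issue width allows.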
22. Temporal scheduler:
Improving the model
• Add constraints to increase constraint propagation (e.g., Smith 2006)
• implied constraints: do not change set of solutions
• dominance constraints: preserve an optimal solution
• Here:
• many constraints added to constraint model in extensive preprocessing stage
that occurs once
• extensive preprocessing effort pays off as model is solved many times
23. Temporal scheduler:
Improving the solver
• From optimization to satisfaction
• find bounds on cost function
• enumerate solutions to cost function (knapsack constraint; Trick 2001)
• step through in increasing order of cost
• Improved bounds consistency algorithm for gcc constraints
• Use portfolio to improve performance (Gomes et al. 1997)
• increasing levels of constraint propagation
• Impact-based variable ordering (Refalo 2004)
• Structure-based decomposition technique (Freuder 1994)
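The improved bounds-consistency work for gcc and alldifferent rests on Hall intervals: if some interval of values is saturated by variables confined to it, no other variable may take a value there. A deliberately naive quadratic sketch of that pruning rule for alldifferent (the cited papers contribute far faster algorithms; this only illustrates the idea):

```python
# Narrow interval domains under an alldifferent constraint using Hall intervals.
def prune_alldiff_bounds(bounds):
    """bounds: list of [lo, hi] interval domains; returns narrowed tuples."""
    bounds = [list(b) for b in bounds]
    changed = True
    while changed:
        changed = False
        endpoints = sorted({e for lo, hi in bounds for e in (lo, hi)})
        for a in endpoints:
            for b in endpoints:
                if a > b:
                    continue
                inside = [i for i, (lo, hi) in enumerate(bounds)
                          if a <= lo and hi <= b]
                capacity = b - a + 1
                if len(inside) > capacity:
                    raise ValueError("alldifferent is infeasible")
                if len(inside) == capacity:        # [a, b] is a Hall interval
                    for i, (lo, hi) in enumerate(bounds):
                        if i in inside:
                            continue
                        if a <= lo <= b:           # push lower bound past b
                            bounds[i][0] = b + 1; changed = True
                        if a <= hi <= b:           # pull upper bound below a
                            bounds[i][1] = a - 1; changed = True
                        if bounds[i][0] > bounds[i][1]:
                            raise ValueError("alldifferent is infeasible")
    return [tuple(b) for b in bounds]

# Two variables with bounds [1,2] saturate the Hall interval [1,2],
# so a third variable with bounds [1,5] is narrowed to [3,5]:
narrowed = prune_alldiff_bounds([[1, 2], [1, 2], [1, 5]])
```

Running such propagation inside the solver is what makes the gcc constraints in the scheduling model prune effectively.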
24. Spatial and temporal scheduler:
Basic constraint model
[Figure: the example dependency DAG of instructions A–H with arc latencies 1 and 2 and exits G (20%) and H (80%).]
variables
cycle of issue: xA, xB, …, xH
cluster: yA, yB, …, yH
domains
dom(x) = {1, …, m}
dom(y) = {0, …, k−1}
communication constraints
yA = yC → xC ≥ xA + 1
yA ≠ yC → xC ≥ xA + 1 + cost
…
cost function
20 xG + 80 xH
25. Spatial and temporal scheduler:
Improving the model
• Symmetry breaking
• add auxiliary variables: zAC, zBC, …
• dom(z) = {'=', '≠'}
• instead of backtracking on the y's, backtrack on the edges with z's
• preserves at least one optimal solution
[Figure: small example DAG over instructions A, B, C, D.]
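The payoff of symmetry breaking here is easy to quantify: cluster labels are interchangeable, so many y-assignments describe the same partition of instructions into clusters. A toy count for 4 instructions and up to 4 clusters (an illustration, not from the slides) contrasts raw label assignments with the canonical partitions that the z-edge encoding effectively enumerates:

```python
# Count labelled cluster assignments vs. distinct partitions of 4 nodes.
from itertools import product

def canonical(assignment):
    """Relabel clusters in order of first appearance: (2,2,0,1) -> (0,0,1,2)."""
    relabel, out = {}, []
    for y in assignment:
        if y not in relabel:
            relabel[y] = len(relabel)
        out.append(relabel[y])
    return tuple(out)

raw = list(product(range(4), repeat=4))       # all y-assignments: 4^4 = 256
distinct = {canonical(a) for a in raw}        # distinct partitions: Bell(4) = 15
```

Backtracking on the y variables explores up to 256 symmetric assignments, while branching on the '='/'≠' edge decisions explores only the 15 genuinely different partitions.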
26. Spatial and temporal scheduler:
Improving the solver
• Preprocess DAG to find instructions which must be on same cluster
• preserve an optimal solution
• Variable ordering
• assign z variables first, in breadth-first order of DAG
• determine assignment for corresponding y variables
• determine cost of temporal schedule for these assignments
28. Experimental setup: Instances
• All 154,651 superblocks from SPEC 2000 integer and floating pt. benchmarks
• standard benchmark suite
• consists of software packages chosen to be representative of types of
programming languages and applications
• superblocks generated by IBM's Tobey compiler when compiling the software
packages
• compilations done using Tobey's highest level of optimization
29. Experimental setup: Target architectures
Realistic architectures:
• not fully pipelined
• issue width not equal to number of functional units
• serializing instructions

architecture  issue width  simple int. units
1-issue       1            1
2-issue       2            1
4-issue       4            2
6-issue       6            2

[The original table also lists complex int., branch, floating pt., and memory unit counts for each architecture.]
30. Experimental results: Temporal scheduler
Total time (hh:mm:ss) to schedule all superblocks and percentage
solved to optimality, for various time limits for solving each instance:

architecture | 1 sec.           | 10 sec.          | 1 min.            | 10 min.
1-issue      | 1:30:20   97.34  | 7:15:46   99.38  | 10:22:36   99.96  | 15:08:44   99.98
2-issue      | 3:57:13   91.83  | 30:53:83  93.90  | 108:50:01  97.18  | 665:31:00  97.70
4-issue      | 2:17:44   95.47  | 17:09:48  96.60  | 61:29:31   98.43  | 343:04:46  98.87
6-issue      | 3:04:18   93.59  | 25:03:44  94.76  | 87:04:34   97.78  | 511:19:14  98.29
31. Spatial and temporal scheduler:
Some related work
• Bottom Up Greedy (BUG) [Ellis, MIT Press '86]
• greedy heuristic algorithm
• localized clustering decisions
• Hierarchical Partitioning (RHOP) [Chu et al., PLDI '03]
• coarsening and refinement heuristic
• weights of nodes and edges updated as algorithm progresses
35. Lessons learned (I)
• Pick problem carefully
• is a new solution needed?
• what is the likelihood of success?
• Existing heuristics may not leave any room for improvement
• examples: basic block scheduling, instruction selection
36. Lessons learned (II)
• Be prepared for adversity
• significant overhead
• learning domain of application
• significant implementation
• significant engineering
• different research cultures
• researchers are tribal
• different standards of reviewing (number & contentiousness)
• different standards of evaluation, formalization, assumptions
37. Lessons learned (III)
• Rewards
• can be attractive to students
• can lead to identifying and solving interesting sub-problems whose solutions have
general applicability
• bounds consistency for alldifferent and gcc global constraints
• restarts and portfolios
• machine learning of heuristics
39. Selected publications
• Applications
A. M. Malik, M. Chase, T. Russell, and P. van Beek. An application of constraint programming to superblock
instruction scheduling. CP-2008.
M. Beg and P. van Beek. A constraint programming approach for integrated spatial and temporal scheduling for
clustered architectures. ACM TECS, To appear.
• Global constraints
C.-G. Quimper, P. van Beek, A. Lopez-Ortiz, A. Golynski, and S. Bashir Sadjad. An efficient bounds consistency
algorithm for the global cardinality constraint. CP-2003.
A. Lopez-Ortiz, C.-G. Quimper, J. Tromp, and P. van Beek. A fast and simple algorithm for bounds consistency of
the alldifferent constraint. IJCAI-2003.
• Portfolios and restarts
H. Wu and P. van Beek. On portfolios for backtracking search in the presence of deadlines. ICTAI-2007.
H. Wu and P. van Beek. On universal restart strategies for backtracking search. CP-2007.
• Heuristics and machine learning
T. Russell, A. M. Malik, M. Chase, and P. van Beek. Learning heuristics for the superblock instruction scheduling
problem. IEEE TKDE, 2009.
M. Chase, A. M. Malik, T. Russell, R. W. Oldford, and P. van Beek. A computational study of heuristic and exact
techniques for superblock instruction scheduling. J. of Scheduling, 2012.
41. Spatial and temporal scheduler:
Search tree of basic model
[Figure: search tree over the small example DAG (A, B, C, D) that branches on the cluster variables yA, yB, yC, yD, each with domain {0, 1, 2, 3}; at each leaf a temporal schedule is found for the chosen assignment, e.g. y = (0, 0, 0, 2).]
42. Spatial and temporal scheduler:
Search tree of improved model
[Figure: search tree over the same DAG that branches on the edge variables zAC, zBC, zCD with values '=' and '≠'. Determining y and finding a temporal schedule for y = (0, 0, 0, 0) also covers the symmetric assignments y = (1, 1, 1, 1), etc.; likewise y = (0, 1, 1, 0) covers y = (2, 3, 3, 2), y = (0, 2, 2, 3), etc.]
44. Instruction Selection
• Given
• an expression DAG G
• a set of tiles representing machine instructions
• Find a mapping of tiles to nodes in G of minimal cost (size) that covers G
• Complexity:
• polynomial for trees
• NP-hard for DAGs
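For the polynomial tree case, instruction selection is the classic dynamic program over tile patterns. A sketch on a hypothetical tile set and expression (the tiles, costs, and names below are invented; the DAG case the slide calls NP-hard is not handled here):

```python
# Minimal-cost tiling of an expression tree. 'REG' in a pattern matches any
# subtree, whose own cover cost is then added.
from functools import lru_cache

REG = 'REG'

def match(pattern, node):
    """Return subtrees bound to REG leaves, or None if the pattern mismatches."""
    if pattern == REG:
        return [node]
    if node[0] != pattern[0] or len(node) != len(pattern):
        return None
    bound = []
    for p, n in zip(pattern[1:], node[1:]):
        m = match(p, n)
        if m is None:
            return None
        bound += m
    return bound

def make_selector(tiles):
    @lru_cache(maxsize=None)
    def best_cost(node):
        if node[0] == 'var':          # operand assumed already in a register
            return 0
        costs = [cost + sum(best_cost(s) for s in m)
                 for pattern, cost in tiles
                 if (m := match(pattern, node)) is not None]
        if not costs:
            raise ValueError(f"no tile covers {node!r}")
        return min(costs)
    return best_cost

TILES = [
    (('add', REG, REG), 1),
    (('mul', REG, REG), 2),
    (('add', REG, ('mul', REG, REG)), 2),   # fused multiply-add
]
best_cost = make_selector(TILES)
expr = ('add', ('var', 'x'), ('mul', ('var', 'y'), ('var', 'z')))
```

Here the fused multiply-add tile covers `x + y*z` at cost 2, beating the cost-3 combination of separate add and mul tiles; on a DAG, shared subtrees break this bottom-up optimality, which is where the NP-hardness comes from.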