The document discusses speculation and speculative execution in modern microprocessors. It explains that processors predict upcoming instructions and speculatively execute them to improve performance. If the prediction is correct, the results are committed; if not, they are discarded. The document also discusses how transistor counts have increased quadratically compared to linear speed increases, enabling more complex superscalar and pipeline designs to exploit instruction-level parallelism.
This document summarizes multi-core computer architectures. It discusses how single-core CPUs are being replaced by multi-core chips that contain multiple processor cores on a single die. Each core can run threads in parallel for improved performance. The cores share the same memory and socket. Operating systems see each core as a separate processor. Issues around cache coherence and programming for multi-core architectures are also covered at a high level.
eMMC (embedded MultiMediaCard) provides managed flash storage. It contains an internal flash controller that handles flash translation layer functions like bad block management and wear leveling. This shields the host processor from needing to understand raw NAND flash characteristics. eMMC uses caching and its memory array to provide better read/write performance than raw NAND flash. It can be divided into multiple partitions like boot, RPMB, and general purpose partitions that are independently addressed through configuration registers.
Universal Flash Storage is an upcoming memory specification for use in mobile phones, tablets and other consumer electronics devices.
It is the successor to the embedded MultiMediaCard (eMMC) standard that currently prevails, and will be available as storage both in on-chip form and in expandable form (as memory cards).
Apple Inc. is a major player in the smartphone industry, and its processors have played a major role in that position. Here I want to discuss the history of these processors.
The document compares the Intel Core i3 processor and the 8086 microprocessor. It provides details on the features, architecture, and specifications of each. Key differences highlighted in the comparison section include: the Core i3 is a 64-bit dual-core processor that supports up to 16GB of memory and Hyper-Threading, while the 8086 is a single-core 16-bit processor that can only access 1MB of memory. Overall, the Core i3 offers improved performance over the 8086 due to its multiple cores, larger address space, and integrated graphics.
The document discusses Universal Flash Storage (UFS), the next-generation mobile storage interface that will succeed eMMC. UFS utilizes the MIPI M-PHY and UniPro standards and supports SCSI commands. It provides higher performance than eMMC through features like asynchronous command execution, command queuing, and support for multiple lanes. The document outlines UFS performance specifications and architecture, and compares it to alternatives like eMMC and SATA. It also discusses Samsung's UFS development timeline and test framework to validate UFS host and device functionality.
Universal Flash Storage (UFS) is a NAND flash storage specification developed by JEDEC that improves on eMMC. UFS uses a serial interface for faster read/write speeds compared to eMMC's parallel interface. It has a layered architecture including a device manager layer, UFS command set layer, UFS transport protocol layer, and UFS interconnect layer. The document discusses these layers and covers UFS features like logical units, command formats like UPIU, and SCSI commands supported in UFS including MODE SELECT, MODE SENSE, and READ/WRITE commands.
This document provides an overview of the AMD EPYC™ microprocessor architecture. It discusses the key tenets of the EPYC processor design including the "Zen" CPU core, virtualization and security features, high per-socket capability through its multi-chip module (MCM) design, high bandwidth fabric interconnect, large memory capacity and disruptive I/O capabilities. It also details the microarchitecture of the "Zen" core and how it was designed and optimized for data center workloads.
The document discusses the specifications and technologies of the Intel Core i5-3550S processor. It uses the Ivy Bridge microarchitecture with a 22nm process. It has 4 cores and 4 threads, supports up to 32GB of RAM, and has cache memory including a shared 8MB L3 cache. It supports many Intel technologies like Turbo Boost, Hyper-Threading, Virtualization, and AES-NI.
The document introduces the ONFI 3.0 NAND flash controller standard. It features faster data transfer speeds up to 200MT/s using differential signaling and DDR-2. It allows for dynamically scalable error correction and supports new commands. Arasan provides a fully compliant ONFI 3.0 controller core along with documentation, models, and verification IP to help customers integrate it into their designs.
This document contains 5 questions about raster systems and frame buffers:
1) It asks for the frame buffer size in bytes needed to store 12 and 24 bits per pixel for raster systems with resolutions of 640x480, 1280x1024, and 2560x2048.
2) It asks for the number of pixels that can be accessed per second and the access time per pixel for raster systems with resolutions of 640x480 and 1280x1024 refreshing at 60Hz.
3) It asks for the raster size in bytes needed to store 4 and 8 bits per pixel for a 1024x1024 raster system.
4) It asks how much time is spent scanning each row of pixels for a 1280x1024 system
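The arithmetic behind questions 1 and 2 can be sketched as follows. This is only an illustration, not the worked solution from the document: the function names are hypothetical, and the per-pixel access time assumes every pixel is read exactly once per refresh with no allowance for horizontal or vertical retrace.

```python
def frame_buffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one full frame at the given color depth."""
    return width * height * bits_per_pixel // 8


def pixel_access_time_ns(width, height, refresh_hz):
    """Nanoseconds available per pixel if each pixel is read once per refresh.

    Assumption: retrace (blanking) time is ignored.
    """
    pixels_per_second = width * height * refresh_hz
    return 1e9 / pixels_per_second


if __name__ == "__main__":
    for w, h in [(640, 480), (1280, 1024), (2560, 2048)]:
        for bpp in (12, 24):
            print(f"{w}x{h} at {bpp} bpp: {frame_buffer_bytes(w, h, bpp)} bytes")
    for w, h in [(640, 480), (1280, 1024)]:
        print(f"{w}x{h} at 60 Hz: {pixel_access_time_ns(w, h, 60):.2f} ns per pixel")
```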
1. Magnetic tape drives had a major problem of not allowing random access to stored data, as the tape had to be sequentially scanned to find a particular piece of data.
2. Flash memory uses floating gate transistors that allow the threshold voltage to be changed by adding or removing electrons from the floating gate through tunneling or hot carrier injection, enabling each cell to be programmed and erased individually to store data.
3. The three main types of non-volatile semiconductor memories are ROM, which cannot be programmed after manufacture; EEPROM/flash which can be electrically erased and reprogrammed; and EPROM which requires ultraviolet light for erasure.
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred... (Hsien-Hsin Sean Lee, Ph.D.)
This document discusses branch prediction in computer architecture. It begins by explaining what information is predicted for branches - the direction and target. It then categorizes different types of branches and discusses the costs of branch misprediction. Various branch prediction techniques are presented, starting with simple 1-bit and 2-bit predictors, and progressing to more advanced correlating and global history predictors. The goal of branch prediction is to reduce penalties from mispredicted branches by speculatively executing the predicted path.
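As a rough illustration of the 2-bit predictor mentioned above, the sketch below models a table of saturating counters indexed by branch address. The class name, table size, initial counter value, and the synthetic loop trace are all assumptions made for this example; they are not taken from the lecture.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (0-1 predict not taken, 2-3 predict taken)."""

    def __init__(self, table_size=16):
        self.table_size = table_size
        # Assumption: every counter starts at 1 ("weakly not taken").
        self.counters = [1] * table_size

    def _index(self, pc):
        # Hypothetical indexing scheme: low bits of the branch address.
        return pc % self.table_size

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of branches in the trace that were predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Synthetic trace: one loop branch, taken 9 times and then not taken, run 3 times.
loop_trace = [(0x40, i % 10 != 9) for i in range(30)]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(TwoBitPredictor(), loop_trace):.2f}")
```

The point of the second bit is visible at the loop exit: a single not-taken outcome moves the counter from 3 to 2, so the next loop entry is still predicted taken.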
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started (Anne Nicolas)
The document describes the ftrace function tracing tool in Linux kernels. It allows attaching to functions in the kernel to trace function calls. It works by having the GCC compiler insert indirect function entry calls. These calls are recorded during linking and replaced with nops at boot time for efficiency. This allows function tracing with low overhead by tracing the indirect function entry calls.
The document provides an introduction to AMD and Intel, two major processor manufacturers. It discusses the companies' websites, CEOs, headquarters and net incomes. Sections are devoted to the pros and cons, products and latest processors from each company. Market share is mentioned, though not detailed. The document ends with a conclusion section, but it is blank. Overall, the document lays out the structure for a comparison of AMD and Intel, but does not provide many details in the body.
UTF-8: The Secret of Character Encoding (Bert Pattyn)
The document discusses character encoding standards like ASCII, UTF-8, and UTF-16. It explains that UTF-8 uses 1-4 bytes per character and has become the standard for XML and web content. The document raises questions about choosing the right encoding based on the characters, software, and browsers used.
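The 1-4 byte range is easy to check directly. The short sketch below uses Python's built-in `str.encode`; the helper name and the four sample characters are chosen here for illustration and do not come from the slides.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# Sample characters picked for this example, one per encoded length.
samples = {
    "A": "\u0041",            # Latin capital letter A
    "e-acute": "\u00e9",      # Latin small letter e with acute
    "euro sign": "\u20ac",    # euro sign
    "emoji": "\U0001F600",    # grinning face
}

if __name__ == "__main__":
    for name, ch in samples.items():
        print(f"{name}: U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```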
Talk by Brendan Gregg for USENIX LISA 2019: Linux Systems Performance. Abstract: "
Systems performance is an effective discipline for performance analysis and tuning, and can help you find performance wins for your applications and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas of Linux systems performance: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (Ftrace, bcc/BPF, and bpftrace/BPF), and much advice about what is and isn't important to learn. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
The document introduces Samsung's eMMC memory technology. It summarizes the key features and enhancements of eMMC versions 4.4, 4.41, and 4.5, including improved performance, security, and reliability. Some notable additions in eMMC 4.5 are higher data transfer rates with an SDR mode of up to 200MHz, packed commands to boost I/O performance, cache functionality to reduce write latency, and a sanitize feature to securely purge all unused data at once. Sample availability timelines for eMMC 4.5 chips with and without 200MHz support are also provided.
The document compares AMD and Intel processors. It finds that AMD processors are generally cheaper than Intel, with a custom-built AMD PC costing $272.92 compared to $484.90 for Intel. While Intel is better for business and research needs, AMD supports a wider range of uses including gaming, video/audio editing, and movies. AMD also offers better graphics and gaming performance due to its 3DNow! technology and runs programs like 3D Studio Max and Photoshop faster than Intel. However, older AMD processors can overheat more quickly than Intel ones.
About u-boot and Linux on the Tegra 186 (Tegra-P1: the Tegra with a Pascal GPU).
In particular, it covers the BPMP (Boot and Power Management Processor).
1. The document describes the process of converting a regular expression (RE) = (ab+c)* to an equivalent deterministic finite automaton (DFA).
2. It starts with the equivalent non-deterministic finite automaton with epsilon transitions (NFA-ε) for the given RE.
3. It then constructs the DFA by calculating the epsilon-closure of states and determining the transitions between states for each symbol in the alphabet.
4. The resulting DFA has 4 states - A, B, C, D and the transition table and diagram are shown.
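For orientation, here is a small hand-written DFA for the same language. It assumes that `+` in the expression denotes union, so the language is (ab|c)*. Note that this is a minimized machine with an implicit dead state and state names chosen for this sketch; it is not the 4-state DFA (A, B, C, D) that the document derives by subset construction.

```python
# Assumption: "+" in (ab+c)* means union, so the language is (ab|c)*.
START = "S"
ACCEPTING = {"S"}

# Hypothetical state names: S = between tokens, A = just read "a" and need "b".
# Any (state, symbol) pair missing from the table leads to an implicit dead state.
TRANSITIONS = {
    ("S", "a"): "A",
    ("S", "c"): "S",
    ("A", "b"): "S",
}


def accepts(text):
    """Return True if text is in the language (ab|c)*."""
    state = START
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # fell into the dead state
            return False
    return state in ACCEPTING


if __name__ == "__main__":
    for word in ["", "c", "ab", "abcab", "a", "ba", "ac"]:
        print(repr(word), accepts(word))
```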
Snapdragon is a family of mobile systems on a chip (SoC) by Qualcomm. Qualcomm considers Snapdragon a "platform" for use in smartphones, tablets, and smartbook devices.
ARM is a family of RISC-based microprocessors and microcontrollers designed by ARM Ltd., Cambridge, England.
ARM chips are high-speed processors that are known for their small die size and low power requirements.
In this deck from ATPESC 2019, James Moawad and Greg Nash from Intel present: FPGAs and Machine Learning.
"Neural networks are inspired by biological systems, in particular the human brain. Through the combination of powerful computing resources and novel architectures for neurons, neural networks have achieved state-of-the-art results in many domains such as computer vision and machine translation. FPGAs are a natural choice for implementing neural networks as they can handle different algorithms in computing, logic, and memory resources in the same device. Faster performance comparing to competitive implementations as the user can hardcore operations into the hardware. Software developers can use the OpenCL device C level programming standard to target FPGAs as accelerators to standard CPUs without having to deal with hardware level design."
Watch the video: https://wp.me/p3RLHQ-lnc
Learn more: https://extremecomputingtraining.anl.gov/archive/atpesc-2019/agenda-2019/
and
https://www.intel.com/content/www/us/en/products/programmable/fpga.html
Join this video course on Udemy using the link below:
https://www.udemy.com/mastering-rtos-hands-on-with-freertos-arduino-and-stm32fx/?couponCode=SLIDESHARE
>> The Complete FreeRTOS Course with Programming and Debugging <<
"The Biggest objective of this course is to demystifying RTOS practically using FreeRTOS and STM32 MCUs"
Step-by-step guide to porting and running FreeRTOS using a development setup that includes:
1) Eclipse + STM32F4xx + FreeRTOS + SEGGER SystemView
2) FreeRTOS + Simulator (for Windows)
Demystifies the complete architecture-related (ARM Cortex-M) code of FreeRTOS, which will help you greatly in putting this kernel on any target hardware of your choice.
LCU13: Deep Dive into ARM Trusted Firmware
Resource: LCU13
Name: Deep Dive into ARM Trusted Firmware
Date: 31-10-2013
Speaker: Dan Handley / Charles Garcia-Tobin
Low-cost microcontrollers are being used more and more often in embedded applications that previously may have used a microprocessor. Microcontrollers often run a real-time operating system (RTOS) rather than a full operating system like Linux. In this webinar we introduce FreeRTOS, a popular RTOS for microcontrollers that has been ported to 35 microcontroller platforms.
The document discusses the Intel Core microarchitecture. It provides an agenda that covers an introduction, knowledge preparation about key concepts like architecture versus microarchitecture, performance measurements, and pipeline design. It then discusses notable features of the Core microarchitecture and concludes with a microarchitecture tour and considerations for coding.
Branch prediction is necessary to reduce penalties from branches in modern deep pipelines. It predicts the direction (taken or not taken) and target of branches. Common techniques include bimodal prediction using saturating counters and two-level prediction using branch history tables and pattern history tables. Real processors use hybrid predictors combining different techniques. Mispredictions require flushing the pipeline and incur a performance penalty.
Branch prediction is a technique used in modern processors to improve performance. It tries to predict which way a branch will go before it is known for sure. This helps reduce delays in the processor pipeline caused by branches. Common branch prediction techniques include static prediction based on heuristics, saturating counters that track branch outcomes, two-level adaptive prediction using pattern history tables, and predicting branch targets before they are calculated. While branch prediction is effective, some processors use simpler approaches to reduce power and area requirements.
Parallel programming model, language and compiler in ACA (MITS Gwalior)
This document discusses parallel programming models and their key aspects. It describes five common parallel programming models: shared-variable, message-passing, data parallel, object-oriented, and functional/logic. The main types of inter-process communication are shared variables and message passing. Synchronous and asynchronous message passing are introduced. The document also covers language features that enable parallel programming such as optimization, availability, synchronization/communication, control of parallelism, data parallelism, and process management.
The document discusses MapReduce runtime environments, including their design, performance optimizations, and applications. It provides an overview of MapReduce, describing the programming model and key-value data processing. It also discusses the design of MapReduce execution runtimes, including their use of distributed file systems and handling of parallelization, load balancing, and failures. Finally, it outlines areas of ongoing research to improve MapReduce performance and applicability.
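The key-value programming model can be sketched in a few lines of single-process Python. This is a toy word count with hypothetical function names; it shows only the map, shuffle, and reduce steps and none of the distributed file system, load balancing, or failure handling that a real runtime provides.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over the input documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
```

In a real runtime the map calls run in parallel on separate input splits and the shuffle moves data between machines; the sequential version here only makes the data flow visible.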
The document discusses advanced techniques for branch prediction in computer processors. It covers topics like branch speculation, branch direction prediction using static, dynamic, and hybrid approaches, branch target prediction, and reducing branch mispredictions through techniques like two-level adaptive prediction, bi-mode prediction, and path history prediction. The goal is to better predict conditional branch instructions to improve instruction pipeline utilization and overall processor performance.
Branch prediction reduces pipeline stalls by speculatively executing instructions down the predicted path. It involves predicting the branch direction, target address, and in some cases the return address. Common techniques include using saturating counters in the branch history table (BHT) to track past branch behavior, branch target buffers (BTB) to predict the target address, and correlating predictors that track the relationships between recent branches. Evaluations show correlating predictors can achieve over 95% prediction accuracy, significantly reducing branch penalties.
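To make the idea of correlation concrete, the sketch below uses a gshare-style scheme: a global history register is XORed with the branch address to pick a 2-bit saturating counter. This is a swapped-in, commonly described variant chosen for brevity, not necessarily the predictor evaluated in the document; the class name, history length, and initial counter values are assumptions.

```python
class GsharePredictor:
    """Global-history predictor: index = (pc XOR history), entries are 2-bit counters."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.history = 0  # outcomes of the most recent branches, newest in bit 0
        # Assumption: counters start at 1 ("weakly not taken").
        self.counters = [1] * (1 << history_bits)

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    predictor = GsharePredictor()
    correct = 0
    steps = 40
    for step in range(steps):
        taken = step % 2 == 0  # strictly alternating branch at a single address
        correct += predictor.predict(0) == taken
        predictor.update(0, taken)
    print(f"{correct} of {steps} predictions correct")
```

A strictly alternating branch defeats a single per-branch counter, but once the history register has filled, each of the two history patterns maps to its own counter and the outcome becomes predictable.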
This document discusses methods for conformational analysis of molecules, which is needed to identify a molecule's lowest-energy conformation. It describes systematic search methods that explore all combinations of torsion angles but are limited by computational time. It also describes model-building methods that construct conformations by joining molecular fragments. Available technologies for conformational analysis include software tools from Accelrys, Molecular Networks, OpenEye, Schrodinger, and Tripos.
This document discusses instruction-level parallelism (ILP) limitations. It covers ILP background using a MIPS example, hardware models that were studied including register renaming and branch/jump prediction assumptions. A study of ILP limitations found diminishing returns with larger window sizes and realizable processors are limited by complexity and power constraints. Simultaneous multithreading was explored as a technique to improve ILP but has its own design challenges. Today, x86 and ARM processors employ various ILP optimizations within pipeline constraints.
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur... Hsien-Hsin Sean Lee, Ph.D.
The document summarizes key aspects of the P6 microarchitecture used in processors like the Pentium Pro, Pentium II, and Pentium III. It describes the system architecture with separate front-side and back-side buses. It then details the instruction fetch, decode, register renaming, out-of-order execution, memory handling, and retirement stages of the processor pipeline. Diagrams illustrate the branch prediction, reservation stations, reorder buffer, and memory order buffer components that enable speculative and out-of-order execution in the P6.
Study: The Future of VR, AR and Self-Driving Cars LinkedIn
We asked LinkedIn members worldwide about their levels of interest in the latest wave of technology: whether they’re using wearables, and whether they intend to buy self-driving cars and VR headsets as they become available. We asked them too about their attitudes to technology and to the growing role of Artificial Intelligence (AI) in the devices that they use. The answers were fascinating – and in many cases, surprising.
This SlideShare explores the full results of this study, including detailed market-by-market breakdowns of intention levels for each technology – and how attitudes change with age, location and seniority level. If you’re marketing a tech brand – or planning to use VR and wearables to reach a professional audience – then these are insights you won’t want to miss.
Artificial intelligence (AI) is everywhere, promising self-driving cars, medical breakthroughs, and new ways of working. But how do you separate hype from reality? How can your company apply AI to solve real business problems?
Here are the AI lessons your business should keep in mind for 2017.
This work evaluates the two classic work-stealing algorithms (FIFO and LIFO) together with a newly proposed implementation based on task priorities, calculated using the longest path as a metric.
FPGA based 10G Performance Tester for HW OpenFlow Switch Yutaka Yasuda
SDN operators need to measure the performance of their OpenFlow hardware switches on site, because latency can differ by a factor of 1000 depending on the specified flow entry. An ASIC can forward a packet in several μs, but the software (CPU) path may take milliseconds.
To protect yourself from an unexpected performance plunge, monitor the health of your switches on site.
This document discusses interfacing a hex keypad to an AT89C51 microcontroller. It provides details on the components required, including the microcontroller, keypad, and 7-segment display. It describes how the keypad is scanned using column scanning to detect key presses and determine the pressed key. The programming for the microcontroller is also outlined, showing how it initializes ports, scans the keypad, and looks up the pressed key in a lookup table to show it on the 7-segment display. Potential issues faced include debouncing and directly powering the display. Future work proposed includes adding an LCD module to create a basic calculator.
This document provides instructions for a lab activity involving configuring networking devices. The activity includes 7 parts where students will physically connect a router, switch, and computers, and then configure the devices by following instructions. The objectives are to practice physical connectivity, establishing console sessions on devices, assigning IP addresses for static routing, and verifying configurations and connectivity. Diagrams and tables are provided to identify device interfaces, cable types, and IP addressing schemes.
The document outlines 17 projects related to real-time monitoring systems for automated production lines and testing equipment. Many of the projects involve implementing monitoring of production metrics like downtime, reject rates, and production yields. Other projects focus on collecting machine data through various interfaces and analyzing the data to troubleshoot equipment and quality issues. Common goals across multiple projects include online monitoring of key metrics, collecting statistical data on performance, and using the data for continuous improvement efforts like lean six sigma.
How to design a Passive Infrared (PIR) Open Source Project Ionela
This article details the hardware and software required for a fully functional passive infrared (PIR) sensor with an associated remote control unit. The remote control unit adjusts key algorithm detection parameters which are stored in the MC68HC908JK1/3 FLASH memory area.
This document summarizes a project to design a printed circuit board (PCB) for a remote data acquisition node powered over an optical fiber. Key points:
- The goal was to design a PCB that uses power-over-fiber technology to supply power via laser light converted to electricity, replacing traditional copper cables and power supplies.
- The designed PCB includes components like a laser power converter, ADC, FPGA, and optical transmitter. It underwent several revisions to address issues in the prototype.
- The final PCB design is presented, but further testing is needed to fully address open issues like powering the board using just the laser power converter.
The document discusses microprocessors and interrupts in computer systems. It describes how the first microprocessor was developed by Intel and Busicom in 1971. It then covers several Intel microprocessor models from the 4004 to the 8088 and beyond. The document also defines interrupts as signals that cause the CPU to pause its current task and service the interrupt. It distinguishes between maskable, non-maskable, software, and hardware interrupts and provides examples of each. Finally, it discusses the different software interrupts available in the 8085 microprocessor.
This document describes a software-defined radio system called Longear that was developed by Wavenetix over 3 years for high dynamic range signal collection and processing of OFDMA signals. The system uses joint detection and iterative interference cancellation algorithms to distinguish signals that are only a few dB apart. It processes signals in both acquisition and post-processing modes using a multithreaded software architecture. Field tests showed the system could detect signals separated by as little as 5-9 dB without interference cancellation, and up to 30 dB with serial interference cancellation enabled. Wavenetix offers this and other wireless testing technologies and services to equipment manufacturers and wireless operators.
This document describes the design and implementation of a numerically controlled oscillator (NCO) on an FPGA. An NCO generates sinusoidal signals through a phase accumulator and lookup table. The authors implemented an NCO with 9-bit phase resolution, 54.18 dB spur level, 24-bit frequency resolution, and output signals of sine and cosine waves. They designed the NCO in VHDL, simulated it in Xilinx ISE, and tested it on a Spartan-2 FPGA board. The hardware results matched the simulation results, demonstrating frequencies from 1.25 MHz to 7.5 MHz with matching output on a spectrum analyzer. The FPGA-based NCO can be used for applications like software defined
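The phase-accumulator-plus-lookup-table structure can be sketched in software as follows. The 24-bit accumulator and 9-bit table address mirror the resolutions quoted in the summary; the 40 MHz clock, the function names, and the floating-point table are assumptions made for illustration and do not reproduce the authors' VHDL design.

```python
import math

ACC_BITS = 24                 # phase accumulator width (24-bit frequency resolution)
LUT_BITS = 9                  # phase bits used to address the table (9-bit phase resolution)
LUT_SIZE = 1 << LUT_BITS
ACC_MASK = (1 << ACC_BITS) - 1

# One full sine period stored in the lookup table.
SINE_LUT = [math.sin(2 * math.pi * k / LUT_SIZE) for k in range(LUT_SIZE)]


def tuning_word(f_out, f_clk):
    """Frequency control word that makes the accumulator wrap f_out times per second."""
    return round(f_out * (1 << ACC_BITS) / f_clk)


def nco_samples(fcw, n):
    """Generate n (sine, cosine) pairs for the given frequency control word."""
    acc = 0
    samples = []
    for _ in range(n):
        idx = acc >> (ACC_BITS - LUT_BITS)           # top 9 bits address the table
        sine = SINE_LUT[idx]
        cosine = SINE_LUT[(idx + LUT_SIZE // 4) % LUT_SIZE]  # quarter-period offset
        samples.append((sine, cosine))
        acc = (acc + fcw) & ACC_MASK                 # the accumulator wraps modulo 2**24
    return samples


# Hypothetical 40 MHz clock; 1.25 MHz is the lowest output frequency in the summary.
fcw = tuning_word(1.25e6, 40e6)
samples = nco_samples(fcw, 32)
```

The output frequency is `fcw * f_clk / 2**24`, so the frequency step depends only on the accumulator width, while the table size sets the phase resolution and therefore the spur level.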
This document provides an overview of reconfigurable computing and field programmable gate arrays (FPGAs). It discusses the history and flexibility advantages of FPGAs compared to application-specific integrated circuits (ASICs) and general purpose processors (GPPs). The document outlines FPGA architecture including logic blocks, interconnect networks, memory and digital signal processing blocks. It also covers FPGA programming technologies, data flow graphs, and considerations for implementing algorithms on FPGAs which requires a codesign approach.
This document provides an overview of QEMU, including its use of dynamic translation and Tiny Code Generator (TCG) to emulate target CPUs on the host system. It discusses how QEMU translates target instructions into a RISC-like intermediate representation (TCG ops), optimizes and converts them to host instructions. The document also mentions Linaro's work with QEMU and a QEMU monitor tool for debugging ARM systems emulated by QEMU.
This project implements FIR filters on an FPGA to process audio signals in real time. The goals are to 1) pass audio frequencies from input to output on the FPGA, 2) incorporate high pass, low pass, and band pass FIR filters, and 3) add more FIR filters of increasing depth to observe performance. The document describes the hardware design including interfaces, a system diagram, and filter implementation. Results show the FIR filters can effectively filter signals as designed and with over 100 filter stages, minimal glitches are observed, demonstrating the FPGA's ability to perform real-time digital signal processing.
Presentation on my undergraduate thesis work at DPLC 2016 Conference Chennu Vinodh Reddy
Presentation at Design and Manufacturing Product Life Cycle (DPLC 2016) national conference on my undergraduate thesis work on the title "Optimization of Process Parameters in CNC Wire Cut EDM using Taguchi Method for Effective Machining of STAVAX ESR" .
This document describes configuring a basic single-area OSPFv2 network. It includes the topology diagram and addressing tables, and steps to build the network, configure OSPF routing on each router with area 0, and verify OSPF neighbor relationships and routing tables. It also provides sample outputs of show commands to check OSPF settings and interfaces.
The document describes designing and simulating an analog switch in L-Edit and PSpice. Key steps include:
1) Calculating the required widths and lengths of nFET and pFET devices to achieve an on-resistance of 400 ohms or less.
2) Laying out the design in L-Edit according to these dimensions.
3) Performing design rule checks and extracting the layout for PSpice simulation.
4) Simulating the switch in PSpice to verify switching operation, resistance, and bandwidth meet specifications.
This document summarizes key topics from a lecture on IC fabrication:
1. It discusses the history of integrated circuits from the first transistor in 1948 to early microprocessors in the 1970s and Moore's Law predicting increasing transistor counts.
2. It describes the process of fabricating ICs using photolithography to pattern silicon wafers in multiple layers to build up transistor circuits.
3. It outlines trends in IC performance such as increasing frequency and power consumption over time as components shrink according to Moore's Law.
This document summarizes a reconfigurable system with Linux load on an FPGA. It discusses the software architecture including socket communication, device communication, and architectural layers/details. It then describes the reconfiguration of the FPGA with Linux, including module loading/unloading. Performance results are provided for startup, driver setup, and module/data loading. Finally, future work is discussed around improving algorithms, adding Ethernet support, and a distributed scenario.
Random Number Generators :
LCG, Fibonacci, LFSR, GFSR, TGFSR, MT, MT19937,WELL
Tutorials on Finite Fields and associated RNG on github at :
https://github.com/rinnocente/Random_Numbers
The document discusses Docker containers and their architecture. It begins by explaining that Docker originated as a tool created by dotCloud to manage customer applications in the cloud. The tool became so popular with developers that dotCloud changed its name to Docker, Inc. and focused its business on Docker. The document then discusses how Docker uses Linux kernel features like control groups (cgroups) and namespaces to isolate containers and their resources. It explains that Docker architecture includes a client, daemon, containers running applications, and an optional distributed data store. Finally, it provides an example of basic Docker commands to check the Docker version and run a test container.
The document discusses several topics related to open networking with FPGAs including:
1) High Performance Reconfigurable Computing (HPRC) and the performance of FPGAs for networking and computing.
2) A demonstration by Huawei and Xilinx of a 400 Gbps core router implemented on FPGA cards.
3) Interchip and interboard communication technologies like Interlaken.
4) OpenFlow and how it allows a controller to programmatically alter the forwarding behavior of switches.
5) Network emulation tools like Mininet that can be used to test OpenFlow applications and topologies without requiring physical hardware.
The document discusses using Maxwell's equations and the Finite-Difference Time-Domain (FDTD) method to simulate indoor WiFi propagation and determine optimal access point placement, noting that while Maxwell's equations can model electromagnetic propagation, accurately simulating real-world materials, effects like scattering and dispersion, and WiFi signals may be computationally challenging.
This document discusses email authentication techniques including TLS, SPF, DKIM and DMARC. It provides information on how these protocols work and how to implement them. Key points covered include how SPF validates the envelope sender address by checking the authorized mail servers for a domain in DNS, and how DKIM cryptographically signs specific parts of emails to validate that the content has not been modified in transit. Configuration examples are given for setting up SPF records and generating DKIM keys.
This document provides an overview of FPGA computing using an Intel/Altera Arria 10 FPGA. It begins with the history of the FPGA computing project and defines what an FPGA is. It then discusses the Intel Arria 10 FPGA used in tests. To measure performance on new architectures, it introduces the concept of the "Seven Dwarfs" benchmark algorithms and arithmetic intensity. The Roofline performance model is explained as a way to estimate performance based on peak flop rate and memory bandwidth. Actual performance results are shown for algorithms like vector addition, stencil code, and matrix multiplication run on the FPGA compared to CPU. OpenCL is discussed as a programming model for FPGAs.
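The Roofline estimate described above reduces to taking the lower of two bounds. The sketch below assumes double-precision vector addition and uses made-up peak and bandwidth figures; none of the numbers are measurements from the summarized presentation.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: the lower of the compute roof and the memory roof.

    intensity is the arithmetic intensity in flops per byte moved.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


# Vector addition c[i] = a[i] + b[i] in double precision:
# 1 flop per element and 3 * 8 = 24 bytes moved (two loads, one store).
VECTOR_ADD_INTENSITY = 1 / 24

# Hypothetical machine figures, chosen only for illustration.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 30.0

vector_add_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, VECTOR_ADD_INTENSITY)

# Intensity at which a kernel stops being memory bound on this machine.
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS
```

With these assumed figures vector addition sits far to the left of the ridge point, so its bound is set by memory bandwidth rather than by the peak flop rate.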
The document discusses various topics related to refreshing computer skills, including:
1. How to easily write simple web pages using Markdown or Markdown editors.
2. How to easily add math to web pages using MathJax, which allows LaTeX equations to be rendered in browsers.
3. Notebooks for mixing text, math, and computation results using Jupyter notebooks, which support various programming kernels like Python, R, and Maxima.
This document summarizes key aspects of computer hardware and architecture. It discusses nodes and networks that make up computer systems. It describes different types of computer instruction sets like CISC and RISC. It outlines various microarchitecture features for high performance computing like superscalar, pipelining, out-of-order execution, and branch prediction. It also covers cache memory organization and algorithms.
This document discusses transport protocols and how they have been optimized for large data transfers but are not as well suited for the small file transfers that now dominate web traffic. It describes several key aspects of TCP including flow control using a sliding window, congestion control algorithms like slow start and congestion avoidance, and mechanisms for detecting and responding to packet loss like fast retransmit. It notes how TCP was adapted over time, including additions like fast recovery, and alternatives like TCP Vegas which aims to avoid rather than just respond to congestion. The document provides historical context and details on TCP implementations.
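Slow start and congestion avoidance can be pictured with a deliberately simplified model of the congestion window. The sketch counts the window in whole segments per round trip and restarts from one segment after a loss; the function name, the loss schedule, and the one-update-per-RTT simplification are assumptions, not a description of any particular TCP implementation.

```python
def cwnd_trace(rtts, ssthresh, loss_at=()):
    """Congestion window (in segments) at the start of each round trip.

    Simplified model: exponential growth below ssthresh (slow start),
    one extra segment per RTT above it (congestion avoidance), and a
    restart from one segment after a loss.
    """
    cwnd = 1
    trace = []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in loss_at:
            ssthresh = max(cwnd // 2, 2)      # halve the threshold on loss
            cwnd = 1                          # restart in slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)    # slow start: double every RTT
        else:
            cwnd += 1                         # congestion avoidance: additive increase
    return trace


no_loss = cwnd_trace(8, ssthresh=8)
with_loss = cwnd_trace(8, ssthresh=8, loss_at={5})
```

The trace makes the point of the summary concrete: a short web transfer may finish within the first few round trips, before the window has grown to a useful size.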
The document discusses public key cryptography, digital certificates, and Transport Layer Security (TLS). It provides an overview of symmetric and asymmetric key cryptography, algorithms like RSA and ElGamal, the use of digital signatures and certificates, and how TLS uses public key encryption to securely transmit data over the internet, such as for email.
This document discusses challenges and opportunities for end nodes with multigigabit networking. It covers increasing bandwidth capabilities through technologies like DWDM and 10GbE. It also examines hardware challenges for processor, memory, and I/O buses. Software challenges discussed include zero-copy networking, ULNI/OS bypass, and network path pipelining. The document also summarizes network protocols like AQM, ECN, MPLS and their roles in high-speed networking.
Mosix : automatic load balancing and migration rinnocente
The document discusses automatic load balancing and transparent process migration in distributed operating systems. It begins with an introduction and overview of MOSIX, the distributed operating system that is the focus. It then provides overviews of several other distributed operating systems such as Sprite, Charlotte, Accent, Amoeba, and V System. The document outlines different load balancing mechanisms, including job initiation, migration algorithms, and measures used to evaluate processor load. It also covers process migration mechanisms.
This document discusses various rule mining algorithms. It begins with an introduction to data mining and central themes like classification, clustering, association analysis, outlier analysis, and evolution analysis. It then discusses association rule mining (ARM), including definitions of support, confidence, and how ARM finds frequent itemsets and strong association rules. It also covers quantitative rules mining, sequential mining, partially ordered sets (posets), lattices, common algorithmic families like Apriori and FP-growth, and more.
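Support and confidence, the two measures named above, can be computed directly from a list of transactions. The sketch below uses a small hypothetical basket data set; the item names and helper functions are illustrative only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_milk_support = support(transactions, {"bread", "milk"})
bread_to_milk_confidence = confidence(transactions, {"bread"}, {"milk"})
```

An itemset is called frequent when its support reaches a chosen minimum, and a rule is called strong when its confidence also reaches a chosen minimum; algorithms such as Apriori avoid computing support for every possible itemset.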
This document provides an overview of IPv6 including:
- The history and motivations for developing IPv6 due to IPv4 address exhaustion.
- An introduction to IPv6 addressing and prefixes.
- Transition technologies like tunnels to help with gradual IPv6 deployment.
- IPv6 control protocols for tasks like neighbor discovery and routing.
- Details on how IPv6 addresses are represented textually and allocated.
This document discusses various technologies used for network access control including dot1x/RADIUS, DHCP, and their open source and proprietary implementations. It provides configuration examples of dot1x/RADIUS with Cisco and Juniper devices and discusses how to correlate user information from dot1x/RADIUS and DHCP. FreeRADIUS and various open source servers that can be used as alternatives are also listed.
Main news related to the CCS TSI 2023 (2023/1695) Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
TrustArc Webinar - 2024 Global Privacy Survey TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Skybuffer SAM4U tool for SAP license adoption Tatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Driving Business Innovation: Latest Generative AI Advancements & Success Story Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Monitoring and Managing Anomaly Detection on OpenShift.pdf Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
UiPath Test Automation using UiPath Test Suite series, part 6 DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Project Management Semester Long Project - Acuity jpupo2018
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
Webinar: Designing a schema for a Data Warehouse Federico Razzoli
Are you new to data warehouses (DWH)? Do you need to check whether your data warehouse follows the best practices for a good design? In both cases, this webinar is for you.
A data warehouse is a central relational database that contains all measurements about a business or an organisation. This data comes from a variety of heterogeneous data sources, which includes databases of any type that back the applications used by the company, data files exported by some applications, or APIs provided by internal or external services.
But designing a data warehouse correctly is a hard task, which requires gathering information about the business processes that need to be analysed in the first place. These processes must be translated into so-called star schemas, that is, denormalised databases where each table represents either a dimension or facts.
We will discuss these topics:
- How to gather information about a business;
- Understanding dictionaries and how to identify business entities;
- Dimensions and facts;
- Setting a table granularity;
- Types of facts;
- Types of dimensions;
- Snowflakes and how to avoid them;
- Expanding existing dimensions and facts.
5th LF Energy Power Grid Model Meet-up Slides DanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
3. 20 Nov 2005 roberto innocente 3
Speculative execution
A prediction is made of what work is likely to be needed soon. That work is then executed speculatively, in such a way that it can be committed if the prediction was correct or aborted if it was not.
5.
Linear scaling of speed
Quadratic scaling of transistors
Let's look at the latest scaling step in silicon lithography, from 0.13 µm to 0.09 µm: a 0.70 linear scaling, a 0.49 scaling of surface.
Gate delays scale linearly, the transistors available scale quadratically.
We will gain much more in available complexity than in gate speed.
[Chart: gate speed and transistors available (vertical axis 0 to 250) at feature sizes 0.25, 0.18, 0.13, 0.09 and 0.065]
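The arithmetic behind this scaling argument can be checked in a few lines. This is only a sketch: the function name `scaling` is made up for the illustration, and it assumes, as the slide does, that gate delay is proportional to the feature size and that the area of a transistor is proportional to its square.

```python
def scaling(old_feature_um, new_feature_um):
    """Return (linear, area, speed_gain, transistor_gain) for a process shrink.

    Assumption: gate delay scales linearly with the feature size and the
    area taken by one transistor scales with its square.
    """
    linear = new_feature_um / old_feature_um  # ratio of the feature sizes
    area = linear ** 2                        # ratio of the transistor areas
    return linear, area, 1 / linear, 1 / area


linear, area, speed_gain, transistor_gain = scaling(0.13, 0.09)
print(f"linear scaling {linear:.2f}, surface scaling {area:.2f}")
print(f"gate speed x{speed_gain:.2f}, transistors x{transistor_gain:.2f}")
```

With these assumptions one shrink gives roughly twice the transistors in the same area, but less than a 1.5x gain in gate speed.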
7.
Modern microprocessors
Today's microprocessors take advantage of the fact that they need to present an architectural state compliant with the standard von Neumann model only from time to time. For the rest of the time they are free to proceed in whatever way they find convenient.
11.
Pipelining
The work to be done is divided into stages, with a clear signal interface between them. After each stage a latch memorizes the state for the next cycle. This adds some overhead, but the hope is to get 1 result per cycle once the pipe is full.
[Figure: five-stage pipeline, Fetch (F), Decode (D), eXecute (X), Memory (M), Writeback (W), with a pipeline latch after each stage]
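The "1 result per cycle after the pipe is full" claim can be put into numbers. The sketch below assumes an ideal pipeline with no stalls and one cycle per stage; the function names are hypothetical.

```python
def pipelined_cycles(n_instructions, n_stages=5):
    """Ideal pipeline, no stalls: fill the pipe once, then one result per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def unpipelined_cycles(n_instructions, n_stages=5):
    """Each instruction goes through every stage before the next one starts."""
    return n_stages * n_instructions


for n in (1, 10, 1000):
    print(n, pipelined_cycles(n), unpipelined_cycles(n))
```

For long instruction streams the ratio between the two counts approaches the number of stages, which is why every stall matters.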
13.
Pipeline at work
cycle F D X M W
1 add r1,r3,r4
2 mul r5,r6,r7
3 bnez loop,r1
4 X
5 X X
6 X X X
7 X X X X
8 div r8,r3,r6 X X X X
9 add r10,r8,r9 X X X
10 jmp loop X X
When there is a dependency we say that the pipeline is stalled, or that a bubble is inserted, while waiting for the dependency to be resolved. Here a control dependency causes a 4-cycle stall.
14.
Instruction dependencies
Data dependency :
add r1,r2,r3 ; r1 <- r2+r3
mul r1,r4,r5 ; r1 <- r4*r5
Solution:
register renaming,
result forwarding
Structural dependency:
Solution:
add functional units
Control dependency :
bne label1,r1,r2
add r1,r2,r3
label1:
mul r4,r5,r6
Solution:
branch prediction
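Data dependencies between two instructions can be found mechanically by comparing their registers. The sketch below assumes the `op dest,src1,src2` operand order used in these slides; the labels RAW, WAR and WAW (read after write, write after read, write after write) are the usual textbook names and do not appear in the slides, and the function names are hypothetical.

```python
def parse(instr):
    """Split 'op dest,src1,src2' into (dest, set of sources)."""
    _, operands = instr.split(None, 1)
    dest, *sources = [reg.strip() for reg in operands.split(",")]
    return dest, set(sources)


def data_dependencies(first, second):
    """List the data dependencies of `second` on the earlier `first`."""
    dest1, sources1 = parse(first)
    dest2, sources2 = parse(second)
    found = []
    if dest1 in sources2:
        found.append("RAW")  # second reads what first writes
    if dest2 in sources1:
        found.append("WAR")  # second overwrites what first reads
    if dest1 == dest2:
        found.append("WAW")  # both write the same register
    return found


print(data_dependencies("add r1,r2,r3", "mul r1,r4,r5"))
print(data_dependencies("add r1,r2,r3", "mul r4,r1,r5"))
```

Register renaming removes WAR and WAW dependencies, which only reuse a register name; a RAW dependency carries real data and can at best be shortened by result forwarding.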
15.
Multiple issue (Superscalar) Architectures
[Figure: two parallel F, D, X, M, W pipelines]
Architectures that are able to process multiple instructions at a time. While it was already common to have multiple execution units (like an integer and an FP unit), the first superscalar architectures, e.g. IBM Power and Pentium Pro, appeared only in the '90s. These architectures require very good branch prediction. The figure depicts a 2-way superscalar.
16.
Superscalar/2
Current architectures are commonly 4- or 8-way superscalars
The design of the last Alpha, canceled in a late phase, was for an 8-way superscalar
Extremely good branch prediction is needed: there can be hundreds of instructions in flight (4 ways * 30 stages = 120)
20.
Feature size, frequency, complexity
[Charts: feature size (feat.size, axis 0 to 1), frequency (freq, axis 0 to 4000) and transistor count (trans.#, axis 0 to 130000) for the 386, 486dx, P 60, p pro, P II, P III, P 4 and P 4 571]
22.
Control xfer instructions
Some of the instructions, instead of simply
incrementing the PC to the next instruction,
change it to a different value. We distinguish :
Unconditional branches or simply jumps
Conditional branches or simply branches
subroutine calls
subroutine returns
traps, returns from interrupts or exceptions
27.
Branches by frequency
[Chart: dynamic instructions, dynamic branches and dynamic conditional branches, in millions of instructions (y axis, 0 to 250), for the SPEC95 benchmarks compress, gcc, go, ijpeg, m88ksim, perl, vortex and xlisp]
28.
Branches by taken rate
Average from SPECint95
Taken rate      Share of branches
Always taken    14 %
95-100 %        21 %
50-95 %         20 %
5-50 %          24 %
0-5 %           7 %
Never taken     14 %
29.
Occurrences of branches
Occurrences of conditional branches:
SPECint95: 1 out of 5 instructions executed (20%)
SPECfp95: 1 out of 10 instructions executed (10%)
Basic block is the term used for a sequence of instructions without any control transfer
Note: this dynamic rate is different from, and much higher than, the rate of branches in the static program
30.
Mispredictions effects
b = rate of executed instructions that are branches (0.1-0.2)
p = prediction accuracy (currently the best is in the 0.90-0.97 range)
f = instructions in flight (in execution, currently over 100)
Oversimplification: a misprediction is recognized only at the very end and forces us to squash all the following f in-flight instructions.
Then every 1/(b(1-p)) instructions we squash f instructions, so the efficiency is
$$E = \frac{1}{1 + f\,b\,(1-p)}$$
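To get a feeling for the numbers, the formula can be read as E = 1/(1 + f*b*(1-p)) and evaluated for a few prediction accuracies. This is a sketch of the slide's simplified model only; the function name `efficiency` and the chosen values b = 0.2 and f = 100 are illustrative picks from the ranges quoted on the slide.

```python
def efficiency(b, p, f):
    """Fraction of the executed instructions that are not squashed.

    b: fraction of executed instructions that are branches
    p: prediction accuracy
    f: number of instructions in flight, all squashed on a misprediction
    """
    return 1.0 / (1.0 + f * b * (1.0 - p))


for p in (0.90, 0.95, 0.97):
    print(f"p = {p:.2f}  E = {efficiency(b=0.2, p=p, f=100):.3f}")
```

In this model, with 100 instructions in flight and one branch every 5 instructions, even a 95 % accurate predictor wastes about half of the work.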
33.
Static branch prediction
Always taken (AT), Always not taken (ANT)
Backward taken, forward not taken (BTFNT): frequently used by current processors, relies on compilers too (Intel Pentium 4)
Complicated rules: for example the branch predictor of the Pentium M looks at the distance between addresses and at opcodes
Programmer hints (special opcodes on Pentium, flags on Itanium)
Program reorganization by compilers
Achieves ~60-70 % accuracy
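The BTFNT rule needs nothing more than an address comparison. A minimal sketch, with a hypothetical function name and made-up addresses:

```python
def btfnt_predict(branch_address, target_address):
    """Backward taken, forward not taken: True means 'predict taken'."""
    return target_address < branch_address


# A loop-closing branch jumps backward and is predicted taken.
print(btfnt_predict(branch_address=0x1040, target_address=0x1000))
# A branch that skips ahead over a block is predicted not taken.
print(btfnt_predict(branch_address=0x1040, target_address=0x1080))
```

The rule works because backward branches usually close loops, which are taken on every iteration but the last.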
34.
Semi-static branch prediction
It relies on data collected from previous runs of the program (profiling: Sun SPARC)
Appropriate hints are inserted in the code:
predict taken
predict not taken
Achieves an accuracy of ~65-80 %
35.
Dynamic branch prediction
As the last time: predict the same outcome the branch had the last time
Bimodal predictors: achieve an accuracy of 70-85 %
2-level / correlation predictors: achieve an accuracy of 80-90 %
Combining/Meta predictors
Markov/PPM predictors
Neural predictors
[Chart: % branch prediction accuracy, shown as a from/to range on an axis from 50 to 95, for static, semi-static, bimodal, 2-level and combined predictors]
36.
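2bc – Two-bit saturating counter
The best 4-state FSA (Finite State Automaton)
States: SNT, NT, T, ST (Strongly Not Taken, Not Taken, Taken, Strongly Taken)
Add 1 when the branch is taken, subtract 1 when it is not taken. Saturate at 0 and 3
[State diagram: SNT (00), NT (01), T (10), ST (11). A taken branch (t) moves one state towards ST, a not-taken branch (nt) moves one state towards SNT; ST stays in place on t and SNT stays in place on nt]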
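The counter is small enough to write out completely. The sketch below follows the rule stated on the slide (add 1 when taken, subtract 1 when not taken, saturate at 0 and 3); the function names and the example loop branch are hypothetical.

```python
SNT, NT, T, ST = 0, 1, 2, 3  # Strongly Not Taken ... Strongly Taken


def predict(counter):
    """Predict taken in the two upper states (T and ST)."""
    return counter >= T


def update(counter, taken):
    """Add 1 when taken, subtract 1 when not taken, saturating at 0 and 3."""
    if taken:
        return min(counter + 1, ST)
    return max(counter - 1, SNT)


def run(outcomes, counter=NT):
    """Return (number of correct predictions, final counter state)."""
    correct = 0
    for taken in outcomes:
        if predict(counter) == taken:
            correct += 1
        counter = update(counter, taken)
    return correct, counter


# A loop branch: taken 9 times, then not taken once, run 3 times over.
loop_branch = ([True] * 9 + [False]) * 3
print(run(loop_branch))
```

After the first pass the single not-taken outcome at each loop exit only moves the counter from ST to T, so the next pass through the loop is still predicted taken from its first iteration.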
38.
Branch correlations
Global correlation: the outcome depends on the outcome of previous branches
    if (cond1) { .. }
    if (cond1 && cond2) { .. }

    if (cond1) a=2;
    ..
    if (a==0) { .. }

    if (cond1) { .. }
    if (cond2) { .. }
    if (cond1 && cond2) { .. }
Local correlation: the outcome depends on previous outcomes of the same branch
    for(i=0;i<1000;i++) {
        if (i%4 == 0) a[i]=0;
    }

    for(i=0;i<12;i++) { .. }
39.
Two-level/Correlation predictor (Yeh, Patt ’92; Pan, So, Rahmeh ’92)
Branches are correlated with one another
We keep a shift register with the most recent branch outcomes
We index a bimodal table (Pattern History Table, PHT) with this branch history register (BHR)
We can keep only one global BHR for all the branches (global 2-level predictor) or one BHR per branch (local 2-level predictor). We can do the same for the PHT.
[Figure: the last outcome is shifted into the branch history register, which indexes the Pattern History Table to give the prediction]
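A two-level predictor with a single history register can be sketched as follows. The class name, the 4-bit history and the initial counter value are assumptions made for the example; the pattern table holds two-bit saturating counters.

```python
class TwoLevelPredictor:
    """One branch history shift register indexing a table of 2-bit counters."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.history = 0                       # most recent outcomes, newest in bit 0
        self.pht = [1] * (1 << history_bits)   # all counters start at "not taken"

    def predict(self):
        return self.pht[self.history] >= 2

    def update(self, taken):
        counter = self.pht[self.history]
        self.pht[self.history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift the new outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def count_mispredictions(predictor, outcomes):
    wrong = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            wrong += 1
        predictor.update(taken)
    return wrong


# Outcomes of a branch that is taken when i % 4 == 0.
outcomes = [i % 4 == 0 for i in range(1000)]
print(count_mispredictions(TwoLevelPredictor(history_bits=4), outcomes))
```

Once each of the four history patterns has its own trained counter, the repeating taken / not taken / not taken / not taken sequence is predicted without errors; a single two-bit counter would keep mispredicting every taken outcome.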
41.
gshare (McFarling ’93)
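Alleviates the problem of destructive interference between branches in the Pattern History Table (PHT)
The PHT is indexed with the XOR of the branch history register (BHR) and the branch instruction address (BIA)
[Figure: the last outcome is shifted into the branch history register; the register is XORed with the branch address and the result indexes the Pattern History Table to give the prediction]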
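The indexing scheme can be sketched as below. The class name, the table size and the use of the raw low-order address bits are assumptions made for the example; the table holds two-bit saturating counters.

```python
class Gshare:
    """Table of 2-bit counters indexed by (global history XOR branch address)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0
        self.pht = [1] * (1 << index_bits)

    def _index(self, branch_address):
        return (self.history ^ branch_address) & self.mask

    def predict(self, branch_address):
        return self.pht[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        i = self._index(branch_address)
        self.pht[i] = min(self.pht[i] + 1, 3) if taken else max(self.pht[i] - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.mask


g = Gshare(index_bits=4)
g.history = 0b1010
# Same history, different branch addresses: different table entries.
print(bin(g._index(0b0110)), bin(g._index(0b0011)))
```

Two branches that see the same history no longer share a counter, which is the interference the XOR is meant to reduce.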
43.
Tournament/Meta predictor (McFarling ’93)
It often happens that one predictor is better for some branches and another for other branches
A bimodal predictor can then be used to drive a mux that chooses between the 2 predictors
When the outcome is known, the meta-predictor is updated if one of the predictors was right and the other wrong
In this case the states express the confidence in each of the 2 predictors
[Figure: hybrid predictor with Predictor1, Predictor2 and a Meta Predictor addressed by the branch instruction address; the Meta Predictor drives a mux that selects the outcome]
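The selection and update rule can be sketched with one two-bit counter. In a full design there would be a table of such counters addressed by the branch instruction address; a single counter, the function names and the choice that low values mean "trust predictor 1" are simplifications assumed here.

```python
def choose(meta_counter, prediction1, prediction2):
    """Mux: counter values 0-1 select predictor 1, values 2-3 select predictor 2."""
    return prediction2 if meta_counter >= 2 else prediction1


def update_meta(meta_counter, prediction1, prediction2, taken):
    """Move towards the predictor that was right, only when exactly one was right."""
    right1 = prediction1 == taken
    right2 = prediction2 == taken
    if right1 == right2:
        return meta_counter                  # both right or both wrong: no change
    if right2:
        return min(meta_counter + 1, 3)      # more confidence in predictor 2
    return max(meta_counter - 1, 0)          # more confidence in predictor 1


meta = 1                                     # start by trusting predictor 1
meta = update_meta(meta, prediction1=True, prediction2=False, taken=False)
print(meta, choose(meta, True, False))
```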
44.
Data compression
It is a similar and well-studied problem, for which there exists an algorithm reputed to be nearly optimal (PPM).
The goal is to represent the data with fewer bits:
You use fewer bits for frequent sequences and more bits for the infrequent ones. The net effect is to use fewer bits overall.
It relies on accurately predicting the probability distribution of the data and using a coder tuned to that distribution
45.
Markov predictor
A Markov predictor of order j bases its prediction on the last j outcomes
It builds the matrix of transition frequencies and makes the prediction according to that
Last outcomes: 1 0 1 1 0 0 1 1 0
pattern  next  frequency
00       0     0
00       1     1
01       0     0
01       1     2
10       0     1
10       1     1
11       0     2
11       1     0
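The frequency table for the outcome sequence on this slide can be rebuilt with a short script. The function names are hypothetical, and the rule of predicting 0 on a tie is an assumption the slide does not settle.

```python
from collections import defaultdict


def transition_frequencies(outcomes, order):
    """Map each pattern of `order` outcomes to [count of next=0, count of next=1]."""
    freq = defaultdict(lambda: [0, 0])
    for i in range(order, len(outcomes)):
        pattern = tuple(outcomes[i - order:i])
        freq[pattern][outcomes[i]] += 1
    return freq


def markov_predict(outcomes, order):
    """Predict the next outcome from what followed the current pattern in the past."""
    freq = transition_frequencies(outcomes, order)
    pattern = tuple(outcomes[len(outcomes) - order:])
    zeros, ones = freq.get(pattern, (0, 0))
    return 1 if ones > zeros else 0          # assumption: ties predict 0


history = [1, 0, 1, 1, 0, 0, 1, 1, 0]
for pattern, counts in sorted(transition_frequencies(history, 2).items()):
    print(pattern, counts)
print("order 1 prediction:", markov_predict(history, 1))
```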
46.
PPM – (Cleary, Witten 1984)
Prediction by Partial Matching
A PPM predictor of order m is a set of m+1 Markov predictors (of order m, m-1, ..., 1, 0)
Last outcomes: 1 0 1 1 0 0 1 1 0
Look up the last m bits: if found, predict with the Markov predictor of order m
If not found, look up the last m-1 bits: if found, predict with the Markov predictor of order m-1
If not found, continue in the same way down to the Markov predictor of order 1
If not found, predict with the Markov predictor of order 0
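The fallback chain can be sketched as a loop over decreasing orders. This is a simplified reading of the scheme: "found" is taken to mean that the current pattern has already occurred in the history followed by an outcome, ties predict 0, and the function name is hypothetical.

```python
def ppm_predict(outcomes, m):
    """Return (predicted bit, order of the Markov predictor that was used).

    `outcomes` is a list of 0/1 values, oldest first.
    """
    for order in range(m, -1, -1):
        if order > len(outcomes):
            continue
        pattern = outcomes[len(outcomes) - order:]
        counts = [0, 0]
        for i in range(order, len(outcomes)):
            if outcomes[i - order:i] == pattern:
                counts[outcomes[i]] += 1
        if counts[0] + counts[1] > 0:        # pattern found: use this order
            return (1 if counts[1] > counts[0] else 0), order
    return 0, None                           # empty history: nothing to go on


print(ppm_predict([1, 0, 1, 1, 0, 0, 1, 1, 0], m=3))
print(ppm_predict([1, 1, 1, 0], m=2))
```

In the second call neither the last two bits nor the last bit have been seen before with a successor, so the prediction falls back to the order 0 predictor, which simply picks the most frequent outcome.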
47.
Neural methods (D. Jimenez 2002)
Machine learning has often used neural methods
Most neural networks can't be candidates for
hardware prediction at the microarchitecture level
Their implementation would require much more
than several cycles
The standard method of training, the
backpropagation algorithm, is infeasible in a few
machine cycles
48.
Perceptron
Introduced by Rosenblatt in 1962 as a model of brain functioning, popularized by M. Minsky
We will consider the simplest: the single-layer perceptron
A vector of n inputs: x[1]..x[n]
Each input has a weight associated with it, and w[0] is a bias weight: w[0]..w[n]. This vector characterizes the perceptron
The output is y = w[0] + Σ w[i]*x[i]
49.
Bipolar perceptron
The inputs and the outcome t can only be -1 or 1
Then t*x[i] = 1 if they agree, or -1 if they disagree
If the w[i] are integers, y is an integer too and sign(y) is the prediction
50.
Perceptron training
Simply stated: increase the weights of those inputs that agree with the outcome, and decrease the weights of those that do not
Let t be the outcome and θ (theta) be a threshold beyond which we stop training the perceptron. Then the algorithm is:
    /* assumption: x[0] is fixed at 1, so that w[0] acts as the bias weight */
    if ((sign(y) != t) || (abs(y) < theta)) {
        for (i = 0; i <= n; i++) {
            w[i] = w[i] + t * x[i];
        }
    }
51.
Perceptron limitations
A single perceptron can only learn linearly separable functions of the inputs. The linear equation w[0]+Σ w[i]*x[i]=0 represents a hyperplane in the n-dimensional space of the inputs
AND, OR, NAND, NOR are linearly separable, XOR is not
Of course any boolean function can be learned by a 2-layer network of perceptrons (as any boolean function can be represented by a 2-layer net of ANDs and ORs), but it has been shown that for branch prediction there is not much gain and the delay gets much worse
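The separability claim can be illustrated with a brute-force search over small integer weights for two bipolar inputs. The search range and the function names are arbitrary choices for the example; the empty result for XOR within this range is an illustration, not a proof.

```python
from itertools import product

INPUTS = list(product((-1, 1), repeat=2))    # the 4 bipolar input pairs


def separable(target, weight_range=range(-3, 4)):
    """True if some (w0, w1, w2) in the range gives sign(y) == target everywhere."""
    for w0, w1, w2 in product(weight_range, repeat=3):
        if all(target(x1, x2) * (w0 + w1 * x1 + w2 * x2) > 0 for x1, x2 in INPUTS):
            return True
    return False


def bipolar_and(x1, x2):
    return 1 if (x1 == 1 and x2 == 1) else -1


def bipolar_or(x1, x2):
    return 1 if (x1 == 1 or x2 == 1) else -1


def bipolar_xor(x1, x2):
    return 1 if x1 != x2 else -1


for name, fn in (("AND", bipolar_and), ("OR", bipolar_or), ("XOR", bipolar_xor)):
    print(name, separable(fn))
```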
52.
Branch prediction with perceptrons
The inputs of the perceptron are the branch history
We keep a table of perceptrons (the weights) that we address by hashing the branch address
Every time we meet a branch we load the perceptron into a vector register and compute in parallel the dot product between the weights and the branch history (adding one's complements instead of two's complements)
According to the result we predict the branch taken or not taken
When the outcome is known, the training algorithm is run and the updated perceptron is written back
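The whole scheme fits in a small software model. The sketch below is not the hardware design: the class name, the table size, the history length, the threshold value and the modulo hash are all assumptions, and the dot product is computed sequentially rather than in parallel.

```python
class PerceptronPredictor:
    """Table of perceptrons addressed by the branch address (software sketch)."""

    def __init__(self, history_length=8, table_size=64, theta=16):
        self.theta = theta
        # One weight vector per entry: w[0] is the bias, w[1..n] match the history.
        self.table = [[0] * (history_length + 1) for _ in range(table_size)]
        self.history = [-1] * history_length   # bipolar: 1 taken, -1 not taken

    def _weights(self, branch_address):
        return self.table[branch_address % len(self.table)]   # trivial hash

    def output(self, branch_address):
        w = self._weights(branch_address)
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, branch_address):
        return self.output(branch_address) >= 0

    def update(self, branch_address, taken):
        t = 1 if taken else -1
        y = self.output(branch_address)
        w = self._weights(branch_address)
        # Train on a misprediction or while the output is below the threshold.
        if (y >= 0) != taken or abs(y) < self.theta:
            w[0] += t
            for i, xi in enumerate(self.history, start=1):
                w[i] += t * xi
        self.history = self.history[1:] + [t]


def run(predictor, branch_address, outcomes):
    """Return a list of booleans: was each prediction correct?"""
    results = []
    for taken in outcomes:
        results.append(predictor.predict(branch_address) == taken)
        predictor.update(branch_address, taken)
    return results


alternating = [i % 2 == 0 for i in range(200)]
results = run(PerceptronPredictor(), 0x400, alternating)
print(sum(results), "correct out of", len(results))
```

An alternating branch is the opposite of its own last outcome, which is a linearly separable function of the history, so the perceptron learns it after a few updates.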
53.
The dataflow limit
It's the serialization constraint imposed by data dependencies among instructions
It was always thought to be an insurmountable limit
An instruction that needs data from another instruction has to be executed after it
ADD R1,R2,R3 ; R1 <- R2+R3
ADD R4,R1,R5 ; R4 <- R1+R5
54.
Exceeding the dataflow limit
At the end of the '90s some authors proposed the use of data prediction to overcome the dataflow limit
M. Lipasti, Shen: Exceeding the dataflow limit
This is much more difficult than branch prediction, where you need to predict only a binary value
56.
Value prediction/1
It was shown, for instance, that 20-30 % of the instructions that write a value to a register write the same value as the last time
And 40-50 % write one of the 4 preceding values
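A "same value as the last time" predictor can be modelled with a table keyed by instruction address. The trace format, the function name and the toy trace below are made up for the illustration and are not measurements.

```python
def last_value_hit_rate(trace):
    """trace: iterable of (instruction_address, value_written) pairs.

    Predict that each instruction writes the value it wrote the last time
    and return the fraction of correct predictions.
    """
    last_value = {}
    hits = predictions = 0
    for address, value in trace:
        if address in last_value:            # no prediction on the first visit
            predictions += 1
            if last_value[address] == value:
                hits += 1
        last_value[address] = value
    return hits / predictions if predictions else 0.0


# Toy trace: one instruction reloads a constant, another writes a loop counter.
trace = []
for i in range(5):
    trace.append((0x10, 7))   # always writes 7
    trace.append((0x14, i))   # writes 0, 1, 2, ...
print(last_value_hit_rate(trace))
```

The constant is always predicted correctly and the counter never is; a predictor that also tracks strides would be needed to catch the counter.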
57.
Value prediction/2
What makes these values so predictable?
It seems to be due to the severe penalties that real-world programs incur, not only because they are designed to manage quite infrequent contingencies such as exceptions and error conditions, but also because they are general by design. This shows up even in code aggressively optimized by modern state-of-the-art compilers
60.
Research areas
Reverse engineering of the implementations of prediction algorithms
Simulation of new prediction algorithms:
Using legacy Instruction Sets (IS)
Using abstract RISC instruction sets
Hand code optimization and compiler optimization techniques
61.
Reverse engineering
A Python or Perl script:
produces assembly language kernels (for example with a fixed distance between branch instructions)
compiles and runs the kernels, using the hardware counters for mispredictions to detect table sizes, conflicts and so on
62.
Legacy IS/OS simulations
Can be obtained by instrumenting an open-source x86 simulator like Bochs, which can run Windows or Linux
You can then run statically precompiled binaries on it
Problem: Bochs is not even a complete Pentium II simulator!
63.
Abstract IS simulators
SimpleScalar is an open-source framework for a generic software simulator, on top of which modules for different prediction algorithms can be implemented
It also offers the possibility to customize the Instruction Set (IS)
Problem: to use this tool you need the source code and have to compile all the special libraries
65.
Scheduling
Code scheduling, or reordering of instructions, is used to improve performance or guarantee correctness
Important for dynamically scheduled architectures, essential for statically scheduled architectures
Examples: branch delay slots, memory delays, multicycle operations
Block scheduling, List scheduling, Superblock scheduling, Trace scheduling
66.
BTA era is here
(Billion Transistor Architecture)
The Intel Itanium 2 with 6 MB of L3 cache has 0.41 billion transistors, of which around 0.3 billion are for the cache memory
It's not clear what will be the best use of the available silicon:
CMP (Single-Chip MultiProcessors)
Superwide superspeculative superscalar
Simultaneous MultiThreading
Raw Processors
67.
[Chart: FO4 gates, pipeline length, feature size (feat.size), transistor count (trans.#) and frequency (freq) for the 386, 486dx, P 60, p pro, P II, P III, P 4 and P 4 571, on an axis from 0 to 130000]