A complete explanation of parallelism, covering instruction-level and machine-level parallelism,
along with an explanation of its types.
It includes Amdahl's law and its graphical representation.
Reading it should give you a working understanding of parallelism.
If you have any problems, contact me at inam.qais@gmail.com.
PARALLELISM
Executing two or more operations or streams of instructions at the same time is known as parallelism.
The amount of parallelism available within a basic block (a straight-line code sequence with no branches in except to the entry and no branches out except at the exit) is quite small. For typical MIPS programs, the average dynamic branch frequency is often between 15% and 25%, meaning that between three and six instructions execute between a pair of branches. Since these instructions are likely to depend upon one another, the amount of overlap we can exploit within a basic block is likely to be less than the average basic block size. To obtain substantial performance enhancements, we must exploit ILP across multiple basic blocks. The simplest and most common way to increase the ILP is to exploit parallelism among iterations of a loop. This type of parallelism is often called loop-level parallelism. Here is a simple example of a loop that adds two 1000-element arrays and is completely parallel.
for (i=0; i<=999; i=i+1) x[i] = x[i] + y[i];
Every iteration of the loop can overlap with any other iteration, although within each loop iteration there is little or no opportunity for overlap.
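One way to see why the iterations can overlap is to run them in a different order and check that nothing changes. The sketch below is only an illustration: the function names add_range and add_in_two_chunks are hypothetical, and the two halves run one after the other here rather than on separate hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Adds y[i] into x[i] for every index in [begin, end).
   Each index touches only its own elements, so no iteration
   reads a value written by another iteration. */
void add_range(double *x, const double *y, size_t begin, size_t end)
{
    assert(begin <= end);
    for (size_t i = begin; i < end; i = i + 1)
        x[i] = x[i] + y[i];
}

/* Hypothetical helper: performs the same addition as the loop above,
   but as two independent halves processed in reverse order.
   Because the iterations are independent, the result is unchanged. */
void add_in_two_chunks(double *x, const double *y, size_t n)
{
    size_t mid = n / 2;
    add_range(x, y, mid, n);   /* second half first */
    add_range(x, y, 0, mid);   /* then the first half */
}
```

Calling add_in_two_chunks(x, y, 1000) should leave x with the same contents as the original loop. That independence is what would allow the two halves to be given to different execution units.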
GOALS
The purpose of parallel processing is to speed up the computer's processing capability; in other words, it increases the computational speed.
The system may have two or more processors operating concurrently.
It improves the performance of the computer for a given clock speed.
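The speed-up from adding processors is commonly estimated with Amdahl's law, which the description of this deck says it covers. The sketch below uses the standard textbook form of the law, not a formula taken from these slides; the function name amdahl_speedup and its parameters are assumptions made for illustration.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the work
   (0 <= p <= 1) is spread over n processors and the remaining
   (1 - p) stays serial. */
double amdahl_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

For example, with p = 0.9 the speedup stays below 10 however many processors are added, because the serial 10% of the work is never shortened.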
TYPES OF PARALLELISM
1) Instruction Level Parallelism (ILP)
Pipelining
Superscalar
2) Process Level Parallelism (PLP)
Array Computer
Multiprocessor
1) INSTRUCTION PIPELINING
An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.
The computer needs to process each instruction with the following sequence of steps.
Pipelining Steps
Fetch the instruction from memory
Decode the instruction
Calculate the effective address
Fetch the operands from memory
Execute the instruction
Store the result in the proper place
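The benefit of overlapping these steps can be estimated with a simple cycle count. The sketch below assumes an idealized pipeline in which each step is one segment that takes one clock cycle and no conflicts occur; the function names are hypothetical.

```c
#include <assert.h>

/* Clock cycles for n instructions on an ideal k-segment pipeline:
   k cycles for the first instruction to pass through every segment,
   then one more instruction completes on each following cycle. */
long pipelined_cycles(long k, long n)
{
    assert(k >= 1 && n >= 1);
    return k + (n - 1);
}

/* Clock cycles when each instruction finishes all k steps
   before the next instruction starts. */
long sequential_cycles(long k, long n)
{
    assert(k >= 1 && n >= 1);
    return k * n;
}

/* Ratio of the two counts; approaches k as n grows. */
double pipeline_speedup(long k, long n)
{
    return (double)sequential_cycles(k, n) / (double)pipelined_cycles(k, n);
}
```

With the six steps above treated as six segments, 100 instructions take 105 cycles instead of 600 under these assumptions. A real pipeline falls short of that because of conflicts between instructions.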
Resource conflicts are caused by access to memory by two segments at the same time. These may be resolved by using separate instruction and data memories.
Data dependency conflicts arise when an instruction depends on the result of a previous instruction, but this result is not yet available.
2) SUPERSCALAR EXECUTION
In superscalar execution, multiple execution units are used to
execute multiple instructions in parallel. In typical superscalar processors, the instructions
executing simultaneously are adjacent in the original program order.
A superscalar CPU architecture implements a form of parallelism called instruction-level
parallelism within a single processor.
It therefore allows faster CPU throughput than would otherwise be possible at a given clock
rate.
A superscalar processor executes more than one instruction during a clock cycle by
simultaneously dispatching multiple instructions to redundant functional units on the processor.
The term superscalar, first coined in 1987, refers to a machine that is designed to improve the
performance of the execution of scalar instructions.
Superscalar Implementation
A superscalar implementation of a processor architecture is one in which common
instructions (integer and floating-point arithmetic, loads, stores, and conditional branches) can
be initiated simultaneously and executed independently.
Why do we use superscalar?
CPU hardware dynamically checks for data dependencies between instructions at run time
(versus software checking at compile time).
The CPU accepts multiple instructions per clock cycle.
The branch instruction processing
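The run-time dependency check mentioned above can be sketched with a toy in-order issue model in Python. This is an illustration under simplifying assumptions, not a description of any real processor: every instruction is assumed to take one cycle, up to `width` adjacent instructions issue together, and a group ends as soon as an instruction reads or rewrites a register written earlier in the same group. The function name `count_issue_cycles` and the `(destination, sources)` encoding are made up for this example.

```python
def count_issue_cycles(instructions, width):
    """Count cycles needed to issue `instructions` in order, at most `width` per cycle.

    Each instruction is a (destination, sources) pair of register names;
    destination may be None for instructions that write no register.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    cycles = 0
    next_index = 0
    while next_index < len(instructions):
        written_this_cycle = set()
        issued = 0
        while next_index < len(instructions) and issued < width:
            dest, sources = instructions[next_index]
            reads_new_value = any(s in written_this_cycle for s in sources)
            rewrites_register = dest in written_this_cycle
            if reads_new_value or rewrites_register:
                break  # depends on this group, so it waits for the next cycle
            if dest is not None:
                written_this_cycle.add(dest)
            next_index += 1
            issued += 1
        cycles += 1
    return cycles

independent = [("r1", ("r0",)), ("r2", ("r0",)), ("r3", ("r0",)), ("r4", ("r0",))]
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]

print(count_issue_cycles(independent, width=2))  # prints 2
print(count_issue_cycles(chain, width=2))        # prints 4
```

The two runs show why extra execution units only help when adjacent instructions are independent: the dependent chain gains nothing from the second issue slot.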
Superscalar organization
Data Dependences
There are three different types of dependences: data dependences (also called true data dependences),
name dependences, and control dependences. An instruction j is data dependent on instruction i if
either of the following holds:
■ Instruction i produces a result that may be used by instruction j.
■ Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.
For example, consider the following MIPS code sequence that increments a vector of values in memory
(starting at 0(R1) and with the last element at 8(R2)) by a scalar in register F2. (For simplicity, the
examples here ignore the effects of delayed branches.)
Loop: L.D    F0,0(R1)    ;F0=array element
      ADD.D  F4,F0,F2    ;add scalar in F2
      S.D    F4,0(R1)    ;store result
      DADDIU R1,R1,#-8   ;decrement pointer 8 bytes
      BNE    R1,R2,Loop  ;branch R1!=R2
The data dependences in this code sequence involve both floating-point data (L.D → ADD.D → S.D,
through F0 and F4) and integer data (DADDIU → BNE, through R1).
In both of the above dependent sequences, as shown by the arrows, each instruction depends on the
previous one. The arrows here and in following examples show the order that must be preserved for
correct execution. The arrow points from an instruction that must precede the instruction that the
arrowhead points to. If two instructions are data dependent, they must execute in order and cannot
execute simultaneously or be completely overlapped. The dependence implies that there would be a
chain of one or more data hazards between the two instructions. (See Appendix C for a brief description
of data hazards, which we will define precisely in a few pages.) Executing the instructions simultaneously
will cause a processor with pipeline interlocks (and a pipeline depth longer than the distance between
the instructions in cycles) to detect a hazard and stall, thereby reducing or eliminating the overlap. In a
processor without interlocks that relies on compiler scheduling, the compiler cannot schedule
dependent instructions in such a way that they completely overlap, since the program will not execute
correctly. The presence of a data dependence in an instruction sequence reflects a data dependence in
the source code from which the instruction sequence was generated. The effect of the original data
dependence must be preserved.
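The two-part definition of data dependence (a direct producer-consumer link, plus the transitive rule) can be checked mechanically on the loop body. The Python sketch below is only an illustration: the `LOOP` table is a hand-written encoding of the five instructions, the function names are invented for this example, and only a single iteration is analyzed, so dependences carried from one iteration to the next are ignored.

```python
# Each entry: (text, destination register or None, source registers).
LOOP = [
    ("L.D F0,0(R1)",     "F0", ("R1",)),
    ("ADD.D F4,F0,F2",   "F4", ("F0", "F2")),
    ("S.D F4,0(R1)",     None, ("F4", "R1")),
    ("DADDIU R1,R1,#-8", "R1", ("R1",)),
    ("BNE R1,R2,Loop",   None, ("R1", "R2")),
]

def direct_dependences(instructions):
    """Return pairs (i, j) where j reads a register whose latest writer is i."""
    last_writer = {}
    pairs = set()
    for j, (_, dest, sources) in enumerate(instructions):
        for reg in sources:
            if reg in last_writer:
                pairs.add((last_writer[reg], j))
        if dest is not None:
            last_writer[dest] = j  # recorded after the reads of this instruction
    return pairs

def all_dependences(instructions):
    """Add the transitive rule: if j depends on k and k depends on i, j depends on i."""
    pairs = direct_dependences(instructions)
    changed = True
    while changed:
        changed = False
        for (i, k) in list(pairs):
            for (k2, j) in list(pairs):
                if k == k2 and (i, j) not in pairs:
                    pairs.add((i, j))
                    changed = True
    return pairs

for i, j in sorted(all_dependences(LOOP)):
    print(LOOP[i][0], "->", LOOP[j][0])
```

The output lists the floating-point chain from L.D through ADD.D to S.D (including the transitive L.D to S.D link) and the integer link from DADDIU to BNE.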
Loop: L.D    F0,0(R1)    ;F0=array element
      ADD.D  F4,F0,F2    ;add scalar in F2
      S.D    F4,0(R1)    ;store result
      DADDIU R1,R1,#-8   ;decrement pointer
                         ;8 bytes (per DW)
      BNE    R1,R2,Loop  ;branch R1!=R2
Name Dependences
The second type of dependence is a name dependence. A name dependence occurs when two
instructions use the same register or memory location, called a name, but there is no flow of data
between the instructions associated with that name. There are two types of name dependences
between an instruction i that precedes instruction j in program order:
1. An antidependence between instruction i and instruction j occurs when instruction j writes a register
or memory location that instruction i reads. The original ordering must be preserved to ensure that i
reads the correct value. In the loop example above, there is an antidependence between S.D and DADDIU
on register R1.
2. An output dependence occurs when instruction i and instruction j write the same register or memory
location. The ordering between the instructions must be preserved to ensure that the value finally
written corresponds to instruction j.
Hazards
A hazard exists whenever there is a name or data dependence between instructions, and they are close
enough that overlapping their execution would change the order of access to the operand involved in
the dependence. Because of the dependence, we must preserve what is called program order, that is, the
order that the instructions would execute in if executed sequentially one at a time as determined by the
original source program. The goal of both our software and hardware techniques is to exploit parallelism
by preserving program order only where it affects the outcome of the program. Detecting and avoiding
hazards ensures that necessary program order is preserved. Data hazards, which are informally described
in Appendix C, may be classified as one of three types, depending on the order of read and write accesses
in the instructions. By convention, the hazards are named by the ordering in the program that must be
preserved by the pipeline. Consider two instructions i and j, with i preceding j in program order. The
possible data hazards are:
■ RAW (read after write): j tries to read a source before i writes it, so j incorrectly gets the old value.
This hazard is the most common type and corresponds to a true data dependence. Program order must
be preserved to ensure that j receives the value from i.
■ WAW (write after write): j tries to write an operand before it is written by i. The writes end up being
performed in the wrong order, leaving the value written by i rather than the value written by j in the
destination. This hazard corresponds to an output dependence. WAW hazards are present only in
pipelines that write in more than one pipe stage or allow an instruction to proceed even when a
previous instruction is stalled.
■ WAR (write after read): j tries to write a destination before it is read by i, so i incorrectly gets the
new value. This hazard arises from an antidependence (or name dependence). WAR hazards cannot
occur in most static issue pipelines, even deeper pipelines or floating-point pipelines, because all
reads are early and all writes are late.
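The three hazard types follow directly from which registers each instruction reads and writes, so they can be classified with a few comparisons. The Python sketch below is a minimal illustration, assuming each instruction is encoded as a hypothetical `(destination, sources)` pair; it reports potential hazards for one pair of instructions and ignores any writes by instructions in between.

```python
def classify_hazards(first, second):
    """Return the set of potential hazards between instruction i (`first`)
    and a later instruction j (`second`).

    Each instruction is a (destination, sources) pair; destination may be None.
    """
    dest_i, sources_i = first
    dest_j, sources_j = second
    hazards = set()
    if dest_i is not None and dest_i in sources_j:
        hazards.add("RAW")  # j reads what i writes: true data dependence
    if dest_j is not None and dest_j in sources_i:
        hazards.add("WAR")  # j writes what i reads: antidependence
    if dest_i is not None and dest_i == dest_j:
        hazards.add("WAW")  # both write the same name: output dependence
    return hazards

load = ("F0", ("R1",))          # L.D    F0,0(R1)
add = ("F4", ("F0", "F2"))      # ADD.D  F4,F0,F2
store = (None, ("F4", "R1"))    # S.D    F4,0(R1)
decrement = ("R1", ("R1",))     # DADDIU R1,R1,#-8

print(classify_hazards(load, add))          # prints {'RAW'}
print(classify_hazards(store, decrement))   # prints {'WAR'}
print(classify_hazards(load, ("F0", ())))   # prints {'WAW'}
```

The last call uses a made-up second instruction that also writes F0, since the loop itself contains no output dependence.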
Control Dependences
The last type of dependence is a control dependence. A control dependence determines the ordering of
an instruction, i, with respect to a branch instruction so that instruction i is executed in correct program
order and only when it should be. Every instruction, except for those in the first basic block of the
program, is control dependent on some set of branches, and, in general, these control dependences
must be preserved to preserve program order. One of the simplest examples of a control dependence is
the dependence of the statements in the “then” part of an if statement on the branch. For example, in
the code segment
if p1 {
    S1;
};
if p2 {
    S2;
}
S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1. In general, two
constraints are imposed by control dependences:
1. An instruction that is control dependent on a branch cannot be moved before the branch so that its
execution is no longer controlled by the branch. For example, we cannot take an instruction from the
then portion of an if statement and move it before the if statement.
2. An instruction that is not control dependent on a branch cannot be moved after the branch so that
its execution is controlled by the branch. For example, we cannot take a statement before the if
statement and move it into the then portion.
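Both constraints can be seen by recording which statements run in an original program and in two illegally reordered versions. The Python sketch below is a hypothetical illustration: `S0` stands for a statement placed before the if, `S1` for the statement in the then portion, and the three function names are invented here.

```python
def run_original(p1):
    executed = []
    executed.append("S0")      # S0 comes before the if: not control dependent
    if p1:
        executed.append("S1")  # S1 is control dependent on p1
    return executed

def run_s1_hoisted(p1):
    executed = []
    executed.append("S0")
    executed.append("S1")      # violates constraint 1: S1 now always runs
    if p1:
        pass
    return executed

def run_s0_sunk(p1):
    executed = []
    if p1:
        executed.append("S0")  # violates constraint 2: S0 now depends on p1
        executed.append("S1")
    return executed

for p1 in (True, False):
    print(p1, run_original(p1), run_s1_hoisted(p1), run_s0_sunk(p1))
```

When p1 is true all three versions execute the same statements, which is why such reorderings can look harmless. When p1 is false the hoisted version runs S1 although it should not, and the sunk version skips S0 although it should run.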
Processor Level Parallelism (Machine Parallelism)
In a multiprocessing system, all CPUs may be equal, or some may be reserved for special
purposes.
In multiprocessing, the processors can be used to execute a single sequence of instructions in
multiple contexts.
Multiprocessing is the use of two or more central processing units (CPUs) within a single
computer system.
The term also refers to the ability of a system to support more than one processor and/or the
ability to allocate tasks between them.
Multiprocessing sometimes refers to the execution of multiple concurrent software processes in
a system as opposed to a single process at any one instant.
The terms multitasking or multiprogramming are more appropriate to describe this concept,
which is implemented mostly in software, whereas multiprocessing is more appropriate to
describe the use of multiple hardware CPUs.