parallelism

PARALLELISM
Executing two or more operations or streams of instructions at the same time is known as
parallelism.
The amount of parallelism available within a basic block (a straight-line code sequence with no
branches in except at the entry and no branches out except at the exit) is quite small. For typical
MIPS programs, the average dynamic branch frequency is often between 15% and 25%, meaning
that between three and six instructions execute between a pair of branches. Since these instructions
are likely to depend upon one another, the amount of overlap we can exploit within a basic block is
likely to be less than the average basic block size. To obtain substantial performance enhancements,
we must exploit instruction-level parallelism (ILP) across multiple basic blocks. The simplest and most
common way to increase ILP is to exploit parallelism among iterations of a loop. This type of
parallelism is often called loop-level parallelism. Here is a simple example of a loop that adds two
1000-element arrays and is completely parallel.
for (i=0; i<=999; i=i+1) x[i] = x[i] + y[i];
Every iteration of the loop can overlap with any other iteration, although within each loop iteration
there is little or no opportunity for overlap.
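One way to see that the iterations are independent is to run them in a different order and check that the result does not change. The sketch below is only an illustration: the function names are hypothetical, and it assumes that x and y do not overlap in memory.

```c
#include <stddef.h>

/* Adds y into x, visiting the elements from first to last. */
void add_arrays_forward(double *x, const double *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        x[i] = x[i] + y[i];   /* iteration i touches only x[i] and y[i] */
    }
}

/* Same work, visiting the iterations in the opposite order.
   Because no iteration reads a value written by another iteration,
   the final contents of x are identical. */
void add_arrays_reverse(double *x, const double *y, size_t n) {
    for (size_t i = n; i > 0; i--) {
        x[i - 1] = x[i - 1] + y[i - 1];
    }
}
```

Since the order of the iterations does not matter, hardware or a compiler is free to overlap them.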
GOALS
• The purpose of parallel processing is to speed up the computer's processing capability; in
other words, it increases the computational speed.
• The system may have two or more processors operating concurrently.
• It improves the performance of the computer for a given clock speed.
TYPES OF PARALLELISM
1) Instruction Level Parallelism (ILP)
• Pipelining
• Superscalar
2) Processor Level Parallelism (PLP)
• Array Computer
• Multiprocessor
1) INSTRUCTION PIPELINING

• An instruction pipeline reads consecutive instructions from memory while previous
instructions are being executed in other segments.
• The computer needs to process each instruction with the following sequence of steps.
Pipelining Steps
• Fetch the instruction from memory
• Decode the instruction
• Calculate the effective address
• Fetch the operands from memory
• Execute the instruction
• Store the result in the proper place

[Figure: Flow diagram of the instruction pipeline]
Pipelining Conflicts

• Resource conflicts are caused by access to memory by two segments at the same time. These may
be resolved by using separate instruction and data memories.
• Data dependency conflicts arise when an instruction depends on the result of a previous
instruction, but this result is not yet available.
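As a rough illustration of why overlapping the steps helps, the sketch below counts clock cycles under simplifying assumptions that the slides do not state: every step takes exactly one cycle, a new instruction can enter the pipeline each cycle, and each conflict simply adds stall cycles. The function names are hypothetical.

```c
/* Cycles needed when each instruction finishes all of its steps
   before the next instruction starts (no overlap). */
unsigned long unpipelined_cycles(unsigned long stages, unsigned long n) {
    return stages * n;
}

/* Cycles needed on an idealized pipeline: the first instruction takes
   `stages` cycles, every later instruction completes one cycle after its
   predecessor, and each stall cycle delays completion by one cycle. */
unsigned long pipelined_cycles(unsigned long stages, unsigned long n,
                               unsigned long stalls) {
    if (n == 0) {
        return 0;
    }
    return stages + (n - 1) + stalls;
}
```

With the six steps listed above, 100 instructions would need 600 cycles without overlap and 105 cycles on this idealized pipeline; every stall caused by a resource or data dependency conflict adds to the second figure.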
2) SUPERSCALAR

• In superscalar execution, multiple execution units are used to execute multiple instructions
in parallel. In typical superscalar processors, the instructions executing simultaneously are
adjacent in the original program order.
• A superscalar CPU architecture implements a form of parallelism called instruction-level
parallelism within a single processor.
• It therefore allows faster CPU throughput than would otherwise be possible at a given clock
rate.
• A superscalar processor executes more than one instruction during a clock cycle by
simultaneously dispatching multiple instructions to redundant functional units on the processor.
• The term superscalar, first coined in 1987, refers to a machine that is designed to improve the
performance of the execution of scalar instructions.
Superscalar Implementation
• A superscalar implementation of a processor architecture is one in which common
instructions (integer and floating-point arithmetic, loads, stores, and conditional branches) can
be initiated simultaneously and executed independently.
Why use superscalar?
• CPU hardware dynamically checks for data dependencies between instructions at run time
(versus software checking at compile time).
• The CPU accepts multiple instructions per clock cycle.
• The branch instruction processing

[Figure: Superscalar organization]
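The run-time dependency check can be pictured with a very small model. The sketch below is an assumption-laden illustration, not a description of any real processor: it models an in-order machine that issues at most two adjacent instructions per cycle, tracks registers only, and ignores instruction latencies and branches. The Instr record and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction record: register numbers, -1 means "unused". */
typedef struct {
    int dest;
    int src1;
    int src2;
} Instr;

/* True if b may issue in the same cycle as the earlier instruction a:
   b must not read a's result and must not write the same register. */
bool can_issue_together(Instr a, Instr b) {
    if (a.dest < 0) {
        return true;                 /* a writes no register */
    }
    if (b.src1 == a.dest || b.src2 == a.dest) {
        return false;                /* b needs a's result */
    }
    if (b.dest == a.dest) {
        return false;                /* both write the same register */
    }
    return true;
}

/* Number of issue cycles when at most two adjacent instructions
   can be issued per cycle, in program order. */
size_t issue_cycles(const Instr *prog, size_t n) {
    size_t cycles = 0;
    size_t i = 0;
    while (i < n) {
        if (i + 1 < n && can_issue_together(prog[i], prog[i + 1])) {
            i += 2;                  /* issue the pair together */
        } else {
            i += 1;                  /* issue one instruction alone */
        }
        cycles++;
    }
    return cycles;
}
```

In this model, four mutually independent instructions issue in two cycles, while a chain in which each instruction needs the previous result still takes four.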
Data Dependences
There are three different types of dependences: data dependences (also called true data dependences),
name dependences, and control dependences. An instruction j is data dependent on instruction i if
either of the following holds:
■ Instruction i produces a result that may be used by instruction j.
■ Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.

For example, consider the following MIPS code sequence that increments a vector of values in memory
(starting at 0(R1) and with the last element at 8(R2)) by a scalar in register F2. (For simplicity,
throughout this chapter, our examples ignore the effects of delayed branches.)
Loop: L.D    F0,0(R1)    ;F0=array element
      ADD.D  F4,F0,F2    ;add scalar in F2
      S.D    F4,0(R1)    ;store result
      DADDIU R1,R1,#-8   ;decrement pointer 8 bytes
      BNE    R1,R2,Loop  ;branch R1!=R2
The data dependences in this code sequence involve both floating-point data:
      L.D F0,0(R1)  ->  ADD.D F4,F0,F2  ->  S.D F4,0(R1)
and integer data:
      DADDIU R1,R1,#-8  ->  BNE R1,R2,Loop
In both of the above dependent sequences, as shown by the arrows, each instruction depends on the
previous one. The arrows here and in following examples show the order that must be preserved for
correct execution. The arrow points from an instruction that must precede the instruction that the
arrowhead points to. If two instructions are data dependent, they must execute in order and cannot
execute simultaneously or be completely overlapped. The dependence implies that there would be a
chain of one or more data hazards between the two instructions. (See Appendix C for a brief description
of data hazards, which we will define precisely in a few pages.) Executing the instructions simultaneously
will cause a processor with pipeline interlocks (and a pipeline depth longer than the distance between
the instructions in cycles) to detect a hazard and stall, thereby reducing or eliminating the overlap. In a
processor without interlocks that relies on compiler scheduling, the compiler cannot schedule
dependent instructions in such a way that they completely overlap, since the program will not execute
correctly. The presence of a data dependence in an instruction sequence reflects a data dependence in
the source code from which the instruction sequence was generated. The effect of the original data
dependence must be preserved.
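To make the link to source code concrete, here is one possible C loop from which a sequence like the MIPS loop above could have been generated. This is a sketch under assumptions: the function name and the local variable names (chosen to mirror the registers) are hypothetical, and unlike the MIPS loop, which always executes at least once, this version also handles an empty array.

```c
#include <stddef.h>

/* Adds the scalar f2 to every element of x, walking from the last
   element down to the first, as the MIPS loop does. */
void add_scalar(double *x, size_t n, double f2) {
    for (size_t i = n; i > 0; i--) {   /* DADDIU and BNE: update and test the counter */
        double f0 = x[i - 1];          /* L.D   F0,0(R1)                       */
        double f4 = f0 + f2;           /* ADD.D F4,F0,F2: needs f0 from the load */
        x[i - 1] = f4;                 /* S.D   F4,0(R1): needs f4 from the add  */
    }
}
```

The chain load, add, store inside the loop body is the floating-point data dependence, and the counter update feeding the loop test is the integer one.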

Name Dependences
The second type of dependence is a name dependence. A name dependence occurs when two
instructions use the same register or memory location, called a name, but there is no flow of data
between the instructions associated with that name. There are two types of name dependences
between an instruction i that precedes instruction j in program order:
1. An antidependence between instruction i and instruction j occurs when instruction j writes a register
or memory location that instruction i reads. The original ordering must be preserved to ensure that i
reads the correct value. In the MIPS loop example above, there is an antidependence between S.D and
DADDIU on register R1.
2. An output dependence occurs when instruction i and instruction j write the same register or memory
location. The ordering between the instructions must be preserved to ensure that the value finally
written corresponds to instruction j.
Hazards
A hazard exists whenever there is a name or data dependence between instructions and they are close
enough that the overlap during execution would change the order of access to the operand involved.
Because of the dependence, we must preserve what is called program order, that is, the order in which
the instructions would execute if executed sequentially one at a time as determined by the original
source program. The goal of both our software and hardware techniques is to exploit parallelism by
preserving program order only where it affects the outcome of the program. Detecting and avoiding
hazards ensures that necessary program order is preserved. Data hazards, which are informally
described in Appendix C, may be classified as one of three types, depending on the order of read and
write accesses in the instructions. By convention, the hazards are named by the ordering in the program
that must be preserved by the pipeline. Consider two instructions i and j, with i preceding j in program
order. The possible data hazards are
■ RAW (read after write): j tries to read a source before i writes it, so j incorrectly gets the old value.
This hazard is the most common type and corresponds to a true data dependence. Program order must
be preserved to ensure that j receives the value from i.
■ WAW (write after write): j tries to write an operand before it is written by i. The writes end up being
performed in the wrong order, leaving the value written by i rather than the value written by j in the
destination. This hazard corresponds to an output dependence. WAW hazards are present only in
pipelines that write in more than one pipe stage or allow an instruction to proceed even when a
previous instruction is stalled.
■ WAR (write after read): j tries to write a destination before it is read by i, so i incorrectly gets the
new value. This hazard arises from an antidependence (a type of name dependence). WAR hazards
cannot occur in most static issue pipelines, even deeper pipelines or floating-point pipelines, because
all reads are early and all writes are late.
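The three cases can be told apart mechanically by comparing the registers that i and j read and write. The sketch below is a simplified illustration: it looks at register operands only (memory locations are ignored), and the Op record, the NO_REG marker, the register numbering, and the function name are all hypothetical.

```c
enum { NO_REG = -1 };

/* Bit flags so that more than one hazard type can be reported at once. */
enum { HAZ_NONE = 0, HAZ_RAW = 1, HAZ_WAW = 2, HAZ_WAR = 4 };

/* Hypothetical instruction record: one destination, up to two sources. */
typedef struct {
    int dest;
    int src1;
    int src2;
} Op;

/* Returns the hazard types that can arise between i and j,
   where i precedes j in program order. */
unsigned classify_hazards(Op i, Op j) {
    unsigned h = HAZ_NONE;
    if (i.dest != NO_REG && (j.src1 == i.dest || j.src2 == i.dest)) {
        h |= HAZ_RAW;   /* j reads what i writes: true data dependence */
    }
    if (i.dest != NO_REG && j.dest == i.dest) {
        h |= HAZ_WAW;   /* both write the same register: output dependence */
    }
    if (j.dest != NO_REG && (i.src1 == j.dest || i.src2 == j.dest)) {
        h |= HAZ_WAR;   /* j writes what i reads: antidependence */
    }
    return h;
}
```

Applied to the MIPS loop, the pair L.D and ADD.D is reported as RAW on F0, and the pair S.D and DADDIU as WAR on R1, matching the antidependence noted under Name Dependences.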

Control Dependences
The last type of dependence is a control dependence. A control dependence determines the ordering of
an instruction, i, with respect to a branch instruction so that instruction i is executed in correct program
order and only when it should be. Every instruction, except for those in the first basic block of the
program, is control dependent on some set of branches, and, in general, these control dependences
must be preserved to preserve program order. One of the simplest examples of a control dependence is
the dependence of the statements in the “then” part of an if statement on the branch. For example, in
the code segment
if p1 {
    S1;
};
if p2 {
    S2;
}
S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1. In general, two
constraints are imposed by control dependences:
1. An instruction that is control dependent on a branch cannot be moved before the branch so that its
execution is no longer controlled by the branch. For example, we cannot take an instruction from the
then portion of an if statement and move it before the if statement.
2. An instruction that is not control dependent on a branch cannot be moved after the branch so that
its execution is controlled by the branch. For example, we cannot take a statement before the if
statement and move it into the then portion.
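The first constraint can be checked with a small experiment. In the sketch below, which uses hypothetical function names and an arbitrary choice of S1 (an increment), moving S1 above the branch changes the result whenever p1 is false.

```c
/* Original code: S1 is control dependent on p1. */
int guarded(int p1, int x) {
    if (p1) {
        x = x + 1;   /* S1 runs only when p1 is true */
    }
    return x;
}

/* Illegal transformation: S1 has been moved before the branch,
   so it executes regardless of p1. */
int hoisted(int p1, int x) {
    x = x + 1;       /* S1 is no longer controlled by p1 */
    (void)p1;        /* the branch no longer guards anything */
    return x;
}
```

The two versions agree when p1 is true and differ when it is false, which is why the control dependence has to be preserved.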

Processor Level Parallelism (Machine Parallelism)
• In a multiprocessing system, all CPUs may be equal, or some may be reserved for special
purposes.
• In multiprocessing, the processors can be used to execute a single sequence of instructions in
multiple contexts.
• Multiprocessing is the use of two or more central processing units (CPUs) within a single
computer system.
• The term also refers to the ability of a system to support more than one processor and/or the
ability to allocate tasks between them.
• Multiprocessing sometimes refers to the execution of multiple concurrent software processes in
a system as opposed to a single process at any one instant.
• The terms multitasking or multiprogramming are more appropriate to describe this concept,
which is implemented mostly in software, whereas multiprocessing is more appropriate to
describe the use of multiple hardware CPUs.

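Amdahl's Law
• Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene
Amdahl, and is used to find the maximum expected improvement to an overall system when
only part of the system is improved.
• Amdahl's law states that, if a fraction P of the running time is sped up by a factor S, the overall
speedup from applying the improvement is:
Old Running Time = 1
New Running Time = (1-P) + P/S
Overall Speedup = Old Running Time / New Running Time = 1 / ((1-P) + P/S)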
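The running-time expressions translate directly into a small helper. This is a minimal sketch with a hypothetical function name; it takes P to be the fraction of the original running time that is improved (between 0 and 1) and S to be the speedup of that fraction (greater than 0).

```c
/* Overall speedup when a fraction p of the running time is sped up
   by a factor s. The old running time is normalized to 1. */
double amdahl_speedup(double p, double s) {
    double new_time = (1.0 - p) + p / s;   /* New Running Time */
    return 1.0 / new_time;                 /* Old / New */
}
```

For example, with P = 0.5 and S = 2 the new running time is 0.75, so the overall speedup is about 1.33. With P = 0.9 the overall speedup stays below 10 no matter how large S becomes, because the unimproved 10% of the running time remains.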
