COMPUTER
ARCHITECTURE
Oversimplified by Arki-Tehcs
Prabhanshu Katiyar- 190050088
Sibasis Nayak - 190050115
Gurnoor Singh - 190050045
Paarth Jain - 190050076
Sahasra Ranjan - 190050102
A for Amdahl’s Law
What they teach: Oversimplified:
In computer architecture, Amdahl's law (or Amdahl's argument) is a formula which gives
the theoretical speedup in latency of the execution of a task at fixed workload that can
be expected of a system whose resources are improved. It is named after computer
scientist Gene Amdahl, and was presented at the AFIPS Spring Joint Computer
Conference in 1967. Amdahl's law is often used in parallel computing to predict the
theoretical speedup when using multiple processors. For example, if a program needs 20
hours to complete using a single thread, but a one-hour portion of the program cannot be
parallelized, so that only the remaining 19 hours (p = 0.95) of execution time can be
parallelized, then regardless of how many threads are devoted to a parallelized execution
of this program, the minimum execution time cannot be less than one hour. Hence, the
theoretical speedup is limited to at most 20 times the single thread performance. Amdahl's
law is often conflated with the law of diminishing returns, whereas only a special case of
applying Amdahl's law demonstrates the law of diminishing returns. If one picks optimally (in
terms of the achieved speedup) what is to be improved, then one will see monotonically
decreasing improvements as one improves. If, however, one picks non-optimally, after
improving a sub-optimal component and moving on to improve a more optimal
component, one can see an increase in the return. Note that it is often rational to improve
a system in an order that is "non-optimal" in this sense, given that some improvements are
more difficult or require larger development time than others. Amdahl's law does represent
the law of diminishing returns when considering what sort of return one gets by adding more
processors to a machine, assuming one is running a fixed-size computation that will use all
available processors to their capacity. Each new processor added to the system will add
less usable power than the previous one. Each time one doubles the number of processors
the speedup ratio will diminish, as the total throughput heads toward the limit of 1/(1 − p).
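To make the formula behind all this explicit, here is a minimal sketch (the usual statement of the law, with parallel fraction p and improvement factor s; the numbers below just replay the 20-hour example):

# Amdahl's law: overall speedup when a fraction p of the work is sped up by a factor s.
def amdahl_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

p = 0.95                       # 19 of the 20 hours can be parallelized
for s in (2, 8, 64, 1_000_000):
    print(s, round(amdahl_speedup(p, s), 2))
# As s grows without bound the speedup approaches 1 / (1 - p) = 20:
# the run can never finish faster than the serial one-hour portion.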
B for Branch Predictors
What they teach: Oversimplified:
In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–
else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the
instruction pipeline. Branch predictors play a critical role in achieving high effective performance in many modern
pipelined microprocessor architectures such as x86. Two-way branching is usually implemented with a conditional jump
instruction. A conditional jump can either be "not taken" and continue execution with the first branch of code which
follows immediately after the conditional jump, or it can be "taken" and jump to a different place in program memory
where the second branch of code is stored. It is not known for certain whether a conditional jump will be taken or not
taken until the condition has been calculated and the conditional jump has passed the execution stage in the
instruction pipeline. Without branch prediction, the processor would have to wait until the conditional jump instruction
has passed the execute stage before the next instruction can enter the fetch stage in the pipeline. The branch predictor
attempts to avoid this waste of time by trying to guess whether the conditional jump is most likely to be taken or not
taken. The branch that is guessed to be the most likely is then fetched and speculatively executed. If it is later detected
that the guess was wrong, then the speculatively executed or partially executed instructions are discarded and the
pipeline starts over with the correct branch, incurring a delay. The time that is wasted in case of a branch misprediction
is equal to the number of stages in the pipeline from the fetch stage to the execute stage. Modern microprocessors
tend to have quite long pipelines so that the misprediction delay is between 10 and 20 clock cycles. As a result, making
a pipeline longer increases the need for a more advanced branch predictor. The first time a conditional jump
instruction is encountered, there is not much information to base a prediction on. But the branch predictor keeps
records of whether branches are taken or not taken. When it encounters a conditional jump that has been seen several
times before, then it can base the prediction on the history. The branch predictor may, for example, recognize that the
conditional jump is taken more often than not, or that it is taken every second time. Static prediction is the simplest
branch prediction technique because it does not rely on information about the dynamic history of code executing.
Instead, it predicts the outcome of a branch based solely on the branch instruction. The early implementations of
SPARC and MIPS (two of the first commercial RISC architectures) used single-direction static branch prediction: they
always predict that a conditional jump will not be taken, so they always fetch the next sequential instruction. Only when
the branch or jump is evaluated and found to be taken, does the instruction pointer get set to a non-sequential
address. Both CPUs evaluate branches in the decode stage and have a single cycle instruction fetch. As a result, the
branch target recurrence is two cycles long, and the machine always fetches the instruction immediately after any
taken branch. Both architectures define branch delay slots in order to utilize these fetched instructions. A more
advanced form of static prediction presumes that backward branches will be taken and that forward branches will not.
A backward branch is one that has a target address that is lower than its own address. This technique can help with
prediction accuracy of loops, which are usually backward-pointing branches, and are taken more often than not
taken. Some processors allow branch prediction hints to be inserted into the code to tell whether the static prediction
should be taken or not taken. The Intel Pentium 4 accepts branch prediction hints, but this feature was abandoned in
later Intel processors. Static prediction is used as a fall-back technique in some processors with dynamic branch
prediction when dynamic predictors do not have sufficient information to use. Both the Motorola MPC7450 and the Intel
Pentium 4 use this technique as a fall-back.
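As a rough illustration of the "keep records of whether branches were taken" idea, here is a sketch of a generic two-bit saturating-counter predictor (the table size, hashing, and initial state are assumptions for the example, not any particular CPU's design):

# Two-bit saturating-counter branch predictor.  Counter states 0-1 predict
# "not taken", states 2-3 predict "taken"; the counter moves one step per outcome.
class TwoBitPredictor:
    def __init__(self, entries=1024):
        self.table = [1] * entries               # start weakly not-taken
        self.mask = entries - 1

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2   # True means "predict taken"

    def update(self, pc, taken):
        i = pc & self.mask
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)

bp = TwoBitPredictor()
outcomes = [True] * 9 + [False]                  # a loop branch: taken 9 times, then exits
correct = 0
for taken in outcomes:
    correct += (bp.predict(0x400) == taken)
    bp.update(0x400, taken)
print(correct, "of", len(outcomes), "predicted correctly")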
C for Caches
What they teach: Oversimplified:
A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to
reduce the average cost (time or energy) to access data from the main memory. A cache is a
smaller, faster memory, located closer to a processor core, which stores copies of the data
from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache
levels (L1, L2, often L3, and rarely even L4), with separate instruction-specific and data-specific
caches at level 1. Other types of caches exist (that are not counted towards the "cache size" of
the most important caches mentioned above), such as the translation lookaside buffer (TLB)
which is part of the memory management unit (MMU) which most CPUs have. Most modern
desktop and server CPUs have at least three independent caches: an instruction cache to
speed up executable instruction fetch, a data cache to speed up data fetch and store, and a
translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for
both executable instructions and data. A single TLB can be provided for access to both
instructions and data, or a separate Instruction TLB (ITLB) and data TLB (DTLB) can be provided.
The data cache is usually organized as a hierarchy of more cache levels (L1, L2, etc.; see also
multi-level caches below). However, the TLB cache is part of the memory management unit
(MMU) and not directly related to the CPU caches. Data is transferred between memory and
cache in blocks of fixed size, called cache lines or cache blocks. When a cache line is copied
from memory into the cache, a cache entry is created. The cache entry will include the
copied data as well as the requested memory location (called a tag). When the processor
needs to read or write a location in memory, it first checks for a corresponding entry in the
cache. The cache checks for the contents of the requested memory location in any cache
lines that might contain that address. If the processor finds that the memory location is in the
cache, a cache hit has occurred. However, if the processor does not find the memory location
in the cache, a cache miss has occurred. In the case of a cache hit, the processor
immediately reads or writes the data in the cache line. For a cache miss, the cache allocates a
new entry and copies data from main memory, then the request is fulfilled from the contents of
the cache.
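A toy model of the read path just described, assuming a 64-byte line and ignoring eviction and writes entirely:

# On a hit, serve the byte from the cached line; on a miss, allocate an entry
# and copy the whole line in from "main memory" before serving the request.
LINE_SIZE = 64                                   # bytes per cache line (arbitrary choice)
cache = {}                                       # tag -> copy of that line's data
memory = bytearray(1 << 20)                      # pretend main memory

def read_byte(addr):
    tag = addr // LINE_SIZE                      # which line this address falls in
    if tag not in cache:                         # cache miss: fill from memory
        base = tag * LINE_SIZE
        cache[tag] = bytes(memory[base:base + LINE_SIZE])
    return cache[tag][addr % LINE_SIZE]          # cache hit (possibly just created)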
D for Direct Mapped Cache
What they teach: Oversimplified:
In this cache organization, each location in
main memory can go in only one entry in the
cache. Therefore, a direct-mapped cache can
also be called a "one-way set associative"
cache. It does not have a placement policy as
such, since there is no choice of which cache
entry's contents to evict. This means that if two
locations map to the same entry, they may
continually knock each other out. Although
simpler, a direct-mapped cache needs to be
much larger than an associative one to give
comparable performance, and it is more
unpredictable. Let x be the block number in the
cache, y be the block number in memory, and n
be the number of blocks in the cache; then the
mapping is given by x = y mod n.
◦ Each address has a fixed line it can
belong to according to its index,
basically (see the sketch below):
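A small sketch of the x = y mod n mapping with made-up sizes (64-byte blocks, 256 lines):

# Direct-mapped placement: cache line index x = (memory block number y) mod n.
BLOCK_SIZE = 64
NUM_BLOCKS = 256

def direct_mapped_slot(addr):
    y = addr // BLOCK_SIZE            # memory block number
    x = y % NUM_BLOCKS                # the one cache line it may occupy (the "index")
    tag = y // NUM_BLOCKS             # stored to tell conflicting addresses apart
    return x, tag

# Two addresses exactly NUM_BLOCKS * BLOCK_SIZE bytes apart land on the same
# line (same index, different tag) and will keep knocking each other out:
print(direct_mapped_slot(0x0000), direct_mapped_slot(0x4000))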
E for Empirical Evaluation
What they teach: Oversimplified:
We look for two major values: latency and bandwidth. Latency is the
time for each instruction and bandwidth is the number of instructions
per unit time. In general, it is hard to improve on latency because the
speed of light delay cannot be reduced, or you can say “You cannot
bribe god”. On the other hand, bandwidth, also known as throughput
can be improved by spending more money. Amdahl’s law as taught
before is one way to measure improvement achieved by making
certain changes. Another way is benchmarks. Benchmarks are a set of
instructions which are used to stress test the CPU and measure its
performance. Various benchmarks are available, such as SPEC,
CloudSuite, and PARSEC. Each benchmark is different and suited for
different goals. A major issue with benchmarks is that they may be
outdated and are often not good representatives. For example, a CPU
designed to perform well on memory instructions at the cost of poor
performance on arithmetic will perform terribly on a benchmark
containing mostly arithmetic instructions. Also, a CPU might perform well
on one app but poorly on another. In such cases, the arithmetic mean (AM)
of execution times is a bad idea as it leads to contradictory comparisons;
the harmonic mean (HM) or geometric mean (GM) usually works well. Power
consumption and carbon emissions are also key parameters to keep in mind
while evaluating/rating a CPU.
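A small illustration, with made-up run times, of why the arithmetic mean of raw times misleads while the geometric mean of speedup ratios behaves sensibly:

# Two hypothetical machines on two programs, compared against a reference machine.
from statistics import geometric_mean

ref   = {"progA": 100, "progB": 100}     # reference run times in seconds
cpu_x = {"progA":  20, "progB": 400}
cpu_y = {"progA": 200, "progB":  40}

for name, times in (("X", cpu_x), ("Y", cpu_y)):
    ratios = [ref[p] / times[p] for p in ref]            # per-program speedups
    print(name, "mean time:", sum(times.values()) / 2,
          "GM of speedups:", round(geometric_mean(ratios), 2))
# Both machines are symmetric cases and get the same GM score (~1.12), while a
# plain average of raw times is dominated by whichever program happens to be slowest.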
F for Fully Associative Cache
What they teach: Oversimplified:
A fully associative cache contains a single set
with B ways, where B is the number of blocks. A
memory address can map to a block in any of
these ways. A fully associative cache is another
name for a B-way set associative cache with
one set. A fully associative cache permits data
to be stored in any cache block, instead of
forcing each memory address into one
particular block. When data is fetched from
memory, it can be placed in any unused block
of the cache. This way we’ll never have a
conflict between two or more memory
addresses which map to a single cache block.
If all the blocks are already in use, it’s usually
best to replace the least recently used one,
assuming that if it hasn't been used in a while, it
won’t be needed again anytime soon.
◦ No concept of indices, entire cache
belongs to everyone. Blocks be like:
G for Good Job so Far
What they think I mean: What I really mean:
◦ We have learnt so many concepts so
far in a very simple way. We surely
deserve a break on this slide, pat our
backs for making it this far in this
Computer Architecture crash course
and prepare for the upcoming topics.
I don't even know why you are still
reading this; you were supposed to
move to the next slide right away
because who even stops to read
unimportant long paragraphs
I couldn’t find a suitable concept for
the letter G, so let's do something
different here. This is a *different* kinda
assignment anyway.
H for Hazards
What they teach: Oversimplified:
In the domain of central processing unit (CPU) design, hazards
are problems with the instruction pipeline in CPU
microarchitectures when the next instruction cannot execute in
the following clock cycle,[1] and can potentially lead to
incorrect computation results. Three common types of hazards
are data hazards, structural hazards, and control hazards
(branching hazards).[2] There are several methods used to deal
with hazards, including pipeline stalls/pipeline bubbling, operand
forwarding, and in the case of out-of-order execution, the
scoreboarding method and the Tomasulo algorithm. Data
hazards occur when instructions that exhibit data dependence
modify data in different stages of a pipeline. Ignoring potential
data hazards can result in race conditions (also termed race
hazards). There are three situations in which a data hazard can
occur: RAW, WAW, WAR. A structural hazard occurs when two (or
more) instructions that are already in the pipeline need the same
resource. The result is that the instructions must be executed in series
rather than in parallel for a portion of the pipeline. Structural hazards
are sometimes referred to as resource hazards. A control hazard
occurs when the pipeline makes wrong decisions on branch
prediction and therefore brings instructions into the pipeline that
must subsequently be discarded. The term branch hazard also
refers to a control hazard.
◦ Control Hazards
◦ Structural Hazards
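A tiny sketch of how the three data-hazard kinds can be told apart from the register sets each instruction reads and writes (the example instruction pair is invented):

# Classify the data hazard between an earlier and a later instruction.
def data_hazards(first_reads, first_writes, second_reads, second_writes):
    hazards = []
    if first_writes & second_reads:
        hazards.append("RAW")          # read after write: true dependence
    if first_reads & second_writes:
        hazards.append("WAR")          # write after read: anti-dependence
    if first_writes & second_writes:
        hazards.append("WAW")          # write after write: output dependence
    return hazards

# add r1, r2, r3   followed by   sub r4, r1, r5   ->   RAW hazard on r1
print(data_hazards({"r2", "r3"}, {"r1"}, {"r1", "r5"}, {"r4"}))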
I for Interrupts
What they teach: Oversimplified:
In digital computers, an interrupt is a response by the processor to an event that needs attention from
the software. An interrupt condition alerts the processor and serves as a request for the processor to
interrupt the currently executing code when permitted, so that the event can be processed in a timely
manner. If the request is accepted, the processor responds by suspending its current activities, saving
its state, and executing a function called an interrupt handler (or an interrupt service routine, ISR) to
deal with the event. This interruption is temporary, and, unless the interrupt indicates a fatal error, the
processor resumes normal activities after the interrupt handler finishes. Interrupts are commonly used
by hardware devices to indicate electronic or physical state changes that require attention. Interrupts
are also commonly used to implement computer multitasking, especially in real-time computing.
Systems that use interrupts in these ways are said to be interrupt-driven. Interrupt signals may be issued
in response to hardware or software events. These are classified as hardware interrupts or software
interrupts, respectively. For any particular processor, the number of interrupt types is limited by the
architecture. A hardware interrupt is a condition related to the state of the hardware that may be
signaled by an external hardware device, e.g., an interrupt request (IRQ) line on a PC, or detected by
devices embedded in processor logic (e.g., the CPU timer in IBM System/370), to communicate that
the device needs attention from the operating system (OS)[3] or, if there is no OS, from the "bare-
metal" program running on the CPU. Such external devices may be part of the computer (e.g., disk
controller) or they may be external peripherals. For example, pressing a keyboard key or moving a
mouse plugged into a PS/2 port triggers hardware interrupts that cause the processor to read the
keystroke or mouse position. Hardware interrupts can arrive asynchronously with respect to the
processor clock, and at any time during instruction execution. Consequently, all hardware interrupt
signals are conditioned by synchronizing them to the processor clock, and acted upon only at
instruction execution boundaries. A software interrupt is requested by the processor itself upon
executing particular instructions or when certain conditions are met. Every software interrupt signal is
associated with a particular interrupt handler. A software interrupt may be intentionally caused by
executing a special instruction which, by design, invokes an interrupt when executed. Such instructions
function similarly to subroutine calls and are used for a variety of purposes, such as requesting
operating system services and interacting with device drivers (e.g., to read or write storage media).
Software interrupts may also be unexpectedly triggered by program execution errors. These interrupts
typically are called traps or exceptions. For example, a divide-by-zero exception will be "thrown" (a
software interrupt is requested) if the processor executes a divide instruction with divisor equal to zero.
Typically, the operating system will catch and handle this exception.
J for Jump Instructions
What they teach: Oversimplified:
In a CPU, the general flow of control is that an
instruction is executed and the PC
automatically moves to the next instruction in
the code. The jump instruction, however,
breaks this standard behavior and allows the
PC to jump to a specified location (within a
maximum distance from the current PC). The utility of this
instruction is that it allows function calls. Some
variants of it like jump and link are usually used
for function calls as the PC needs to return to
the main function after completing the
function call. The jump instruction usually takes
one parameter, which is the offset from the
current PC. Therefore, the new PC is given by
PC = PC + offset. The offset is usually restricted
to some maximum value as the entire
instruction needs to fit in 32 or 64 bits.
Literally, that’s all:
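A sketch of the PC = PC + offset idea; the signed 16-bit offset field is an assumed width for illustration:

# PC-relative jump: the new PC is PC + offset, and the offset must fit in the
# instruction's offset field (here assumed to be 16 signed bits).
OFFSET_BITS = 16
MAX_OFF = (1 << (OFFSET_BITS - 1)) - 1
MIN_OFF = -(1 << (OFFSET_BITS - 1))

def jump(pc, offset):
    if not (MIN_OFF <= offset <= MAX_OFF):
        raise ValueError("target is too far for the offset field")
    return pc + offset

print(hex(jump(0x1000, 0x40)))       # forward jump
print(hex(jump(0x1000, -0x20)))      # backward jump, e.g. the top of a loop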
K for Kernel Mode of CPU
What they teach: Oversimplified:
The system starts in kernel mode when it boots and after the
operating system is loaded, it executes applications in user
mode. There are some privileged instructions that can only be
executed in kernel mode. These are interrupt instructions, input
output management etc. If the privileged instructions are
executed in user mode, it is illegal and a trap is generated. The
mode bit is set to 0 in the kernel mode. It is changed from 0 to 1
when switching from kernel mode to user mode. In kernel mode,
the CPU may perform any operation allowed by its architecture;
any instruction may be executed, any I/O operation initiated,
any area of memory accessed, and so on. In the other CPU
modes, certain restrictions on CPU operations are enforced by
the hardware. Typically, certain instructions are not permitted
(especially those—including I/O operations—that could alter the
global state of the machine), some memory areas cannot be
accessed, etc. User-mode capabilities of the CPU are typically a
subset of those available in kernel mode, but in some cases, such
as hardware emulation of non-native architectures, they may be
significantly different from those available in standard kernel
mode.
Now CPU be like: You dare oppose me mortal
L for LRU Policy
What they teach: Oversimplified:
In computing, cache algorithms (also frequently called cache
replacement algorithms or cache replacement policies) are optimizing
instructions, or algorithms, that a computer program or a hardware-
maintained structure can utilize in order to manage a cache of
information stored on the computer. Caching improves performance
by keeping recent or often-used data items in memory locations that
are faster or computationally cheaper to access than normal memory
stores. When the cache is full, the algorithm must choose which items to
discard to make room for the new ones. LRU discards the least recently
used items first. This algorithm requires keeping track of what was used
when, which is expensive if one wants to make sure the algorithm
always discards the least recently used item. General implementations
of this technique require keeping "age bits" for cache-lines and tracking
the "Least Recently Used" cache-line based on age-bits. In such an
implementation, every time a cache-line is used, the age of all other
cache-lines changes. LRU is actually a family of caching algorithms with
members including 2Q by Theodore Johnson and Dennis Shasha, and
LRU/K by Pat O'Neil, Betty O'Neil and Gerhard Weikum. LRU, like many
other replacement policies, can be characterized using a state
transition field in a vector space, which decides the dynamic cache
state changes similar to how an electromagnetic field determines the
movement of a charged particle placed in it.
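A minimal LRU sketch using an ordered dictionary as the age bookkeeping (the capacity of 4 is arbitrary):

# On every access move the line to the "most recent" end; when the cache is
# full, evict from the "least recent" end.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = OrderedDict()                # key -> data, ordered oldest -> newest

    def access(self, key, fetch):
        if key in self.lines:                     # hit: refresh its age
            self.lines.move_to_end(key)
            return self.lines[key]
        if len(self.lines) >= self.capacity:      # miss and full: evict the LRU line
            self.lines.popitem(last=False)
        self.lines[key] = fetch(key)              # fill from the next level
        return self.lines[key]

cache = LRUCache()
for addr in (1, 2, 3, 4, 1, 5):                   # accessing 5 evicts 2, the LRU line
    cache.access(addr, fetch=lambda a: f"block {a}")
print(list(cache.lines))                          # [3, 4, 1, 5]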
M for Moore’s law
What they teach: Oversimplified:
Moore's law is the observation that the number of transistors in a dense
integrated circuit (IC) doubles about every two years. Moore's law is an
observation and projection of a historical trend. Rather than a law of
physics, it is an empirical relationship linked to gains from experience in
production. The observation is named after Gordon Moore, the co-
founder of Fairchild Semiconductor and Intel (and former CEO of the
latter), who in 1965 posited a doubling every year in the number of
components per integrated circuit, and projected this rate of growth
would continue for at least another decade. In 1975, looking forward to
the next decade, he revised the forecast to doubling every two years,
a compound annual growth rate (CAGR) of 41%. While Moore did not
use empirical evidence in forecasting that the historical trend would
continue, his prediction held since 1975 and has since become known
as a "law". Moore's prediction has been used in the semiconductor
industry to guide long-term planning and to set targets for research and
development, thus functioning to some extent as a self-fulfilling
prophecy. Advancements in digital electronics, such as the reduction
in quality-adjusted microprocessor prices, the increase in memory
capacity (RAM and flash), the improvement of sensors, and even the
number and size of pixels in digital cameras, are strongly linked to
Moore's law. These step changes in digital electronics have been a
driving force of technological and social change, productivity, and
economic growth.
N for NOPS Instruction
What they teach: Oversimplified:
In computer science, a NOP, no-op, or NOOP (pronounced "no
op"; short for no operation) is a machine language instruction
and its assembly language mnemonic, programming language
statement, or computer protocol command that does nothing.
Some computer instruction sets include an instruction whose
explicit purpose is to not change the state of any of the
programmer-accessible registers, status flags, or memory. It often
takes a well-defined number of clock cycles to execute. In other
instruction sets, there is no explicit NOP instruction, but the
assembly language mnemonic NOP represents an instruction
which acts as a NOP. A NOP must not access memory, as that
could cause a memory fault or page fault. A NOP is most
commonly used for timing purposes, to force memory alignment,
to prevent hazards, to occupy a branch delay slot, to render
void an existing instruction such as a jump, as a target of an
execute instruction, or as a place-holder to be replaced by
active instructions later on in program development (or to
replace removed instructions when reorganizing would be
problematic or time-consuming). In some cases, a NOP can
have minor side effects; for example, on the Motorola 68000
series of processors, the NOP opcode causes a synchronization
of the pipeline.
You see what we did here? Very few people get
this
O for Optimization
What they teach: Oversimplified:
Even though we have achieved a lot of speedup
recently, there is still scope for improvement. Though
improvements in caches and other structures of the CPU
also have a significant impact on performance, the
biggest impact is seen in branch predictors, especially
when the predictor is already very good. Consider a
predictor with 98% accuracy which is improved to 99%
accuracy. This may look like a negligible improvement
but in reality, it is huge. This essentially drops the
number of mispredictions to half, which when
calculated is a major speedup. A similar thing
happens if we improve the accuracy from 99.98% to
99.99%. There are other places with scope for
improvement as well. The simple pipeline structure assumes
1 cycle is needed to fetch data from memory, which is
often not true. It may take tens or hundreds of cycles
which is very inefficient and good caches and cache
replacement policies can significantly improve
performance here as well.
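Rough arithmetic behind the 98% vs 99% claim; the misprediction penalty and branch frequency below are assumed numbers, not measurements:

# Going from 98% to 99% accuracy halves the mispredictions, so it halves the
# cycles wasted on branches (here: 20-cycle penalty, 1 branch per 5 instructions).
def cycles_per_instr(accuracy, branch_frac=0.2, penalty=20, base_cpi=1.0):
    return base_cpi + branch_frac * (1.0 - accuracy) * penalty

for acc in (0.98, 0.99, 0.9998, 0.9999):
    print(acc, round(cycles_per_instr(acc), 4))
# CPI drops from 1.08 to 1.04: the branch-related waste is cut in half.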
P for Pipelined processor
What they teach: Oversimplified:
In computer science, instruction pipelining is a technique for
implementing instruction-level parallelism within a single processor.
Pipelining attempts to keep every part of the processor busy with some
instruction by dividing incoming instructions into a series of sequential
steps (the eponymous "pipeline") performed by different processor units
with different parts of instructions processed in parallel. In a pipelined
computer, instructions flow through the central processing unit (CPU) in
stages. For example, it might have one stage for each step of the von
Neumann cycle: Fetch the instruction, fetch the operands, do the
instruction, write the results. A pipelined computer usually has "pipeline
registers" after each stage. These store information from the instruction
and calculations so that the logic gates of the next stage can do the
next step. This arrangement lets the CPU complete an instruction on
each clock cycle. It is common for even numbered stages to operate
on one edge of the square-wave clock, while odd-numbered stages
operate on the other edge. This allows more CPU throughput than a
multicycle computer at a given clock rate, but may increase latency
due to the added overhead of the pipelining process itself. Also, even
though the electronic logic has a fixed maximum speed, a pipelined
computer can be made faster or slower by varying the number of
stages in the pipeline. With more stages, each stage does less work,
and so the stage has fewer delays from the logic gates and could run
at a higher clock rate. For the purpose of this course, we consider the
5-stage pipeline whose stages are Instruction Fetch (IF), Instruction
Decode (ID), Execute (EX), Memory (MEM), and Write-Back (WB).
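A back-of-the-envelope sketch of the throughput gain, assuming one instruction enters the 5-stage pipeline per cycle and nothing ever stalls:

# With k stages and no stalls, n instructions take k + (n - 1) cycles in a
# pipeline, versus k * n cycles if each instruction runs all stages alone.
def pipelined_cycles(n, k=5):
    return k + (n - 1)

def unpipelined_cycles(n, k=5):
    return k * n

n = 1000
print(unpipelined_cycles(n), "->", pipelined_cycles(n), "cycles, speedup",
      round(unpipelined_cycles(n) / pipelined_cycles(n), 2))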
Q for ….. ummm…. How do I explain?
You know what? Let's make Q for Questions. If you have any questions so far, feel free to
ping any of us on MS Teams and we will try our best to resolve your doubts as soon as
possible. ☺
R for Read-stall
What they teach: Oversimplified:
A program often needs to read data from
memory which generally takes a lot of time.
Even with caches, the higher-level caches still
take a significant amount of time to bring
the data to the CPU. During this time, a simple
pipelined CPU is just stalled, executing NOP
instructions. These stalls are called read stalls. A
counterpart for writes, called write stalls, also
exists. The instruction which has issued a read
must wait for it to finish, and it often
takes a few tens to a few hundreds of cycles. The
exact amount depends on a variety of factors
such as the cache miss rate, the miss penalty at
each level, and also the type of program. A
cache thrashing program will generally have a
large number of read stall cycles.
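A rough sketch of how read-stall cycles are usually accounted for; every rate and penalty below is an assumed number for the example:

# Memory stall cycles ~ memory accesses x miss rate x miss penalty.
def stall_cycles(instructions, mem_frac=0.3, miss_rate=0.05, miss_penalty=100):
    return instructions * mem_frac * miss_rate * miss_penalty

n = 1_000_000
stalls = stall_cycles(n)
print(f"{stalls:.0f} stall cycles on top of {n} useful cycles "
      f"(CPI inflated by {stalls / n:.2f})")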
S for SPEC Benchmark
What they teach: Oversimplified:
The SPEC Benchmark is one of the most popular
benchmark tests used for evaluating performance
of a CPU. SPEC stands for Standard Performance
Evaluation Corporation. The benchmarks aim to
test "real-life" situations. There are several
benchmarks testing Java scenarios, from simple
computation (SPECjbb) to a full system with Java
EE, database, disk, and network (SPECjEnterprise).
The SPEC CPU suites test CPU performance by
measuring the run time of several programs such
as the compiler GCC, the chemistry program
gamess, and the weather program WRF. The
various tasks are equally weighted; no attempt is
made to weight them based on their perceived
importance. An overall score is based on a
geometric mean. Apart from this, various other
benchmarks are also available for evaluating
performance of a CPU.
T for Trap Instructions
What they teach: Oversimplified:
A trap instruction is a procedure call that
synchronously transfers control. It is a software
interrupt generated by the user program, or by an
error, when the operating system is needed to
perform a system call or some other operation. Thus, a
trap instruction is used to switch from the user mode
of the system to the kernel mode. A trap is also
generated during a context switch between
processes by the OS. During a trap, the privilege
level of the CPU is raised, and it is set up by the OS to
run OS code. For example, the stack changes from
user stack to kernel stack, CPU is granted access to
several protected data structures hidden from
users and the PC now points to some OS code,
depending upon the reason for the generation of the trap
and the arguments passed to it. In a nutshell, a trap is
responsible for handling all abnormal behavior.
U for Unconditional Branches
What they teach: Oversimplified:
The flow of a program is "go to the next
instruction” for most of the time during
execution. However, branch instructions break
this general flow. There are two types of
branches, conditional and unconditional.
Conditional branches check the truth value of
some condition and jump or don’t jump based
on that value. Unconditional branches or
unconditional jumps are essentially the
branches which are always taken, or in other
words, the branch instructions whose next PC is
fixed and independent of the state of CPU (i.e.
the values in the registers). These are usually
used for making function calls, and variants
such as the jump-and-link instruction are used to
jump to a subroutine and then return from it after
it is done.
V for Virtual Memory
What they teach: Oversimplified:
In computing, virtual memory, or virtual storage is a memory
management technique that provides an "idealized abstraction of the
storage resources that are actually available on a given machine"
which "creates the illusion to users of a very large (main) memory". The
computer's operating system, using a combination of hardware and
software, maps memory addresses used by a program, called virtual
addresses, into physical addresses in computer memory. Main storage,
as seen by a process or task, appears as a contiguous address space
or collection of contiguous segments. The operating system manages
virtual address spaces and the assignment of real memory to virtual
memory. Address translation hardware in the CPU, often referred to as
a memory management unit (MMU), automatically translates virtual
addresses to physical addresses. Software within the operating system
may extend these capabilities, utilizing, e.g., disk storage, to provide a
virtual address space that can exceed the capacity of real memory
and thus reference more memory than is physically present in the
computer. The primary benefits of virtual memory include freeing
applications from having to manage a shared memory space, ability to
share memory used by libraries between processes, increased security
due to memory isolation, and being able to conceptually use more
memory than might be physically available, using the technique of
paging or segmentation.
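A toy model of the translation the MMU performs; the 4 KiB page size and the page-table contents are invented for the illustration:

# Split the virtual address into a page number and an offset, look the page up,
# and glue the offset onto the physical frame it maps to.
PAGE_SIZE = 4096
page_table = {0: 7, 1: 3, 2: 9}           # virtual page number -> physical frame number

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError("page fault: the OS must bring the page in first")
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1234)))             # virtual page 1 -> physical frame 3 -> 0x3234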
W for Write-back
What they teach: Oversimplified:
In the 5 staged pipeline, the final stage
is the WB or the write-back stage. The
job of this stage is to take the output
from the ALU or the Memory unit,
depending upon the type of
instruction, and write the value into
the target register as specified in the
instruction. The decision between ALU
or MEM is made using a MUX after the
latch register.
X for ….. You know right?
I am out of ideas now. How about X for Xtra Questions? ☺
Y for Yahoo! We are almost done!
What I want to say: Oversimplified:
As we arrive at the letter Y, it's not hard to see
that we are nearing the end of this crash
course, and there is just one more slide to
go. Honestly, there is nothing more to say
here. We couldn’t find anything for the
letter Y either, so this slide is just random
text from here on, because we have to fill
this side of the slide entirely in order to
maintain consistency throughout the
slides. I don’t know, congratulations I
guess? For making it through the entire
course. I think Y for Ye hamari pawri ho
rahi hai. ☺
Z for Zero Register
What they teach: Oversimplified:
The zero register is the special register
whose value is hardwired to store zero.
This register is often used for comparisons
with zero in branch instructions, or simply
to use zero anywhere. This makes the value
zero easily accessible at all times, without
needing to load it into some temporary
register. This is also used in the NOP instruction,
and although the exact instruction may
vary across architectures, ‘add $0 $0 $0’
can be used as the NOP instruction.
Created by: Me and the bois
We are evolving, just like the CPUs
• For the screen lovers, we have created a
telegram chat bot (coz why not?).
• Here is the link: CA Simplified bot
• Here is the link to video demo of the bot
(Available in our submission drive folder as well):
Demo Video
And finally, Thanks for reading ☺
What they teach: Oversimplified:
Merci. धन्यवाद| Shukriya. Gracias.
Shukran. Xièxiè. Abhari Ahe.
Thaagatchari. Terima kasih. Nandri.
Dhanyavaadaalu. Anugrihtaasmi.
Dhonnobad. 감사해요. Teşekkürler.
Dankie. Takk. 感謝. Grazie. Tatenda.
Asante. Ďakujem. Kösz. Au Kun. Met
Dank. Vd’aka. Choukran. Bohoma
Istuti. teşekkür ederim. Hvala. Npezié.
ευχαριστώ. Doh Jeh. Go raibh maith
agat. ਧੰਨਵਾਦ.
Thanks, Thanks, And Thanks. <3

More Related Content

What's hot

Cache Memory Computer Architecture and organization
Cache Memory Computer Architecture and organizationCache Memory Computer Architecture and organization
Cache Memory Computer Architecture and organizationHumayra Khanum
 
Cache memory ...
Cache memory ...Cache memory ...
Cache memory ...Pratik Farkya
 
Advanced computer architecture lesson 5 and 6
Advanced computer architecture lesson 5 and 6Advanced computer architecture lesson 5 and 6
Advanced computer architecture lesson 5 and 6Ismail Mukiibi
 
Cache Memory
Cache MemoryCache Memory
Cache MemorySubid Biswas
 
Cache memory
Cache memoryCache memory
Cache memoryAbir Rahman
 
Cache memory principles
Cache memory principlesCache memory principles
Cache memory principlesbit allahabad
 
Cache memory
Cache memoryCache memory
Cache memoryAnand Goyal
 
Cache memory ppt
Cache memory ppt  Cache memory ppt
Cache memory ppt Arpita Naik
 
Chapter 2 pc
Chapter 2 pcChapter 2 pc
Chapter 2 pcHanif Durad
 
Memory Organization
Memory OrganizationMemory Organization
Memory OrganizationKamal Acharya
 
Cache performance considerations
Cache performance considerationsCache performance considerations
Cache performance considerationsSlideshare
 
Tdt4260 miniproject report_group_3
Tdt4260 miniproject report_group_3Tdt4260 miniproject report_group_3
Tdt4260 miniproject report_group_3Yulong Bai
 
Lecture 3
Lecture 3Lecture 3
Lecture 3Mr SMAK
 
Aca lab project (rohit malav)
Aca lab project (rohit malav) Aca lab project (rohit malav)
Aca lab project (rohit malav) Rohit malav
 
Real-Time Scheduling Algorithms
Real-Time Scheduling AlgorithmsReal-Time Scheduling Algorithms
Real-Time Scheduling AlgorithmsAJAL A J
 

What's hot (20)

Cache design
Cache design Cache design
Cache design
 
Cache Memory Computer Architecture and organization
Cache Memory Computer Architecture and organizationCache Memory Computer Architecture and organization
Cache Memory Computer Architecture and organization
 
Cache memory ...
Cache memory ...Cache memory ...
Cache memory ...
 
Advanced computer architecture lesson 5 and 6
Advanced computer architecture lesson 5 and 6Advanced computer architecture lesson 5 and 6
Advanced computer architecture lesson 5 and 6
 
Cache Memory
Cache MemoryCache Memory
Cache Memory
 
Cache memory
Cache memoryCache memory
Cache memory
 
Cache memory
Cache memoryCache memory
Cache memory
 
Cache memory principles
Cache memory principlesCache memory principles
Cache memory principles
 
Cache memory
Cache memoryCache memory
Cache memory
 
Cache memory ppt
Cache memory ppt  Cache memory ppt
Cache memory ppt
 
Chapter 2 pc
Chapter 2 pcChapter 2 pc
Chapter 2 pc
 
Memory Organization
Memory OrganizationMemory Organization
Memory Organization
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
Cache performance considerations
Cache performance considerationsCache performance considerations
Cache performance considerations
 
Tdt4260 miniproject report_group_3
Tdt4260 miniproject report_group_3Tdt4260 miniproject report_group_3
Tdt4260 miniproject report_group_3
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
Cache memory
Cache memoryCache memory
Cache memory
 
Aca lab project (rohit malav)
Aca lab project (rohit malav) Aca lab project (rohit malav)
Aca lab project (rohit malav)
 
Real-Time Scheduling Algorithms
Real-Time Scheduling AlgorithmsReal-Time Scheduling Algorithms
Real-Time Scheduling Algorithms
 
Modern processors
Modern processorsModern processors
Modern processors
 

Similar to Computer architecture

Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processorscsandit
 
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORScscpconf
 
The effective way of processor performance enhancement by proper branch handling
The effective way of processor performance enhancement by proper branch handlingThe effective way of processor performance enhancement by proper branch handling
The effective way of processor performance enhancement by proper branch handlingcsandit
 
THE EFFECTIVE WAY OF PROCESSOR PERFORMANCE ENHANCEMENT BY PROPER BRANCH HANDL...
THE EFFECTIVE WAY OF PROCESSOR PERFORMANCE ENHANCEMENT BY PROPER BRANCH HANDL...THE EFFECTIVE WAY OF PROCESSOR PERFORMANCE ENHANCEMENT BY PROPER BRANCH HANDL...
THE EFFECTIVE WAY OF PROCESSOR PERFORMANCE ENHANCEMENT BY PROPER BRANCH HANDL...cscpconf
 
AN ATTEMPT TO IMPROVE THE PROCESSOR PERFORMANCE BY PROPER MEMORY MANAGEMENT F...
AN ATTEMPT TO IMPROVE THE PROCESSOR PERFORMANCE BY PROPER MEMORY MANAGEMENT F...AN ATTEMPT TO IMPROVE THE PROCESSOR PERFORMANCE BY PROPER MEMORY MANAGEMENT F...
AN ATTEMPT TO IMPROVE THE PROCESSOR PERFORMANCE BY PROPER MEMORY MANAGEMENT F...IJCSEA Journal
 
Os solved question paper
Os solved question paperOs solved question paper
Os solved question paperAnkit Bhatnagar
 
computer architecture and organization.pptx
computer architecture and organization.pptxcomputer architecture and organization.pptx
computer architecture and organization.pptxROHANSharma311906
 
Operating Systems - memory management
Operating Systems - memory managementOperating Systems - memory management
Operating Systems - memory managementMukesh Chinta
 
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxCS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxfaithxdunce63732
 
Pipeline Mechanism
Pipeline MechanismPipeline Mechanism
Pipeline MechanismAshik Iqbal
 
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...caijjournal
 
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...caijjournal
 
Computer Applications: An International Journal (CAIJ)
Computer Applications: An International Journal (CAIJ)Computer Applications: An International Journal (CAIJ)
Computer Applications: An International Journal (CAIJ)caijjournal
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System AchitectureYashiUpadhyay3
 
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...IDES Editor
 
shashank_hpca1995_00386533
shashank_hpca1995_00386533shashank_hpca1995_00386533
shashank_hpca1995_00386533Shashank Nemawarkar
 

Similar to Computer architecture (20)

Chap2 slides
Chap2 slidesChap2 slides
Chap2 slides
 
Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processors
 
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
 
The effective way of processor performance enhancement by proper branch handling
The effective way of processor performance enhancement by proper branch handlingThe effective way of processor performance enhancement by proper branch handling
The effective way of processor performance enhancement by proper branch handling
 
THE EFFECTIVE WAY OF PROCESSOR PERFORMANCE ENHANCEMENT BY PROPER BRANCH HANDL...
THE EFFECTIVE WAY OF PROCESSOR PERFORMANCE ENHANCEMENT BY PROPER BRANCH HANDL...THE EFFECTIVE WAY OF PROCESSOR PERFORMANCE ENHANCEMENT BY PROPER BRANCH HANDL...
THE EFFECTIVE WAY OF PROCESSOR PERFORMANCE ENHANCEMENT BY PROPER BRANCH HANDL...
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
AN ATTEMPT TO IMPROVE THE PROCESSOR PERFORMANCE BY PROPER MEMORY MANAGEMENT F...
AN ATTEMPT TO IMPROVE THE PROCESSOR PERFORMANCE BY PROPER MEMORY MANAGEMENT F...AN ATTEMPT TO IMPROVE THE PROCESSOR PERFORMANCE BY PROPER MEMORY MANAGEMENT F...
AN ATTEMPT TO IMPROVE THE PROCESSOR PERFORMANCE BY PROPER MEMORY MANAGEMENT F...
 
Reconfigurable computing
Reconfigurable computingReconfigurable computing
Reconfigurable computing
 
Os solved question paper
Os solved question paperOs solved question paper
Os solved question paper
 
computer architecture and organization.pptx
computer architecture and organization.pptxcomputer architecture and organization.pptx
computer architecture and organization.pptx
 
Bt0070
Bt0070Bt0070
Bt0070
 
Operating Systems - memory management
Operating Systems - memory managementOperating Systems - memory management
Operating Systems - memory management
 
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxCS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
 
Pipeline Mechanism
Pipeline MechanismPipeline Mechanism
Pipeline Mechanism
 
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...
 
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...
PERFORMANCE ENHANCEMENT WITH SPECULATIVE-TRACE CAPPING AT DIFFERENT PIPELINE ...
 
Computer Applications: An International Journal (CAIJ)
Computer Applications: An International Journal (CAIJ)Computer Applications: An International Journal (CAIJ)
Computer Applications: An International Journal (CAIJ)
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System Achitecture
 
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
 
shashank_hpca1995_00386533
shashank_hpca1995_00386533shashank_hpca1995_00386533
shashank_hpca1995_00386533
 

Recently uploaded

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 

Recently uploaded (20)

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 

Computer architecture

  • 1. COMPUTER ARCHITECTURE Oversimplified by Arki-Tehcs Prabhanshu Katiyar- 190050088 Sibasis Nayak - 190050115 Gurnoor Singh - 190050045 Paarth Jain - 190050076 Sahasra Ranjan - 190050102
  • 2. A for Amdahl’s Law What they teach: Oversimplified: In computer architecture, Amdahl's law (or Amdahl's argument) is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It is named after computer scientist Gene Amdahl, and was presented at the AFIPS Spring Joint Computer Conference in 1967. Amdahl's law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours to complete using a single thread, but a one-hour portion of the program cannot be parallelized, therefore only the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many threads are devoted to a parallelized execution of this program, the minimum execution time cannot be less than one hour. Hence, the theoretical speedup is limited to at most 20 times the single thread performance. Amdahl's law is often conflated with the law of diminishing returns, whereas only a special case of applying Amdahl's law demonstrates law of diminishing returns. If one picks optimally (in terms of the achieved speedup) what is to be improved, then one will see monotonically decreasing improvements as one improves. If, however, one picks non-optimally, after improving a sub-optimal component and moving on to improve a more optimal component, one can see an increase in the return. Note that it is often rational to improve a system in an order that is "non-optimal" in this sense, given that some improvements are more difficult or require larger development time than others. Amdahl's law does represent the law of diminishing returns if on considering what sort of return one gets by adding more processors to a machine, if one is running a fixed-size computation that will use all available processors to their capacity. Each new processor added to the system will add less usable power than the previous one. Each time one doubles the number of processors the speedup ratio will diminish, as the total throughput heads toward the limit of 1/(1 − p).
  • 3. B for Branch Predictors What they teach: Oversimplified: In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then– else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the instruction pipeline. Branch predictors play a critical role in achieving high effective performance in many modern pipelined microprocessor architectures such as x86. Two-way branching is usually implemented with a conditional jump instruction. A conditional jump can either be "not taken" and continue execution with the first branch of code which follows immediately after the conditional jump, or it can be "taken" and jump to a different place in program memory where the second branch of code is stored. It is not known for certain whether a conditional jump will be taken or not taken until the condition has been calculated and the conditional jump has passed the execution stage in the instruction pipeline. Without branch prediction, the processor would have to wait until the conditional jump instruction has passed the execute stage before the next instruction can enter the fetch stage in the pipeline. The branch predictor attempts to avoid this waste of time by trying to guess whether the conditional jump is most likely to be taken or not taken. The branch that is guessed to be the most likely is then fetched and speculatively executed. If it is later detected that the guess was wrong, then the speculatively executed or partially executed instructions are discarded and the pipeline starts over with the correct branch, incurring a delay. The time that is wasted in case of a branch misprediction is equal to the number of stages in the pipeline from the fetch stage to the execute stage. Modern microprocessors tend to have quite long pipelines so that the misprediction delay is between 10 and 20 clock cycles. As a result, making a pipeline longer increases the need for a more advanced branch predictor. The first time a conditional jump instruction is encountered, there is not much information to base a prediction on. But the branch predictor keeps records of whether branches are taken or not taken. When it encounters a conditional jump that has been seen several times before, then it can base the prediction on the history. The branch predictor may, for example, recognize that the conditional jump is taken more often than not, or that it is taken every second time. Static prediction is the simplest branch prediction technique because it does not rely on information about the dynamic history of code executing. Instead, it predicts the outcome of a branch based solely on the branch instruction. The early implementations of SPARC and MIPS (two of the first commercial RISC architectures) used single-direction static branch prediction: they always predict that a conditional jump will not be taken, so they always fetch the next sequential instruction. Only when the branch or jump is evaluated and found to be taken, does the instruction pointer get set to a non-sequential address. Both CPUs evaluate branches in the decode stage and have a single cycle instruction fetch. As a result, the branch target recurrence is two cycles long, and the machine always fetches the instruction immediately after any taken branch. Both architectures define branch delay slots in order to utilize these fetched instructions. 
A more advanced form of static prediction presumes that backward branches will be taken and that forward branches will not. A backward branch is one that has a target address that is lower than its own address. This technique can help with prediction accuracy of loops, which are usually backward-pointing branches, and are taken more often than not taken. Some processors allow branch prediction hints to be inserted into the code to tell whether the static prediction should be taken or not taken. The Intel Pentium 4 accepts branch prediction hints, but this feature was abandoned in later Intel processors. Static prediction is used as a fall-back technique in some processors with dynamic branch prediction when dynamic predictors do not have sufficient information to use. Both the Motorola MPC7450 and the Intel Pentium 4 use this technique as a fall-back.
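  To make the dynamic-prediction idea concrete, here is a sketch of a 2-bit saturating-counter predictor, one common textbook scheme for "keeping records of whether branches are taken". The table size, the hash of the PC, and the example branch address are all assumptions for illustration, not a description of any particular CPU.

#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 1024
static uint8_t counters[TABLE_SIZE];   /* 0,1 = predict not taken; 2,3 = predict taken */

int predict(uint32_t pc) {
    return counters[(pc >> 2) % TABLE_SIZE] >= 2;
}

void train(uint32_t pc, int taken) {
    uint8_t *c = &counters[(pc >> 2) % TABLE_SIZE];
    if (taken && *c < 3) (*c)++;       /* saturate at "strongly taken" */
    if (!taken && *c > 0) (*c)--;      /* saturate at "strongly not taken" */
}

int main(void) {
    uint32_t pc = 0x400120;            /* hypothetical branch address */
    int outcomes[] = {1, 1, 1, 0, 1, 1};   /* a loop branch: mostly taken */
    for (int i = 0; i < 6; i++) {
        printf("predict %s, actual %s\n",
               predict(pc) ? "taken" : "not taken",
               outcomes[i] ? "taken" : "not taken");
        train(pc, outcomes[i]);
    }
    return 0;
}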
  • 4. C for Caches What they teach: Oversimplified: A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with separate instruction-specific and data-specific caches at level 1. Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB) which is part of the memory management unit (MMU) which most CPUs have. Most modern desktop and server CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data. A single TLB can be provided for access to both instructions and data, or a separate Instruction TLB (ITLB) and data TLB (DTLB) can be provided. The data cache is usually organized as a hierarchy of more cache levels (L1, L2, etc.; see also multi-level caches below). However, the TLB cache is part of the memory management unit (MMU) and not directly related to the CPU caches. Data is transferred between memory and cache in blocks of fixed size, called cache lines or cache blocks. When a cache line is copied from memory into the cache, a cache entry is created. The cache entry will include the copied data as well as the requested memory location (called a tag). When the processor needs to read or write a location in memory, it first checks for a corresponding entry in the cache. The cache checks for the contents of the requested memory location in any cache lines that might contain that address. If the processor finds that the memory location is in the cache, a cache hit has occurred. However, if the processor does not find the memory location in the cache, a cache miss has occurred. In the case of a cache hit, the processor immediately reads or writes the data in the cache line. For a cache miss, the cache allocates a new entry and copies data from main memory, then the request is fulfilled from the contents of the cache.
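  A rough sketch of the tag check described above: an address is split into a byte offset within the line, an index that selects a set, and a tag that is compared against what is stored there. The 64-byte lines and 256 sets are assumptions for illustration; real cache geometries vary.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define LINE_BYTES 64
#define NUM_SETS   256

struct line { bool valid; uint64_t tag; };
static struct line cache[NUM_SETS];        /* one line per set, for simplicity */

bool access_cache(uint64_t addr) {
    uint64_t index = (addr / LINE_BYTES) % NUM_SETS;
    uint64_t tag   = (addr / LINE_BYTES) / NUM_SETS;
    if (cache[index].valid && cache[index].tag == tag)
        return true;                       /* cache hit */
    cache[index].valid = true;             /* miss: allocate an entry for this line */
    cache[index].tag   = tag;
    return false;
}

int main(void) {
    printf("first access:  %s\n", access_cache(0x12345678) ? "hit" : "miss");
    printf("second access: %s\n", access_cache(0x12345678) ? "hit" : "miss");
    return 0;
}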
  • 5. D for Direct Mapped Cache What they teach: Oversimplified: In this cache organization, each location in main memory can go in only one entry in the cache. Therefore, a direct-mapped cache can also be called a "one-way set associative" cache. It does not have a placement policy as such, since there is no choice of which cache entry's contents to evict. This means that if two locations map to the same entry, they may continually knock each other out. Although simpler, a direct-mapped cache needs to be much larger than an associative one to give comparable performance, and it is more unpredictable. Let x be the block number in the cache, y be the block number in memory, and n be the number of blocks in the cache; then the mapping is given by x = y mod n. ◦ Each address has a fixed line it can belong to, determined by its index.
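  A tiny worked example of the x = y mod n mapping and the conflict it causes: two memory blocks whose numbers differ by a multiple of n land on the same cache line and keep evicting each other. The numbers are illustrative only.

#include <stdio.h>

int main(void) {
    int n = 8;                 /* blocks in the cache */
    int a = 5, b = 13;         /* memory block numbers; 13 mod 8 == 5 mod 8 */
    printf("memory block %d -> cache line %d\n", a, a % n);
    printf("memory block %d -> cache line %d\n", b, b % n);
    printf("alternating accesses to blocks %d and %d miss every time (conflict misses)\n", a, b);
    return 0;
}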
  • 6. E for Empirical Evaluation What they teach: Oversimplified: We look at two major values: latency and bandwidth. Latency is the time for each instruction, and bandwidth is the number of instructions per unit time. In general, it is hard to improve latency because the speed-of-light delay cannot be reduced, or as you might say, "You cannot bribe god". On the other hand, bandwidth, also known as throughput, can be improved by spending more money. Amdahl's law, as taught before, is one way to measure the improvement achieved by making certain changes. Another way is benchmarks. Benchmarks are suites of programs used to stress-test the CPU and measure its performance. Various benchmarks are available, such as SPEC, CloudSuite and PARSEC. Each benchmark is different and suited to different goals. A major issue with benchmarks is that they may be outdated and are often not good representatives of real workloads. For example, a CPU designed to perform well on memory instructions at the cost of poor arithmetic performance will perform terribly on a benchmark containing mostly arithmetic instructions. Also, a CPU might perform well on one application but poorly on another. In such cases, the arithmetic mean (AM) of execution times is a bad idea, as it can lead to contradictory comparisons; the harmonic mean (HM) or geometric mean (GM) usually works better. Power consumption and carbon emissions are also key parameters to keep in mind while evaluating/rating a CPU.
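  A small illustration (with made-up execution times) of why the arithmetic mean of normalized times can be contradictory: depending on which CPU you normalize to, the AM declares a different winner, while the geometric mean of the ratios gives the same answer either way.

#include <stdio.h>
#include <math.h>

int main(void) {
    double A[] = {1.0, 100.0};     /* CPU A's times on two programs (seconds, invented) */
    double B[] = {10.0, 10.0};     /* CPU B's times on the same programs */

    double am_B_vs_A = (B[0] / A[0] + B[1] / A[1]) / 2.0;   /* 5.05: A looks faster */
    double am_A_vs_B = (A[0] / B[0] + A[1] / B[1]) / 2.0;   /* 5.05: B looks faster */
    double gm        = sqrt((B[0] / A[0]) * (B[1] / A[1])); /* 1.0 with either reference */

    printf("AM normalized to A: B is %.2fx slower (A 'wins')\n", am_B_vs_A);
    printf("AM normalized to B: A is %.2fx slower (B 'wins')\n", am_A_vs_B);
    printf("GM of ratios: %.2f (a tie, whichever reference you pick)\n", gm);
    return 0;
}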
  • 7. F for Fully Associative Cache What they teach: Oversimplified: A fully associative cache contains a single set with B ways, where B is the number of blocks. A memory address can map to a block in any of these ways. A fully associative cache is another name for a B-way set associative cache with one set. A fully associative cache permits data to be stored in any cache block, instead of forcing each memory address into one particular block. When data is fetched from memory, it can be placed in any unused block of the cache. This way we'll never have a conflict between two or more memory addresses that map to a single cache block. If all the blocks are already in use, it's usually best to replace the least recently used one, on the assumption that if a block hasn't been used in a while, it won't be needed again anytime soon. ◦ No concept of indices; the entire cache belongs to everyone, and a block can go anywhere.
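  A sketch of what "no index bits" means in practice: the lookup compares the tag against every block, and on a miss the line can go into any free block (or a victim chosen by the replacement policy). The sizes and the trivial "evict block 0" fallback are assumptions for illustration.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_BLOCKS 8
#define LINE_BYTES 64

struct block { bool valid; uint64_t tag; };
static struct block cache[NUM_BLOCKS];

bool lookup(uint64_t addr) {
    uint64_t tag = addr / LINE_BYTES;       /* no index bits: the tag is the whole block address */
    for (int i = 0; i < NUM_BLOCKS; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return true;                    /* hit, wherever the block happens to live */
    for (int i = 0; i < NUM_BLOCKS; i++)    /* miss: place the line in any free block */
        if (!cache[i].valid) {
            cache[i].valid = true;
            cache[i].tag = tag;
            return false;
        }
    cache[0].valid = true;                  /* all full: evict block 0 (a real cache would use LRU etc.) */
    cache[0].tag = tag;
    return false;
}

int main(void) {
    printf("%s\n", lookup(0x1000) ? "hit" : "miss");
    printf("%s\n", lookup(0x1000) ? "hit" : "miss");
    return 0;
}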
  • 8. G for Good Job so Far What they think I mean: What I really mean: ◦ We have learnt so many concepts so far in a very simple way. We surely deserve a break on this slide, a pat on our backs for making it this far in this Computer Architecture crash course, and some preparation for the upcoming topics. I don't even know why you are still reading this; you were supposed to move to the next slide right away, because who even stops to read unimportant long paragraphs? I couldn't find a suitable concept for the letter G, so let's do something different here. This is a *different* kind of assignment anyway.
  • 9. H for Hazards What they teach: Oversimplified: In the domain of central processing unit (CPU) design, hazards are problems with the instruction pipeline in CPU microarchitectures when the next instruction cannot execute in the following clock cycle, and can potentially lead to incorrect computation results. Three common types of hazards are data hazards, structural hazards, and control hazards (branching hazards). There are several methods used to deal with hazards, including pipeline stalls/pipeline bubbling, operand forwarding, and in the case of out-of-order execution, the scoreboarding method and the Tomasulo algorithm. Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Ignoring potential data hazards can result in race conditions (also termed race hazards). There are three situations in which a data hazard can occur: RAW (read after write), WAW (write after write), and WAR (write after read). A structural hazard occurs when two (or more) instructions that are already in the pipeline need the same resource. The result is that the instructions must be executed in series rather than in parallel for a portion of the pipeline. Structural hazards are sometimes referred to as resource hazards. A control hazard occurs when the pipeline makes a wrong decision on branch prediction and therefore brings instructions into the pipeline that must subsequently be discarded. The term branch hazard also refers to a control hazard. ◦ Control Hazards ◦ Structural Hazards
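  A sketch of the textbook forwarding check for RAW hazards in a 5-stage pipeline: if the instruction currently in EX/MEM or MEM/WB writes a register that the instruction in EX wants to read, the value is forwarded instead of stalling. The struct fields and the register numbers are invented for the illustration.

#include <stdio.h>
#include <stdbool.h>

struct pipe_reg { bool reg_write; int rd; };  /* does this stage write a register, and which one */

int forward_source(struct pipe_reg exmem, struct pipe_reg memwb, int rs) {
    if (exmem.reg_write && exmem.rd != 0 && exmem.rd == rs)
        return 2;                             /* forward from EX/MEM (the newest value) */
    if (memwb.reg_write && memwb.rd != 0 && memwb.rd == rs)
        return 1;                             /* forward from MEM/WB */
    return 0;                                 /* no hazard: read the register file normally */
}

int main(void) {
    struct pipe_reg exmem = { true, 5 };      /* e.g. add r5, r1, r2 currently in MEM */
    struct pipe_reg memwb = { true, 6 };      /* e.g. add r6, r3, r4 currently in WB  */
    printf("operand r5 comes from path %d\n", forward_source(exmem, memwb, 5)); /* 2 */
    printf("operand r7 comes from path %d\n", forward_source(exmem, memwb, 7)); /* 0 */
    return 0;
}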
  • 10. I for Interrupts What they teach: Oversimplified: In digital computers, an interrupt is a response by the processor to an event that needs attention from the software. An interrupt condition alerts the processor and serves as a request for the processor to interrupt the currently executing code when permitted, so that the event can be processed in a timely manner. If the request is accepted, the processor responds by suspending its current activities, saving its state, and executing a function called an interrupt handler (or an interrupt service routine, ISR) to deal with the event. This interruption is temporary, and, unless the interrupt indicates a fatal error, the processor resumes normal activities after the interrupt handler finishes. Interrupts are commonly used by hardware devices to indicate electronic or physical state changes that require attention. Interrupts are also commonly used to implement computer multitasking, especially in real-time computing. Systems that use interrupts in these ways are said to be interrupt-driven. Interrupt signals may be issued in response to hardware or software events. These are classified as hardware interrupts or software interrupts, respectively. For any particular processor, the number of interrupt types is limited by the architecture. A hardware interrupt is a condition related to the state of the hardware that may be signaled by an external hardware device, e.g., an interrupt request (IRQ) line on a PC, or detected by devices embedded in processor logic (e.g., the CPU timer in IBM System/370), to communicate that the device needs attention from the operating system (OS) or, if there is no OS, from the "bare-metal" program running on the CPU. Such external devices may be part of the computer (e.g., disk controller) or they may be external peripherals. For example, pressing a keyboard key or moving a mouse plugged into a PS/2 port triggers hardware interrupts that cause the processor to read the keystroke or mouse position. Hardware interrupts can arrive asynchronously with respect to the processor clock, and at any time during instruction execution. Consequently, all hardware interrupt signals are conditioned by synchronizing them to the processor clock, and acted upon only at instruction execution boundaries. A software interrupt is requested by the processor itself upon executing particular instructions or when certain conditions are met. Every software interrupt signal is associated with a particular interrupt handler. A software interrupt may be intentionally caused by executing a special instruction which, by design, invokes an interrupt when executed. Such instructions function similarly to subroutine calls and are used for a variety of purposes, such as requesting operating system services and interacting with device drivers (e.g., to read or write storage media). Software interrupts may also be unexpectedly triggered by program execution errors. These interrupts typically are called traps or exceptions. For example, a divide-by-zero exception will be "thrown" (a software interrupt is requested) if the processor executes a divide instruction with divisor equal to zero. Typically, the operating system will catch and handle this exception.
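  A very simplified model of the mechanism described above: at an instruction boundary the core checks a pending-interrupt flag, saves enough state to resume later, and jumps to the handler registered in a vector table. All names, the IRQ number, and the table layout are invented for illustration.

#include <stdio.h>
#include <stdbool.h>

typedef void (*isr_t)(void);
static isr_t vector_table[256];           /* one handler per interrupt number */
static volatile bool pending;
static volatile int  pending_irq;

void timer_isr(void) { printf("timer interrupt handled\n"); }

void maybe_take_interrupt(unsigned long *pc) {
    if (!pending) return;
    unsigned long saved_pc = *pc;         /* save state so the program can resume */
    pending = false;
    vector_table[pending_irq]();          /* run the interrupt service routine */
    *pc = saved_pc;                       /* resume the interrupted program */
}

int main(void) {
    vector_table[32] = timer_isr;
    unsigned long pc = 0x400000;
    pending = true; pending_irq = 32;     /* pretend a device raised IRQ 32 */
    maybe_take_interrupt(&pc);
    return 0;
}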
  • 11. J for Jump Instructions What they teach: Oversimplified: In a CPU, the general flow of control is that an instruction is executed and the PC automatically moves to the next instruction in the code. The jump instruction, however, breaks this standard behavior and allows the PC to jump to a specified location (within a maximum distance from the current PC). The utility of this instruction is that it allows function calls. Some variants of it, like jump and link, are usually used for function calls, as the PC needs to return to the calling code after the function call completes. The jump instruction usually takes one parameter, which is the offset from the current PC. Therefore, the new PC is given by PC = PC + offset. The offset is usually restricted to some maximum value, as the entire instruction needs to fit in 32 or 64 bits. Literally, that's all.
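  A sketch of the PC = PC + offset rule with a bound on the offset. The 16-bit immediate width is an assumption used only to show why the offset is limited; real ISAs use different widths and scalings.

#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 16
#define MAX_OFFSET  ((1 << (OFFSET_BITS - 1)) - 1)
#define MIN_OFFSET  (-(1 << (OFFSET_BITS - 1)))

int64_t jump(int64_t pc, int32_t offset) {
    if (offset > MAX_OFFSET || offset < MIN_OFFSET) {
        printf("offset %d does not fit in %d bits\n", offset, OFFSET_BITS);
        return pc;                        /* a real assembler would reject this */
    }
    return pc + offset;                   /* PC = PC + offset */
}

int main(void) {
    int64_t pc = 0x1000;
    printf("jump +64:  new PC = 0x%llx\n", (unsigned long long)jump(pc, 64));
    printf("jump -128: new PC = 0x%llx\n", (unsigned long long)jump(pc, -128));
    return 0;
}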
  • 12. K for Kernel Mode of CPU What they teach: Oversimplified: The system starts in kernel mode when it boots, and after the operating system is loaded, it executes applications in user mode. There are some privileged instructions that can only be executed in kernel mode, such as interrupt instructions, input/output management, etc. If a privileged instruction is executed in user mode, it is illegal and a trap is generated. The mode bit is set to 0 in kernel mode and is changed from 0 to 1 when switching from kernel mode to user mode. In kernel mode, the CPU may perform any operation allowed by its architecture; any instruction may be executed, any I/O operation initiated, any area of memory accessed, and so on. In the other CPU modes, certain restrictions on CPU operations are enforced by the hardware. Typically, certain instructions are not permitted (especially those, including I/O operations, that could alter the global state of the machine), some memory areas cannot be accessed, etc. User-mode capabilities of the CPU are typically a subset of those available in kernel mode, but in some cases, such as hardware emulation of non-native architectures, they may be significantly different from those available in standard kernel mode. Now the CPU be like: "You dare oppose me, mortal."
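  A sketch of the mode-bit check, using the slide's convention (0 = kernel, 1 = user): a privileged operation proceeds in kernel mode and raises a trap in user mode. The specific "privileged I/O" operation is invented for illustration.

#include <stdio.h>

enum mode { KERNEL = 0, USER = 1 };
static enum mode mode_bit = KERNEL;       /* the machine boots in kernel mode */

void privileged_io(void) {
    if (mode_bit != KERNEL) {
        printf("trap: privileged instruction attempted in user mode\n");
        return;                           /* the OS would handle this trap */
    }
    printf("I/O operation performed\n");
}

int main(void) {
    privileged_io();                      /* allowed: still in kernel mode */
    mode_bit = USER;                      /* the OS switches to user mode to run an application */
    privileged_io();                      /* illegal: generates a trap */
    return 0;
}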
  • 13. L for LRU Policy What they teach: Oversimplified: In computing, cache algorithms (also frequently called cache replacement algorithms or cache replacement policies) are optimizing instructions, or algorithms, that a computer program or a hardware-maintained structure can utilize in order to manage a cache of information stored on the computer. Caching improves performance by keeping recent or often-used data items in memory locations that are faster or computationally cheaper to access than normal memory stores. When the cache is full, the algorithm must choose which items to discard to make room for the new ones. LRU discards the least recently used items first. This algorithm requires keeping track of what was used when, which is expensive if one wants to make sure the algorithm always discards the least recently used item. General implementations of this technique require keeping "age bits" for cache lines and tracking the "least recently used" cache line based on those age bits. In such an implementation, every time a cache line is used, the age of all other cache lines changes. LRU is actually a family of caching algorithms with members including 2Q by Theodore Johnson and Dennis Shasha, and LRU/K by Pat O'Neil, Betty O'Neil and Gerhard Weikum. LRU, like many other replacement policies, can be characterized using a state transition field in a vector space, which decides the dynamic cache state changes similar to how an electromagnetic field determines the movement of a charged particle placed in it.
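  A minimal sketch of the age-bit idea: touching a line resets its age, every other line ages by one, and the victim is the line with the largest age. A 4-way set and plain integer counters are assumptions; hardware usually approximates this more cheaply.

#include <stdio.h>

#define WAYS 4
static int age[WAYS];

void touch(int way) {
    for (int i = 0; i < WAYS; i++) age[i]++;
    age[way] = 0;                          /* most recently used */
}

int victim(void) {
    int v = 0;
    for (int i = 1; i < WAYS; i++)
        if (age[i] > age[v]) v = i;        /* largest age = least recently used */
    return v;
}

int main(void) {
    touch(0); touch(1); touch(2); touch(3); touch(0);
    printf("evict way %d\n", victim());    /* way 1: oldest since its last use */
    return 0;
}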
  • 14. M for Moore’s law What they teach: Oversimplified: Moore's law is the observation that the number of transistors in a dense integrated circuit (IC) doubles about every two years. Moore's law is an observation and projection of a historical trend. Rather than a law of physics, it is an empirical relationship linked to gains from experience in production. The observation is named after Gordon Moore, the co-founder of Fairchild Semiconductor and Intel (and former CEO of the latter), who in 1965 posited a doubling every year in the number of components per integrated circuit, and projected this rate of growth would continue for at least another decade. In 1975, looking forward to the next decade, he revised the forecast to doubling every two years, a compound annual growth rate (CAGR) of 41%. While Moore did not use empirical evidence in forecasting that the historical trend would continue, his prediction has held since 1975 and has since become known as a "law". Moore's prediction has been used in the semiconductor industry to guide long-term planning and to set targets for research and development, thus functioning to some extent as a self-fulfilling prophecy. Advancements in digital electronics, such as the reduction in quality-adjusted microprocessor prices, the increase in memory capacity (RAM and flash), the improvement of sensors, and even the number and size of pixels in digital cameras, are strongly linked to Moore's law. These step changes in digital electronics have been a driving force of technological and social change, productivity, and economic growth.
  • 15. N for NOPS Instruction What they teach: Oversimplified: In computer science, a NOP, no-op, or NOOP (pronounced "no op"; short for no operation) is a machine language instruction and its assembly language mnemonic, programming language statement, or computer protocol command that does nothing. Some computer instruction sets include an instruction whose explicit purpose is to not change the state of any of the programmer-accessible registers, status flags, or memory. It often takes a well-defined number of clock cycles to execute. In other instruction sets, there is no explicit NOP instruction, but the assembly language mnemonic NOP represents an instruction which acts as a NOP. A NOP must not access memory, as that could cause a memory fault or page fault. A NOP is most commonly used for timing purposes, to force memory alignment, to prevent hazards, to occupy a branch delay slot, to render void an existing instruction such as a jump, as a target of an execute instruction, or as a place-holder to be replaced by active instructions later on in program development (or to replace removed instructions when reorganizing would be problematic or time-consuming). In some cases, a NOP can have minor side effects; for example, on the Motorola 68000 series of processors, the NOP opcode causes a synchronization of the pipeline. You see what we did here? Very few people get this
  • 16. O for Optimization What they teach: Oversimplified: Even though we have achieved a lot of speedup recently, there is still scope for improvement. Though improvements in caches and other CPU structures also have a significant impact on performance, the biggest impact is seen in branch predictors, especially when the predictor is already very good. Consider a predictor with 98% accuracy that is improved to 99% accuracy. This may look like a negligible improvement, but in reality it is huge: it essentially halves the number of mispredictions, which, when calculated, is a major speedup. A similar thing happens if we improve the accuracy from 99.98% to 99.99%. There are other places with scope for improvement as well. The simple pipeline structure assumes one cycle is needed to fetch data from memory, which is often not true; it may take tens or hundreds of cycles, which is very inefficient, and good caches and cache replacement policies can significantly improve performance here as well.
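  A back-of-the-envelope version of the argument above: penalty cycles scale with (1 - accuracy), so going from 98% to 99% halves them. The branch count and the 15-cycle misprediction penalty are assumptions, used only to make the halving visible.

#include <stdio.h>

int main(void) {
    double branches = 200e6;               /* branches executed (illustrative) */
    double penalty  = 15.0;                /* cycles lost per misprediction (assumed) */
    double acc[] = {0.98, 0.99, 0.9998, 0.9999};
    for (int i = 0; i < 4; i++) {
        double wasted = branches * (1.0 - acc[i]) * penalty;
        printf("accuracy %.4f -> %.1f million penalty cycles\n", acc[i], wasted / 1e6);
    }
    return 0;
}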
  • 17. P for Pipelined processor What they teach: Oversimplified: In computer science, instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps (the eponymous "pipeline") performed by different processor units with different parts of instructions processed in parallel. In a pipelined computer, instructions flow through the central processing unit (CPU) in stages. For example, it might have one stage for each step of the von Neumann cycle: Fetch the instruction, fetch the operands, do the instruction, write the results. A pipelined computer usually has "pipeline registers" after each stage. These store information from the instruction and calculations so that the logic gates of the next stage can do the next step. This arrangement lets the CPU complete an instruction on each clock cycle. It is common for even numbered stages to operate on one edge of the square-wave clock, while odd-numbered stages operate on the other edge. This allows more CPU throughput than a multicycle computer at a given clock rate, but may increase latency due to the added overhead of the pipelining process itself. Also, even though the electronic logic has a fixed maximum speed, a pipelined computer can be made faster or slower by varying the number of stages in the pipeline. With more stages, each stage does less work, and so the stage has fewer delays from the logic gates and could run at a higher clock rate. For the purpose of this course, we consider the 5 stage pipeline whose stages are Instruction Fetch(IF), Instruction Decode(ID), Execute(EX), Memory(MEM) and Write-Back (WB).
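  An idealized cycle count for the 5-stage pipeline described above: with no stalls and one instruction completing per cycle once the pipeline is full, N instructions take about k + N - 1 cycles instead of k * N. The instruction count is illustrative; real pipelines lose some of this to hazards.

#include <stdio.h>

int main(void) {
    int k = 5;                              /* IF, ID, EX, MEM, WB */
    long long N = 1000000;                  /* instructions (illustrative) */
    long long unpipelined = (long long)k * N;
    long long pipelined   = k + N - 1;
    printf("unpipelined: %lld cycles\n", unpipelined);
    printf("pipelined:   %lld cycles (about %.1fx faster in the ideal case)\n",
           pipelined, (double)unpipelined / (double)pipelined);
    return 0;
}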
  • 18. Q for ….. ummm…. How do I explain? You know what? Let's make Q for Questions. If you have any questions so far, feel free to ping any of us on MS Teams and we will try our best to resolve your doubts as soon as possible. ☺
  • 19. R for Read-stall What they teach: Oversimplified: A program often needs to read data from memory, which generally takes a lot of time. Even with caches, the higher-level caches still take a significantly large amount of time to bring the data to the CPU. During this time, a simple pipelined CPU is just stalled, executing NOP instructions. These stalls are called read stalls. A counterpart for writes, called write stalls, also exists. The instruction that has issued a read must wait for it to finish, which often takes a few tens to a few hundreds of cycles. The exact amount depends on a variety of factors such as the cache miss rate, the miss penalty at each level, and also the type of program. A cache-thrashing program will generally have a large number of read-stall cycles.
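  A rough model of the stall cycles mentioned above: stall cycles ≈ memory accesses × miss rate × miss penalty. All the numbers below are assumptions, chosen only to show how quickly stalls dominate for a cache-unfriendly (thrashing) program.

#include <stdio.h>

int main(void) {
    double accesses = 100e6;                    /* loads/stores executed (illustrative) */
    double penalty  = 100.0;                    /* cycles to reach main memory (assumed) */
    double miss_rates[] = {0.02, 0.10, 0.40};   /* well-behaved ... thrashing */
    for (int i = 0; i < 3; i++)
        printf("miss rate %.0f%% -> %.0f million stall cycles\n",
               miss_rates[i] * 100.0, accesses * miss_rates[i] * penalty / 1e6);
    return 0;
}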
  • 20. S for SPEC Benchmark What they teach: Oversimplified: The SPEC benchmark is one of the most popular benchmark suites used for evaluating the performance of a CPU. SPEC stands for Standard Performance Evaluation Corporation. The benchmarks aim to test "real-life" situations. There are several benchmarks testing Java scenarios, from simple computation (SPECjbb) to a full system with Java EE, database, disk, and network (SPECjEnterprise). The SPEC CPU suites test CPU performance by measuring the run time of several programs such as the compiler GCC, the chemistry program gamess, and the weather program WRF. The various tasks are equally weighted; no attempt is made to weight them based on their perceived importance. An overall score is based on a geometric mean. Apart from this, various other benchmarks are also available for evaluating the performance of a CPU.
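  A sketch of the "overall score is a geometric mean" point: each benchmark contributes a ratio (roughly, reference time divided by measured time), and the overall score is the geometric mean of those ratios. The ratios below are made up, and real SPEC reporting has more detail (reference machine, base vs. peak) than this sketch shows.

#include <stdio.h>
#include <math.h>

int main(void) {
    double ratios[] = {8.0, 12.5, 6.3, 15.1};   /* one ratio per benchmark (invented) */
    int n = 4;
    double log_sum = 0.0;
    for (int i = 0; i < n; i++)
        log_sum += log(ratios[i]);              /* GM = exp(mean of logs) */
    printf("overall score (geometric mean) = %.2f\n", exp(log_sum / n));
    return 0;
}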
  • 21. T for Trap Instructions What they teach: Oversimplified: A trap instruction is a procedure call that synchronously transfers control. It is a software interrupt generated by the user program, or by an error, when the operating system is needed to perform a system call or some other operation. Thus, a trap instruction is used to switch from the user mode of the system to the kernel mode. A trap is also generated during a context switch between processes by the OS. During a trap, the privilege level of the CPU is raised, and it is set up by the OS to run OS code. For example, the stack changes from the user stack to the kernel stack, the CPU is granted access to several protected data structures hidden from users, and the PC now points to some OS code, depending upon the reason for the trap and the arguments passed to it. In a nutshell, the trap mechanism is responsible for handling all abnormal behavior.
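  A minimal model of the sequence the slide describes: save the user state, raise the privilege level, run the OS handler selected by the trap cause, then drop back to user mode. The cause numbers and handler names are invented for illustration.

#include <stdio.h>

enum cause { SYSCALL = 0, DIV_BY_ZERO = 1 };
static int privileged = 0;                   /* 0 = user, 1 = kernel */

void handle_syscall(void)  { printf("OS: servicing system call\n"); }
void handle_div_zero(void) { printf("OS: divide-by-zero, terminating process\n"); }

void trap(enum cause why) {
    int saved_mode = privileged;             /* save (part of) the user state */
    privileged = 1;                          /* raise privilege: kernel mode, kernel stack, OS code */
    if (why == SYSCALL) handle_syscall();    /* dispatch on the reason for the trap */
    else                handle_div_zero();
    privileged = saved_mode;                 /* return to the interrupted user program */
}

int main(void) {
    trap(SYSCALL);
    trap(DIV_BY_ZERO);
    return 0;
}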
  • 22. U for Unconditional Branches What they teach: Oversimplified: The flow of a program is "go to the next instruction" most of the time during execution. However, branch instructions break this general flow. There are two types of branches: conditional and unconditional. Conditional branches check the truth value of some condition and jump or don't jump based on that value. Unconditional branches, or unconditional jumps, are essentially the branches that are always taken, or in other words, the branch instructions whose next PC is fixed and independent of the state of the CPU (i.e., the values in the registers). These are usually used for making function calls, and variants such as the jump-and-link instruction are used to jump to a code segment and then return from it after it is done.
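  A sketch of the jump-and-link idea: the unconditional jump to the function also saves the address of the following instruction (the "link"), and the function jumps back to it when it is done. The addresses and the fixed 4-byte instruction size are assumptions for illustration.

#include <stdio.h>

static unsigned long pc = 0x1000;           /* program counter */
static unsigned long ra = 0;                /* return-address (link) register */

void jal(unsigned long target) {
    ra = pc + 4;                            /* remember where to come back to */
    pc = target;                            /* unconditional jump: always taken */
}

void jr_ra(void) {
    pc = ra;                                /* return from the function */
}

int main(void) {
    printf("before call: pc = 0x%lx\n", pc);
    jal(0x4000);                            /* call the function at 0x4000 */
    printf("in function: pc = 0x%lx, ra = 0x%lx\n", pc, ra);
    jr_ra();                                /* return */
    printf("after call:  pc = 0x%lx\n", pc);
    return 0;
}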
  • 23. V for Virtual Memory What they teach: Oversimplified: In computing, virtual memory, or virtual storage is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very large (main) memory". The computer's operating system, using a combination of hardware and software, maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage, as seen by a process or task, appears as a contiguous address space or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as a memory management unit (MMU), automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities, utilizing, e.g., disk storage, to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer. The primary benefits of virtual memory include freeing applications from having to manage a shared memory space, ability to share memory used by libraries between processes, increased security due to memory isolation, and being able to conceptually use more memory than might be physically available, using the technique of paging or segmentation.
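  A sketch of the address translation the MMU performs, using a single-level page table: the virtual page number indexes the table and the page offset passes through unchanged. The 4 KiB page size and the tiny table are assumptions; real systems use multi-level tables and a TLB.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE 4096
#define NUM_PAGES 16

struct pte { bool valid; uint64_t frame; };
static struct pte page_table[NUM_PAGES];

int translate(uint64_t vaddr, uint64_t *paddr) {
    uint64_t vpn    = vaddr / PAGE_SIZE;
    uint64_t offset = vaddr % PAGE_SIZE;
    if (vpn >= NUM_PAGES || !page_table[vpn].valid)
        return -1;                          /* page fault: the OS must step in */
    *paddr = page_table[vpn].frame * PAGE_SIZE + offset;
    return 0;
}

int main(void) {
    page_table[3] = (struct pte){ true, 42 };   /* virtual page 3 -> physical frame 42 */
    uint64_t pa;
    if (translate(3 * PAGE_SIZE + 0x10, &pa) == 0)
        printf("virtual 0x%x -> physical 0x%llx\n",
               3 * PAGE_SIZE + 0x10, (unsigned long long)pa);
    if (translate(9 * PAGE_SIZE, &pa) != 0)
        printf("page fault on unmapped page 9\n");
    return 0;
}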
  • 24. W for Write-back What they teach: Oversimplified: In the 5-stage pipeline, the final stage is the WB or write-back stage. The job of this stage is to take the output from the ALU or the memory unit, depending upon the type of instruction, and write the value into the target register specified by the instruction. The choice between the ALU result and the MEM result is made using a MUX after the pipeline latch register.
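  A sketch of that write-back multiplexer: a control signal picks either the ALU result or the value loaded from memory as the data written into the destination register. The signal name (mem_to_reg) and the example values are assumptions for illustration.

#include <stdio.h>
#include <stdint.h>

uint32_t writeback_mux(uint32_t alu_result, uint32_t mem_data, int mem_to_reg) {
    return mem_to_reg ? mem_data : alu_result;   /* the 2-to-1 MUX */
}

int main(void) {
    /* an ALU instruction (e.g. add) writes back the ALU result */
    printf("ALU instruction writes  %u\n", writeback_mux(7, 99, 0));
    /* a load instruction writes back the value fetched from memory */
    printf("load instruction writes %u\n", writeback_mux(7, 99, 1));
    return 0;
}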
  • 25. X for ….. You know right? I am out of ideas now. How about X for Xtra Questions?
  • 26. Y for Yahoo! We are almost done! What I want to say: Oversimplified: As we arrive at the letter Y, it's not hard to see that we are nearing the end of this crash course, and there is just one more slide to go. Honestly, there is nothing more to say here. We couldn't find anything for the letter Y either, so this slide is just random text from here on, because we have to fill this side of the slide entirely in order to maintain consistency throughout the slides. I don't know, congratulations I guess? For making it through the entire course. I think Y for "Ye hamari pawri ho rahi hai" (the "this is us having a party" meme). ☺
  • 27. Z for Zero Register What they teach: Oversimplified: The zero register is a special register whose value is hardwired to zero. This register is often used for comparisons with zero in branch instructions, or simply to provide the value zero wherever it is needed. This makes the value zero easily accessible at all times, without needing to load it into some temporary register. It is also used for the NOP instruction: although the exact encoding varies across architectures, 'add $0 $0 $0' can be used as the NOP instruction.
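  A sketch of a register file with a hardwired zero register: reads of register 0 always return 0 and writes to it are silently dropped, which is exactly why 'add $0 $0 $0' is a harmless NOP. The 32-register file is an assumption matching common RISC designs.

#include <stdio.h>
#include <stdint.h>

static uint32_t regs[32];

uint32_t read_reg(int r)          { return r == 0 ? 0 : regs[r]; }
void     write_reg(int r, uint32_t v) { if (r != 0) regs[r] = v; }   /* writes to $0 are ignored */

int main(void) {
    write_reg(0, 12345);                     /* attempt to overwrite $0: dropped */
    write_reg(5, read_reg(0) + 7);           /* $5 = 0 + 7, using $0 as the constant zero */
    printf("$0 = %u, $5 = %u\n", read_reg(0), read_reg(5));
    return 0;
}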
  • 28. Created by: Me and the bois
  • 29. We are evolving, just like the CPUs • For the screen lovers, we have created a telegram chat bot (coz why not?). • Here is the link: CA Simplified bot • Here is the link to video demo of the bot (Available in our submission drive folder as well): Demo Video
  • 30. And finally, Thanks for reading What they teach: Oversimplified: Merci. धन्यवाद। Shukriya. Gracias. Shukran. Xièxiè. Abhari Ahe. Thaagatchari. Terima kasih. Nandri. Dhanyavaadaalu. Anugrihtaasmi. Dhonnobad. 감사해요. Teşekkürler. Dankie. Takk. 感謝. Grazie. Tatenda. Asante. Ďakujem. Kösz. Au Kun. Met Dank. Vďaka. Choukran. Bohoma Istuti. Teşekkür ederim. Hvala. Npezié. ευχαριστώ. Doh Jeh. Go raibh maith agat. ਧੰਨਵਾਦ. Thanks, Thanks, And Thanks. <3