The document discusses computer bus systems and protocols. It describes how the CPU communicates with memory and I/O devices through a bus. The bus provides an interface for this communication and defines protocols like the four-cycle handshake protocol. It also discusses bus operations like reading and writing, bus protocols, and how devices can initiate direct memory access transfers without involving the CPU.
1. Embedded Computing Systems
Unit – III
Text Book Used:
Wayne Wolf: Computers as Components,
Principles of Embedded Computing Systems
Design, 2nd Edition, Elsevier, 2008.
By
Dr. K. Satyanarayan Reddy
CiTECH, B’lore-36.
2. Bus-Based Computer Systems
THE CPU BUS: A computer system
comprises the CPU, memory, and I/O
devices.
The bus is the mechanism by
which the CPU communicates
with Memory and Devices.
A Bus is, at a minimum, a
collection of wires, but the bus
also defines a protocol by which
the CPU, memory, and devices
communicate.
One of the major roles of the bus
is to provide an interface to
memory.
Bus Protocols: The basic building
block of most bus protocols is
the Four-cycle Handshake, as
shown in the adjacent figure:
October 6, 2014 ECS Lecture Notes VII Sem CSE (VTU), 3rd Unit: By Dr. K Satyanarayan Reddy 2
The Four-cycle Handshake
3. Bus-Based Computer Systems cont’d….
Bus Protocols cont’d.:
1. Device 1 raises its output to signal an enquiry, which tells
device 2 that it should get ready to listen for data.
2. When device 2 is ready to receive, it raises its output to signal
an acknowledgment.
At this point, devices 1 and 2 can transmit or receive.
3. Once the data transfer is complete, device 2 lowers its output,
signaling that it has received the data.
4. After seeing that ack has been released, device 1 lowers its
output.
At the end of the handshake, both handshaking signals are low,
just as they were at the start of the handshake.
The system has thus returned to its original state in readiness
for another handshake-enabled data transfer.
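The four steps above can be traced in a few lines of Python. This is only a sketch: the signal names enq and ack follow the notes, while the function name and the idea of recording the signal levels in a list are assumptions made for illustration.

```python
def four_cycle_handshake(payload):
    """Trace the (enq, ack) levels through one four-cycle handshake."""
    enq, ack = 0, 0                 # both signals start low
    trace = [(enq, ack)]
    enq = 1                         # 1. device 1 raises enq
    trace.append((enq, ack))
    ack = 1                         # 2. device 2 raises ack when it is ready
    trace.append((enq, ack))
    received = payload              # data is transferred while both are high
    ack = 0                         # 3. device 2 lowers ack: data received
    trace.append((enq, ack))
    enq = 0                         # 4. device 1 lowers enq
    trace.append((enq, ack))
    return received, trace


received, trace = four_cycle_handshake(0x5A)
print(trace)
```

The first and last entries of the trace are identical, which is the "returned to its original state" property described above.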
4. The term bus is used in two ways: it may mean a set of related wires, such as the data or address wires, or it
may mean a protocol for communicating between components.
To avoid confusion, the term bundle will be used to refer to a set of related signals.
The fundamental bus operations are READING and WRITING.
Figure below shows the structure of a typical bus that supports reads and writes.
The major components follow:
■ Clock provides synchronization to the bus components,
■ R/W is true when the bus is reading and false when the bus is writing,
■ Address is an a-bit bundle of signals that transmits the address for an access,
■ Data is an n-bit bundle of signals that can carry data to or from the CPU, and
■ Data ready signals when the values on the data bundle are valid.
A typical Microprocessor Bus
5. All transfers on this basic bus are controlled by the CPU, which can read or
write a device or memory, but devices or memory cannot initiate a transfer
on their own.
This is reflected by the fact that R/W and Address are unidirectional signals,
since only the CPU can determine the address and direction of the transfer.
The behavior of a bus is specified with a Timing Diagram, which shows how the
signals on a bus change over time, but since values like the address and data
can take on many values, some standard notation is used to describe signals,
as shown in Figure below:
6. A’s value is known at all times, so it is shown as a standard waveform that changes
between 0 and 1. B and C alternate between changing and stable states.
A stable signal has, as the name implies, a stable value that could be measured by an
oscilloscope.
e.g.: An address bus may be shown as stable when the address is present, but the
bus’s timing requirements are independent of the exact address on the bus.
A signal can go between a known 0/1 state and a stable/changing state.
A changing signal does not have a stable value. Changing signals should not be used
for computation.
To be sure that signals go to their proper values at the proper times, timing diagrams
sometimes show Timing Constraints.
Timing Constraints are drawn in two different ways, depending on whether the concern is the amount of
time between events or only the order of events.
e.g.: The timing constraint from A to B shows that A must go high before B becomes
stable.
The constraint from A to B also has a time value of 10 ns, indicating that A goes
high at least 10 ns before B goes stable.
7. The adjacent figure shows a timing
diagram for the example bus.
The diagram shows a Read and a Write.
Timing Constraints are shown only for the
Read operation, but similar
constraints apply to the write
operation.
The bus is normally in the read mode
since that does not change the state
of any of the devices or memories.
Note: The direction of data transfer on
bidirectional lines is not specified in
the timing diagram.
During a read, the external device or
memory is sending a value on the
data lines, while during a write the
CPU is controlling the data lines.
Timing Diagram for the Bus
8. The sequence of operations for a READ on the Timing Diagram is as
follows:
■ A read or write is initiated by setting address enable high after the
clock starts to rise.
R/W is set to 1 to indicate a read, and the address lines are set to
the desired address.
■ After 1 clock cycle, the memory or device is expected to assert the
data value at that address on the data lines.
Simultaneously, the external device specifies that the data are valid
by pulling down the data ready line.
This line is active low, meaning that a logically true value is indicated
by a low voltage, in order to provide increased immunity to
electrical noise.
■ The CPU is free to remove the address at the end of the clock cycle
and must do so before the beginning of the next cycle.
9. The Handshake that tells the CPU and
Devices when data are to be
transferred is formed by data ready
for the acknowledge side, but is
implicit for the enquiry side.
Since the bus is normally in read mode,
“enq” does not need to be asserted,
but the “acknowledge” must be
provided by Data Ready.
The Data Ready signal allows the bus to
be connected to devices that are
slower than the bus.
As shown in adjacent Figure, the external
device need not immediately assert
data ready.
The cycles between the minimum time at
which data can be asserted and when
it is actually asserted are known as
Wait States. Wait states are
commonly used to connect slow,
inexpensive memories to buses.
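A rough Python model of a read with wait states is given here. It is a sketch under stated assumptions: the function name and the parameter device_latency_cycles are hypothetical, and the model only counts clock cycles instead of reproducing real signal timing.

```python
def bus_read(memory, address, device_latency_cycles):
    """Model one CPU read and return (data, wait_states).

    device_latency_cycles is a hypothetical parameter: the number of clock
    cycles the device needs before it can drive valid data (1 = no waiting).
    """
    cycle = 0
    data_ready_n = 1                 # active low: 1 means "data not valid yet"
    while data_ready_n:
        cycle += 1                   # CPU holds the address with R/W = 1
        if cycle >= device_latency_cycles:
            data_ready_n = 0         # device pulls data ready low: data valid
    wait_states = cycle - 1          # cycles beyond the one-cycle minimum
    return memory[address], wait_states


memory = {0x10: 0xAB}
print(bus_read(memory, 0x10, 1))     # fast device, no wait states
print(bus_read(memory, 0x10, 3))     # slow device, two wait states
```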
A wait state on a read operation
10. The bus handshaking signals can also be
used to perform Burst Transfers, as
illustrated in Figure on right.
In this Burst Read Transaction, the CPU
sends one address but receives a
sequence of data values.
Those values come from successive
memory locations starting at the
given address.
Here an extra line is added to the bus,
called Burst', which signals when a
transaction is actually a burst.
Releasing the Burst' signal tells the
device that enough data has been
transmitted.
To stop receiving data after the end of
data 4, the CPU releases the Burst'
signal at the end of data 3, since the
device requires some time to
recognize the end of the burst.
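The burst read can be sketched in Python as follows. The function and variable names are hypothetical, and the one-item delay before the device notices the released burst signal is a modelling assumption that mirrors the "release at the end of data 3 to stop after data 4" behaviour.

```python
def burst_read(memory, start_address, wanted):
    """Read `wanted` (>= 1) successive values after sending one address."""
    received = []
    burst_asserted = wanted > 1      # a single read needs no burst
    device_sees_burst = True         # the device always returns the first value
    address = start_address
    while device_sees_burst:
        received.append(memory[address])
        address += 1                 # next successive memory location
        # The device notices the level the CPU drove before this item ended.
        device_sees_burst = burst_asserted
        if len(received) >= wanted - 1:
            burst_asserted = False   # CPU releases the burst line one item early
    return received


memory = list(range(100, 110))
print(burst_read(memory, 2, 4))
```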
A burst read transaction
11. Some buses provide Disconnected Transfers.
In these buses, the request and response are
separate.
A first operation requests the transfer.
The bus can then be used for other
operations.
The transfer is completed later, when the data
are ready.
The state machine view of the bus transaction
is also helpful and a useful complement to
the timing diagram.
Figure on right shows the CPU and device
state machines for the read operation.
As with a timing diagram, not all the possible
values of address and data lines are
shown; instead, the diagram concentrates on
the transitions of control signals.
When the CPU decides to perform a read
transaction, it moves to a new state,
sending bus signals that cause the device
to behave appropriately.
The device’s state transition graph captures its
side of the protocol.
State diagrams for the bus read transaction
12. Some buses have Data Bundles that are smaller
than the word size of the CPU; using
fewer data lines reduces the cost of the
chip.
On such a bus, byte addresses are sent sequentially
and the data are received one byte at a time; the bytes
are assembled inside the CPU’s bus logic
before being presented to the CPU proper.
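The byte-by-byte assembly can be illustrated with a short Python sketch. The function name, the 4-byte word size, and the little-endian byte order are all assumptions chosen for the example; the notes do not specify them.

```python
def read_word_over_byte_bus(memory, address, word_bytes=4):
    """Fetch one word as successive byte reads and assemble it.

    Little-endian order (lowest address holds the least significant byte)
    is an assumption of this sketch.
    """
    word = 0
    for i in range(word_bytes):
        byte = memory[address + i] & 0xFF    # one bus read per byte address
        word |= byte << (8 * i)              # place the byte in the word
    return word


memory = bytes([0x78, 0x56, 0x34, 0x12])
print(hex(read_word_over_byte_bus(memory, 0)))
```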
Some buses use multiplexed address and data.
As shown in Figure on right, additional control
lines are provided to tell whether the value
on the address/data lines is an address or
data.
Typically, the address comes first on the
combined address/data lines, followed by
the data.
The address can be held in a register until the
data arrive so that both can be presented
to the device (such as a RAM) at the same
time.
Bus signals for multiplexing address and data
13. Direct Memory Access (DMA)
Standard bus transactions require the CPU to be in the middle of
every read and write transaction.
However, there are certain types of data transfers in which the CPU
does not need to be involved.
e.g.: A high-speed I/O device may wish to transfer a block of data
into memory.
This capability requires that some unit other than the CPU be
able to control operations on the bus.
Direct memory access (DMA) is a bus operation that allows reads
and writes not controlled by the CPU.
A DMA transfer is controlled by a DMA controller, which requests
control of the bus from the CPU.
After gaining control, the DMA controller performs read and write
operations directly between devices and memory.
14. Figure below shows the configuration of a bus with a DMA controller.
The DMA requires the CPU to provide two additional bus signals:
■ The bus request is an input to the CPU through which DMA
controllers ask for ownership of the bus.
■ The bus grant signals that the bus has been granted to the
DMA controller.
15. A device that can initiate its own bus transfer is known as a Bus Master.
The DMA controller uses bus request & bus grant signals to gain control of
the bus using a classic four-cycle handshake.
The bus request is asserted by the DMA controller when it wants to control
the bus, and the bus grant is asserted by the CPU when the bus is ready.
The CPU will finish all pending bus transactions before granting control of the
bus to the DMA controller.
When it does grant control, it stops driving the other bus signals: R/W,
address, and so on.
Upon becoming Bus Master, the DMA controller has control of all bus signals
and it can perform reads and writes using the same bus protocol as with
any CPU-driven bus transaction.
Memory and devices do not know whether a read or write is performed by
the CPU or by a DMA controller.
After the transaction is finished, the DMA controller returns the bus to the
CPU by de-asserting the bus request, causing the CPU to de-assert the bus
grant.
16. The CPU controls the DMA operation through registers
in the DMA controller.
A typical DMA controller includes the following three
registers:
■ A starting address register specifies where the
transfer is to begin.
■ A length register specifies the number of words
to be transferred.
■ A status register allows the DMA controller to be
operated by the CPU.
The CPU initiates a DMA transfer by setting the starting
address and length registers appropriately and
then writing the status register to set its start
transfer bit.
After the DMA operation is complete, the DMA
controller interrupts the CPU to tell it that the
transfer is done.
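The register-level view of a DMA transfer can be sketched in Python. Everything named here is hypothetical: the class, the START and DONE bit positions, and the use of a callback to stand in for the interrupt. Bus request and bus grant are not modelled.

```python
class DMAController:
    START = 0x1   # hypothetical bit positions in the status register
    DONE = 0x2

    def __init__(self, memory, device_words, interrupt):
        self.memory = memory
        self.device_words = device_words   # data supplied by the I/O device
        self.interrupt = interrupt         # callback standing in for the IRQ
        self.start_address = 0             # starting address register
        self.length = 0                    # length register (in words)
        self.status = 0                    # status register

    def write_status(self, value):
        self.status = value
        if value & self.START:
            # Move `length` words from the device into memory without the CPU.
            for i in range(self.length):
                self.memory[self.start_address + i] = self.device_words[i]
            self.status = self.DONE
            self.interrupt()               # tell the CPU the transfer is done


memory = [0] * 16
events = []
dma = DMAController(memory, [11, 22, 33], lambda: events.append("dma done"))
dma.start_address = 4                      # CPU sets up the registers...
dma.length = 3
dma.write_status(DMAController.START)      # ...then sets the start bit
print(memory, events)
```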
The CPU’s role during a DMA transfer: The CPU
cannot use the bus while the DMA controller is bus master.
As shown in adjacent Figure 4.10, if the CPU has
enough instructions and data in the cache and
registers, it may be able to continue doing useful
work for quite some time oblivious of the DMA
transfer.
But once the CPU needs the bus, it stalls until the DMA
controller returns bus mastership to the CPU.
UML sequence diagram of system
activity around a DMA transfer
17. System Bus Configurations
A microprocessor system generally has more than one bus.
As shown in Figure below, high-speed devices may be connected to a high-performance bus, while lower-
speed devices are connected to a different bus.
A small block of logic known as a Bridge allows the buses to connect to each other.
The advantages of using multiple buses and bridges are:
■ Higher-speed buses may provide wider data connections.
■ A high-speed bus usually requires more expensive circuits and connectors. The cost of low-speed
devices can be held down by using a lower-speed, lower-cost bus.
■ The bridge may allow the buses to operate independently, thereby providing some parallelism in
I/O operations.
A multiple bus system
18. Operation of a bus Bridge: The bridge is a slave on the fast bus and the master of the slow bus.
The bridge takes commands from the fast bus (on which it is a slave) and issues those commands on the
slow bus (of which it is a master).
It also returns the results from the slow bus to the fast bus; e.g.: It returns the results of a read on the
slow bus to the fast bus.
The upper sequence of states handles a write from the fast bus to the slow bus.
These states must read the data from the fast bus and set up the handshake for the slow bus.
Operations on the fast and slow sides of the bus bridge should be overlapped as much as possible to
reduce the latency of bus-to-bus transfers.
Similarly, the bottom sequence of states reads from the slow bus and writes the data to the fast bus.
UML state diagram of
bus bridge operation
19. AMBA Bus
The AMBA bus supports CPUs, memories, and peripherals integrated in a system-on-silicon.
As shown in Figure below, the AMBA specification includes two buses. The AMBA High-
performance Bus (AHB) is optimized for high-speed transfers and is directly connected to
the CPU; it supports several high-performance features: pipelining, burst transfers, split
transactions, and multiple bus masters.
A bridge can be used to connect the AHB to an AMBA Peripherals Bus (APB).
This bus is designed to be simple and easy to implement; it also consumes relatively little
power.
The APB assumes that all peripherals act as slaves, simplifying the logic required in both the
peripherals and the bus controller. It also does not perform pipelined operations, which
simplifies the bus logic.
Elements of the ARM AMBA bus system
20. Memory Device Organization: A memory is
characterized by its capacity, such as 256
Mb.
e.g: A 256-Mb memory may be available in
two versions:
■ As a 64M × 4-bit array, a single memory
access obtains a 4-bit data item, with
a maximum of 2^26 different addresses.
■ As a 32M × 8-bit array, a single memory
access obtains an 8-bit data item, with
a maximum of 2^25 different addresses.
The height/width ratio of a memory is known
as its Aspect Ratio.
The best aspect ratio depends on the amount
of memory required.
Internally, the data are stored in a two-
dimensional array of memory cells as
shown in adjacent Figure.
Because the array is stored in two dimensions,
the n-bit address received by the chip is
split into a row and a column address
(with n = r + c).
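The row/column split can be written out in Python. This is a sketch: the function name is hypothetical, and placing the row in the high-order bits is an assumption, since the notes only state that n = r + c.

```python
def split_address(address, row_bits, col_bits):
    """Split an n-bit address (n = r + c) into (row, column).

    Putting the row in the high-order bits is an assumption of this sketch.
    """
    if not 0 <= address < (1 << (row_bits + col_bits)):
        raise ValueError("address does not fit in r + c bits")
    row = address >> col_bits                   # upper r bits
    column = address & ((1 << col_bits) - 1)    # lower c bits
    return row, column


print(split_address(0b101101, 3, 3))
```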
Internal organization of a memory device
MEMORY DEVICES
21. MEMORY DEVICES cont’d….
The row and column select a particular memory cell.
If the memory’s external width is 1 bit, the column
address selects a single bit; for wider data widths, the
column address can be used to select a subset of the
columns.
Most memories include an enable signal that controls
the tri-stating of data onto the memory’s pins.
A read/write signal (R/W in the figure) on read/write
memories controls the direction of data transfer;
memory chips do not typically have separate read and
write data pins.
22. Random-Access Memories
Random Access memories can be both read and written. They are
called random access because addresses can be read in any order.
Most bulk memory in modern systems is dynamic RAM (DRAM).
DRAM is very dense; it does, however, require that its values be
refreshed periodically since the values inside the memory cells
decay over time.
The dominant form of dynamic RAM today is synchronous DRAM
(SDRAM), which uses a clock to improve DRAM performance.
SDRAMs use Row Address Select (RAS) and Column Address Select
(CAS) signals to break the address into two parts, which select the
proper row and column in the RAM array.
Signal transitions are relative to the SDRAM clock, which allows the
internal SDRAM operations to be pipelined.
23. As shown in adjacent Figure, transitions on
the control signals are related to a clock.
RAS and CAS can therefore become valid at
the same time.
The address lines are not shown in full
detail here; some address lines may not
be active depending on the mode in
use.
SDRAMs use a separate refresh signal to
control refreshing.
DRAM has to be refreshed roughly once per
millisecond. Rather than refreshing the
entire memory at once, DRAMs refresh
part of the memory at a time.
When a section of memory is being
refreshed, it cannot be accessed until
the refresh is complete.
The refresh is spread over the whole
refresh period, so that some section is
refreshed every few microseconds.
Timing diagram for a read on a synchronous DRAM
24. Read-only memories (ROMs) are preprogrammed with fixed data.
They are very useful in embedded systems since a great deal of the code, and perhaps some
data, does not change over time.
There are several types of ROM available. The main distinction is between factory-programmed ROM (sometimes called
mask-programmed ROM) and field-programmable ROM.
Factory-programmed ROMs are ordered from the factory with particular programming.
ROMs can typically be ordered in lots of a few thousand, but clearly factory programming is
useful only when the ROMs are to be installed in some quantity.
Field-programmable ROMs, on the other hand, can be programmed in the lab.
Flash memory is the dominant form of field-programmable ROM and is electrically erasable.
Flash memory uses standard system voltage for erasing and programming, allowing it to be
reprogrammed inside a typical system.
Early flash memories had to be erased in their entirety; modern devices allow memory to be
erased in blocks.
Most flash memories today allow certain blocks to be protected. The boot-up code is kept in a
protected block, while the other memory blocks on the device can be updated. This form of flash is
commonly known as Boot-block flash.
Read Only Memories (ROM)
25. I/O DEVICES
Timers and Counters: Timers and counters are
distinguished largely on the basis of their usage, not
their logic.
Both are built from adder logic with registers to hold the
current value, with an increment input that adds one to
the current register value.
However, a Timer has its count connected to a periodic
clock signal to measure time intervals, while a Counter
has its count input connected to an aperiodic signal in
order to count the number of occurrences of some
external event.
Because the same logic can be used for either purpose, the
device is often called a Counter/Timer.
26. The adjacent Figure shows enough of the internals
of a Counter/Timer to illustrate its operation.
An n-bit counter/timer uses an n-bit register to
store the current state of the count and an
array of half subtractors to decrement the
count when the count signal is asserted.
Combinational logic checks when the count equals
zero; the done output signals the zero count.
It is often useful to be able to control the time-out,
rather than require exactly 2^n events to occur.
For this purpose, a reset register provides the value
with which the count register is to be loaded.
The Counter/Timer provides logic to load the reset
register.
Most counters provide both cyclic and acyclic
modes of operation.
In the cyclic mode, once the counter reaches the
done state, it is automatically reloaded and the
counting process continues.
In acyclic mode, the counter/timer waits for an
explicit signal from the microprocessor to
resume counting.
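The behaviour of the count register, reset register, and the two modes can be sketched in Python. The class and method names are hypothetical, and the sketch assumes a reset value of at least 1 and a reload that takes effect on the count pulse following the done state.

```python
class CounterTimer:
    def __init__(self, reset_value, cyclic):
        self.reset_value = reset_value   # reset register (assumed >= 1)
        self.count = reset_value         # count register
        self.cyclic = cyclic
        self.done = False                # high when the count equals zero

    def pulse(self):
        """One assertion of the count input."""
        if self.done:
            if not self.cyclic:
                return                   # acyclic: wait for an explicit restart
            self.count = self.reset_value    # cyclic: reload automatically
            self.done = False
        self.count -= 1                  # decrement the count register
        self.done = (self.count == 0)

    def restart(self):
        """Explicit signal from the microprocessor to resume counting."""
        self.count = self.reset_value
        self.done = False


timer = CounterTimer(reset_value=3, cyclic=True)
dones = []
for _ in range(6):
    timer.pulse()
    dones.append(timer.done)
print(dones)
```

With a periodic clock on the count input this behaves as a timer that fires every three ticks; with an aperiodic event signal it behaves as a counter of three events.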
Internals of a Counter/Timer
27. A Watchdog Timer is an I/O device that is
used for internal operation of a system.
As shown in Figure, the Watchdog Timer is
connected into the CPU bus and also to
the CPU’s reset line.
The CPU’s software is designed to
periodically reset the watchdog timer,
before the timer ever reaches its time-
out limit.
If the watchdog timer ever does reach that
limit, its time-out action is to reset the
processor.
In that case, the presumption is that either
a Software Flaw or Hardware Problem
has caused the CPU to misbehave.
Rather than diagnosing the problem, the
system is reset to get it operational as
quickly as possible.
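A watchdog timer's interaction with software can be sketched in Python. The class, the method names kick and tick, and the callback that stands in for the CPU's reset line are all hypothetical choices for illustration.

```python
class WatchdogTimer:
    def __init__(self, timeout_ticks, reset_cpu):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.reset_cpu = reset_cpu       # callback standing in for the reset line

    def kick(self):
        """Called periodically by healthy software to reset the watchdog."""
        self.remaining = self.timeout_ticks

    def tick(self):
        """One period of the watchdog's clock."""
        self.remaining -= 1
        if self.remaining == 0:          # time-out limit reached
            self.reset_cpu()             # reset the processor
            self.remaining = self.timeout_ticks


resets = []
watchdog = WatchdogTimer(timeout_ticks=3, reset_cpu=lambda: resets.append("reset"))

for _ in range(10):                      # healthy software: kicks in time
    watchdog.tick()
    watchdog.tick()
    watchdog.kick()
healthy_resets = len(resets)

for _ in range(3):                       # hung software: no more kicks
    watchdog.tick()
print(healthy_resets, resets)
```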
A Watchdog Timer
28. A/D and D/A Converters
Analog/Digital (A/D) and Digital/Analog (D/A) converters (typically
known as ADCs and DACs, respectively) are often used to interface
nondigital devices to embedded systems.
Because A/D conversion requires more complex circuitry, it requires a
somewhat more complex interface.
Analog/digital conversion requires sampling the analog input before
converting it to digital form.
A control signal causes the A/D converter to take a sample and digitize
it.
A typical A/D interface has, in addition to its analog inputs, two major
digital inputs.
A Data Port allows A/D registers to be read and written, and a Clock
Input tells when to start the next conversion.
D/A conversion is relatively simple, so the D/A converter interface
generally includes only the data value.
The input value is continuously converted to analog form.
29. Keyboards
A keyboard is basically an array of switches, but it may include some internal logic to help
simplify the interface to the microprocessor.
A switch uses a mechanical contact to make or break an electrical circuit.
The major problem with mechanical switches is that they bounce as shown in Figure below.
When the switch is depressed by pressing on the button attached to the switch’s arm, the
force of the depression causes the contacts to bounce several times until they settle down.
If this is not corrected, it will appear that the switch has been pressed several times, giving
false inputs.
A hardware debouncing circuit can be built using a one-shot timer. Software can also be used
to debounce switch inputs. A raw keyboard can be assembled from several switches.
Each switch in a raw keyboard has its own pair of terminals, making raw keyboards impractical
when a large number of keys is required.
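Software debouncing can be sketched in Python as follows. This is one simple approach, not the only one: the debounced state changes only after the raw input has disagreed with it for several consecutive samples. The function name, the threshold of 3 samples, and the assumption that the switch starts released are all illustrative.

```python
def debounce(samples, stable_count=3):
    """Return the debounced switch level after each raw sample (0 or 1)."""
    state = 0                  # assume the switch starts released
    run = 0                    # consecutive samples that disagree with state
    out = []
    for raw in samples:
        run = run + 1 if raw != state else 0
        if run >= stable_count:
            state, run = raw, 0    # input has settled: accept the new level
        out.append(state)
    return out


raw = [0, 1, 0, 1, 1, 1, 1, 0, 1, 1]   # a press with contact bounce
print(debounce(raw))
```

The bouncing raw input contains several 0-to-1 transitions, while the debounced output contains exactly one.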
Switch Bouncing
30. More expensive keyboards, such as those used in PCs,
actually contain a microprocessor to preprocess
button inputs.
PC keyboards typically use a 4-bit microprocessor to
provide the interface between the keys and the
computer. The microprocessor can provide
debouncing, but it provides other functions as
well.
An encoded keyboard uses some code to represent
which switch is currently being depressed. At the
heart of the encoded keyboard is the scanned
array of switches shown in adjacent Figure.
Unlike a raw keyboard, the scanned keyboard array
reads only one row of switches at a time.
The demultiplexer at the left side of the array selects
the row to be read. When the scan input is 1, that
value is transmitted to one terminal of each key in
the row.
If the switch is depressed, the 1 is sensed at that
switch’s column. Since only one switch in the
column is activated, that value uniquely identifies a
key.
The row address and column output can be used for
encoding, or circuitry can be used to give a
different encoding.
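The row-by-row scan can be sketched in Python. The function name, the representation of pressed keys as a set of (row, column) pairs, and the encoding row * cols + column are assumptions for illustration; real keyboards may use a different code.

```python
def scan_keyboard(pressed, rows, cols):
    """Scan the key array one row at a time and return the codes found.

    `pressed` is a set of (row, column) pairs for switches that are closed.
    """
    codes = []
    for row in range(rows):                  # demultiplexer selects one row
        for col in range(cols):              # sense each column output
            if (row, col) in pressed:        # closed switch passes the 1 through
                codes.append(row * cols + col)
    return codes


print(scan_keyboard({(1, 2)}, rows=4, cols=4))
```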
A Scanned Key Array
31. Encoding the keyboard raises two problems:
1. Combinations of keys may not be represented.
e.g.: On a PC keyboard, the encoding must be chosen so that
combinations such as control-Q can be recognized and sent to the PC.
2. Rollover may not be allowed.
e.g.: if “a” is pressed and then “b” is pressed before releasing “a,”
most applications need the keyboard to send an “a” followed by a “b”.
Rollover is very common in typing at even modest rates.
A naive implementation of the encoder circuitry will simply throw away
any character depressed after the first one until all the keys are released.
The keyboard microcontroller can be programmed to provide n-key rollover,
so that rollover keys are sensed, put on a stack, and transmitted in
sequence as keys are released.
32. Light Emitting Diodes (LEDs)
LEDs are often used as simple displays by themselves, and arrays of
LEDs may form the basis of more complex displays.
Figure below shows how to connect an LED to a digital output.
A resistor is connected between the output pin and the LED to absorb
the voltage difference between the digital output voltage and the
0.7 V drop across the LED.
When the digital output goes to 0, the LED voltage is in the device’s off
region and the LED is not on.
An LED connected to a digital output
33. Displays
A display device may be either directly driven
or driven from a frame buffer.
The displays with a small number of elements
are driven directly by logic, while large
displays use a RAM frame buffer.
The n-digit array, shown in adjacent Figure, is
a simple example of a display that is
usually directly driven.
A single-digit display typically consists of
seven segments; each segment may be
either an LED or a Liquid Crystal Display
(LCD) element.
This display relies on the digits being visible
for some time after the drive to the digit is
removed, which is true for both LEDs and
LCDs.
The digit input is used to choose which digit is
currently being updated, and the selected
digit activates its display elements based
on the current data value.
The display’s driver is responsible for
repeatedly scanning through the digits
and presenting the current value of each
to the display.
An n-digit Display
34. A Frame Buffer is a RAM that is attached to the system bus.
The microprocessor writes values into the frame buffer in whatever
order is desired.
The pixels in the frame buffer are generally written to the display in
raster order by reading pixels sequentially.
Many large displays are built using LCD. Each pixel in the display is
formed by a single liquid crystal.
LCD displays present a very different interface to the system
because the array of pixel LCDs can be randomly accessed.
Modern LCD panels use an active matrix system that puts a
transistor at each pixel to control access to the LCD.
Early LCD panels were called passive matrix because they relied on a
two-dimensional grid of wires to address the pixels.
Active matrix displays provide higher contrast and a higher-quality
display.
35. Touchscreens
A Touchscreen is an input device overlaid on an output device. The Touchscreen registers the position of a
touch to its surface. By overlaying this on a display, the user can react to information shown on the
display.
The two most common types of touchscreens are Resistive and Capacitive.
Resistive Touchscreen: It uses a two-dimensional voltmeter to sense position. As shown in Figure below, the touchscreen
consists of two conductive sheets separated by spacer balls.
The top conductive sheet is flexible so that it can be pressed to touch the bottom sheet. A voltage is
applied across the sheet; its resistance causes a voltage gradient to appear across the sheet.
The top sheet samples the conductive sheet’s applied voltage at the contact point.
An Analog/Digital Converter is used to measure the voltage and resulting position.
The touchscreen alternates between x and y position sensing by alternately applying horizontal and
vertical voltage gradients.
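Converting the two A/D readings into a screen position can be sketched in Python. The function name, the 10-bit converter, the 320 × 240 display size, and the assumption of a linear voltage gradient with no calibration offset are all illustrative; a real touchscreen usually needs calibration.

```python
def adc_to_position(adc_x, adc_y, adc_bits=10, width=320, height=240):
    """Map the x and y A/D readings to pixel coordinates.

    Assumes a linear voltage gradient across the sheet and no calibration
    offset, so the reading is proportional to the contact position.
    """
    full_scale = (1 << adc_bits) - 1             # largest A/D code
    x = adc_x * (width - 1) // full_scale        # horizontal gradient sample
    y = adc_y * (height - 1) // full_scale       # vertical gradient sample
    return x, y


print(adc_to_position(512, 512))
```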
Cross section of a Resistive Touchscreen
36. COMPONENT INTERFACING
Memory Interfacing: The memory structure is simple if a memory
of exactly the required size can be bought.
If more memory is needed than can be bought in a single chip,
then several memory chips must be combined to construct a
memory of the required size.
e.g. If 4 GB of memory is needed and a single memory chip provides
2 GB, then 2 memory chips are needed.
It may also be necessary to build a memory that is wider than
what can be bought on a single chip.
e.g. Even if a 32-bit-wide memory chip cannot generally be bought, a memory
of a given width (32 bits, 64 bits, etc.) can easily be constructed by
placing RAMs in parallel.
Logic may also be needed to turn the bus signals into the appropriate
memory signals, and for DRAM the appropriate refresh signals need to be
generated.
e.g. Most buses won’t send address signals in row and column form.
37. Device Interfacing: Some I/O devices are designed
to interface directly to a particular bus, forming
glue-less interfaces.
But glue logic is required when a device is
connected to a bus for which it is not designed.
An I/O device typically requires a much smaller
range of addresses than a memory, so
addresses must be decoded much more
finely.
Some additional logic is required to cause the bus
to read and write the device’s registers.
38. The device has four registers that can be read and
written by presenting the register number on
the regid pins, asserting R/W as required, and
reading or writing the value on the regval pins.
To interface to the bus, the bottom two bits of the
address are used to refer to registers within the
device, and the remaining bits are used to
identify the device itself.
The top bits of the address are sent to a comparator
for testing against the device address.
The device’s address can be set with switches to
allow the address to be easily changed.
When the bus address matches the device’s, the
result is used to enable a transceiver for the
data pins.
When the transceiver is disabled, the regval pins
are disconnected from the data bus.
The comparator’s output is also used to modify the
R/W signal: the device’s R/W pin is given the
value (bus R/W OR address-not-equal), so that
when the address does not match, the
device’s R/W pin always receives a 1 (read) to avoid
inadvertently writing the device registers.
A glue logic interface: Below
is an interfacing scheme
for a simple I/O device
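The decoding scheme described above can be modelled in a few lines. This is a behavioural sketch only: it assumes R/W = 1 means read and 0 means write, and the function name and return layout are invented for the example.

```python
REG_BITS = 2  # the bottom two address bits select one of the four registers


def glue_logic(bus_address, bus_rw, device_address):
    """Return (transceiver_enable, device_rw, regid) for one bus cycle.

    Assumed convention: R/W = 1 is a read, R/W = 0 is a write.
    """
    regid = bus_address & ((1 << REG_BITS) - 1)          # register within the device
    match = (bus_address >> REG_BITS) == device_address  # comparator on the top bits
    not_equal = 0 if match else 1
    device_rw = bus_rw | not_equal                       # forced to 1 when not addressed
    return match, device_rw, regid
```

With the device address set to 0b101, a write to bus address 0b10110 reaches register 2, while a write to 0b11110 leaves the transceiver disabled and presents a read to the device.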
39. System Architecture: An Architecture is a set of elements and the
relationships between them that together form a single unit.
The architecture of an embedded computing system is the blueprint for
implementing that system: it specifies which components are
needed and how they are put together. It includes both hardware and
software elements.
It includes several elements, some of which may be less obvious than others.
■ CPU An embedded computing system clearly contains a
microprocessor.
There are many different architectures, and even within an
architecture there are models that vary in clock speed, bus data
width, integrated peripherals, and so on.
The choice of the CPU is one of the most important decisions, but it
cannot be made without considering the software that will execute on the machine.
■ Bus The choice of a bus is closely tied to that of a CPU, since the bus is
an integral part of the microprocessor.
But in applications that make intensive use of the bus due to I/O or
other data traffic, the bus may be more of a limiting factor than the
CPU.
DESIGNING WITH MICROPROCESSORS
40. System Architecture cont’d….
■ Memory The most obvious characteristic of the memory is its total size,
which depends on both the required data volume and the size of the
program instructions.
The ratio of ROM to RAM and selection of DRAM versus SRAM can have a
significant influence on the cost of the system.
The speed of the memory plays a great role in determining system
performance.
■ Input and Output devices: For a given function, there may be several
different devices of varying sophistication and cost that can do the job for
the CPU.
These devices are classified as input or output devices according to
whether they are used for input or output operations.
e.g. A set of switches and knobs on a front panel may all be controlled by a
single microcontroller, which is in turn connected to the main CPU.
The difficulty of using a particular device, such as the amount of glue logic
required to interface it, may also play a role in final device selection.
41. Hardware Design
Step – 1: Consider evaluation boards supplied by the microprocessor manufacturer or another company working in
collaboration with the manufacturer.
Evaluation boards are sold for many microprocessor systems; they typically include the CPU, some memory, a serial link
for downloading programs, and some minimal number of I/O devices.
The Figure below shows an ARM evaluation board manufactured by Sharp. The evaluation board may be a complete solution
or provide what is needed with only slight modifications. If the evaluation board is supplied by the microprocessor
vendor, its design may be available from the vendor.
If the evaluation board comes from a third party, it may be possible to contract them to design a new board with the
required modifications, or to start from scratch on a new board design.
42. Step – 2: The other major task is the choice of memory and
peripheral components.
In the case of I/O devices, there are two alternatives for each
device: selecting a component from a catalog or designing
from scratch.
When shopping for devices from a catalog, it is important to
read data sheets carefully; it may not be trivial to figure out
whether the device does what is actually needed.
Also due consideration must be given to the amount of glue
logic required to connect the device to the bus.
Simple peripheral logic can be implemented in
Programmable Logic Devices (PLDs), while more complex
units can be built from Field-programmable Gate Arrays
(FPGAs).
43. The PC as a Platform
Personal computers are often used as platforms
for embedded computing.
Advantages of a PC: it is a predesigned hardware
platform with a great many features, a wide
variety of I/O devices can be attached to it,
and it provides a rich programming
environment.
Disadvantage: A PC is larger, more power hungry,
and more expensive than a custom hardware
platform would be.
However, for low-volume applications and
environments such as factories and offices
where size and power are not critical, using a
PC to build an embedded system often
makes a lot of sense.
As shown in adjacent Figure, a typical PC includes
several major hardware components:
■ The CPU provides basic computational
facilities.
■ RAM is used for program storage.
■ ROM holds the boot program.
■ A DMA controller provides DMA
capabilities.
■ Timers are used by the operating system for
a variety of purposes.
■ A High-speed Bus, connected to the CPU
bus through a bridge, allows fast devices
to communicate efficiently with the rest
of the system.
■ A Low-speed Bus provides an inexpensive
way to connect simpler devices and may
be necessary for backward compatibility
as well.
Hardware architecture of a typical PC
44. PCI (Peripheral Component Interconnect)
PCI is a high-performance system bus that uses high-speed data
transmission techniques and efficient protocols to achieve high throughput.
The original PCI standard allowed operation up to 33 MHz; at that rate, a
maximum transfer rate of 264 MB/s can be achieved using 64-bit transfers.
The revised PCI standard allows the bus to run up to 66 MHz, giving a maximum
transfer rate of 528 MB/s with 64-bit-wide transfers.
PCI uses wide buses with many data and address bits along with multiple
control bits. The width of the PCI bus both increases the cost of an interface
to the bus and makes the physical connection to the bus more complicated.
USB (Universal Serial Bus) and IEEE 1394 are the two major high-speed serial
buses. Both of these buses offer high transfer rates using simple connectors.
They also allow devices to be chained together so that users need not worry
about the order of devices on the bus or other details of connection.
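The peak PCI figures can be checked with a one-line calculation. The sketch below assumes one transfer per bus clock and 1 MB = 10^6 bytes; the function name is illustrative.

```python
def peak_transfer_rate_mb_s(clock_mhz, width_bits):
    """Peak rate in MB/s, assuming one transfer of width_bits on every bus clock."""
    return clock_mhz * width_bits // 8


rate_33 = peak_transfer_rate_mb_s(33, 64)  # 33 MHz, 64-bit transfers
rate_66 = peak_transfer_rate_mb_s(66, 64)  # 66 MHz, 64-bit transfers
```

Under these assumptions the two rates come to 264 MB/s and 528 MB/s respectively.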
45. Basic Input / Output System (BIOS)
A PC provides a standard software platform that interfaces to the
underlying hardware as well as more advanced services.
At the bottom of the software platform structure in most PCs is a
minimal set of software in ROM.
This software is designed to load the complete operating system
from some other device (disk, network, etc.), and it may also
provide low-level hardware interfaces.
In the IBM-compatible PC, the low-level software is known as the
Basic Input / Output System (BIOS).
The BIOS provides low-level hardware drivers as well as booting
facilities.
The operating system provides high-level drivers, control of
executing processes, user interfaces, and so on.
46. System organization of the Intel StrongARM SA-1100 and SA-1111
The StrongARM SA-1100 provides a number of functions besides the ARM CPU.
The chip contains two on-chip buses: a high-speed system bus and a lower-speed peripheral bus.
The chip also uses two different clocks. A 3.686 MHz clock is used to drive the CPU and high-speed
peripherals, and a 32.768 kHz clock is an input to the system control module.
The system control module contains the following peripheral devices:
■ A real-time clock
■ An operating system timer
■ 28 general-purpose I/Os (GPIOs)
■ An interrupt controller
■ A power manager controller
■ A reset controller that handles resetting the processor.
The 32.768 kHz clock’s frequency is chosen to
be useful in timing real-time events.
The slower clock is also used by the power
manager to provide continued operation of
the manager at a lower clock rate and
therefore lower power consumption.
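One reason a 32.768 kHz clock is convenient for timing real-time events (a general property of this frequency, not something stated in the text) is that 32,768 = 2^15, so a chain of divide-by-two stages yields exactly one tick per second. A minimal sketch, with illustrative names:

```python
RTC_CLOCK_HZ = 32_768  # 32.768 kHz


def divider_stages(clock_hz):
    """Number of divide-by-two stages needed to reach 1 Hz (power-of-two clocks only)."""
    stages = 0
    while clock_hz > 1:
        if clock_hz % 2:
            raise ValueError("clock rate is not a power of two")
        clock_hz //= 2
        stages += 1
    return stages


def ticks_to_seconds(ticks):
    """Convert a count of 32.768 kHz clock ticks to seconds."""
    return ticks / RTC_CLOCK_HZ
```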
47. DEVELOPMENT AND DEBUGGING
Development Environments: A typical embedded computing system has a relatively small amount of
everything, including CPU horsepower, memory, I/O devices, and so forth.
As a result, it is common to do at least part of the software development on a PC or workstation known as
a host as illustrated in Figure below.
The hardware on which the code will finally run is known as the Target.
The host and target are frequently connected by a USB link, but a higher-speed link such as Ethernet can
also be used.
The target must include a small amount of software to talk to the host system.
That software will take up some memory, interrupt vectors, and so on, but it should generally leave the
smallest possible footprint in the target to avoid interfering with the application software.
The host should be able to do the following:
■ load programs into the target,
■ start and stop program execution on the target, and
■ examine memory and CPU registers.
Connecting a host and a target system
48. A Cross-compiler is a compiler that runs on one type of machine but
generates code for another.
After compilation, the executable code is downloaded to the embedded
system by a serial link or perhaps burned in a PROM and plugged in.
Host-target debuggers are often used, in which the basic hooks for debugging
are provided by the target and a more sophisticated user interface is
created by the host.
A PC or workstation offers a programming environment which is much
friendlier than the typical embedded computing platform.
A problem with this approach emerges when the code being debugged talks to I/O
devices: because the host will not have the same devices configured in the same
way, the embedded code cannot be run unmodified on the host.
A Test-bench program can be built to help debug the embedded code.
The Test-bench generates inputs to simulate the actions of the input devices;
it may also take the output values and compare them against expected
values, providing valuable early debugging help.
The embedded code may need to be slightly modified to work with the
Testbench, but careful coding (such as using the #ifdef directive in C) can
ensure that the changes can be undone easily and without introducing
bugs.
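The test-bench idea can be sketched on the host as follows. The code under test here is a hypothetical stand-in that doubles its input; in a real project it would be the embedded routine with its device accesses replaced by simulated inputs.

```python
def process_sample(x):
    """Hypothetical stand-in for the embedded code under test (doubles its input)."""
    return 2 * x


def run_testbench(code_under_test, inputs, expected):
    """Feed simulated device inputs and compare the outputs against expected values.

    Returns a list of (stimulus, expected, actual) tuples for every mismatch.
    """
    failures = []
    for stimulus, want in zip(inputs, expected):
        got = code_under_test(stimulus)
        if got != want:
            failures.append((stimulus, want, got))
    return failures
```

An empty result means every simulated input produced the expected output; any entries point directly at the failing stimulus.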
49. Debugging Techniques (S/W based)
Software debugging can be started by compiling and executing
the code on a PC or workstation.
But at some point it inevitably becomes necessary to run code on
the embedded hardware platform.
Embedded systems are usually less friendly programming
environments than PCs, but the resourceful designer has
several options available for debugging the system.
The serial port found on most evaluation boards is one of the
most important debugging tools.
It is a good idea to design a serial port into an embedded system
even if it is not likely to be used in the final product; the serial
port can be used not only for development debugging but also
for diagnosing problems in the field.
50. Another very important debugging tool is the
Breakpoint.
The simplest form of a Breakpoint is for the user to
specify an address at which the program’s execution
is to break.
When the program counter (PC) reaches that address, control is returned
to the monitor program.
From the monitor program, the user can examine
and/or modify CPU registers, after which execution
can be continued.
Implementing breakpoints does not require using
exceptions or external devices.
51. The following programming example shows how to use instructions to create
breakpoints.
Breakpoints: A breakpoint is a location in memory at which a program stops
executing and returns to the debugging tool or monitor program.
Implementing breakpoints is very simple: it only requires replacing the
instruction at the breakpoint location with a subroutine call to the monitor.
For example, to establish a breakpoint at location 0x40c in some ARM code,
the branch (B) instruction normally held at that location is replaced with a
subroutine call (BL) to the breakpoint handling routine.
When the breakpoint handler is called, it saves all the registers and can then display
the CPU state to the user and take commands.
To continue execution, the original instruction must be restored in the program.
If the breakpoint can be erased, the original instruction can simply be put back and
control returned to that instruction.
This will normally require fixing the subroutine return address, which will point to the
instruction after the breakpoint.
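The instruction-replacement mechanism can be modelled with a toy monitor. This is a sketch only: program memory is a dictionary of instruction strings, the class and method names are invented, and any instruction other than the B at 0x40c is a hypothetical placeholder.

```python
class Monitor:
    """Toy model of breakpoints implemented by instruction replacement."""

    def __init__(self, program):
        self.program = dict(program)  # address -> instruction text
        self.saved = {}               # address -> original instruction

    def set_breakpoint(self, address):
        # Save the original instruction, then patch in a call to the handler.
        self.saved[address] = self.program[address]
        self.program[address] = "BL bkpoint"

    def clear_breakpoint(self, address):
        # Restore the original instruction and return the address to resume at,
        # i.e. the breakpoint location itself rather than the instruction after it.
        self.program[address] = self.saved.pop(address)
        return address
```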
52. When Software Tools are insufficient to debug the system, Hardware
aids can be deployed to give a clearer view of what is happening
when the system is running.
The microprocessor In Circuit Emulator (ICE) is a specialized hardware
tool that can help debug software in a working embedded system.
An ICE is a special version of the microprocessor that allows its internal
registers to be read out when it is stopped.
The In-circuit Emulator surrounds this specialized microprocessor with
additional logic that allows the user to specify breakpoints and
examine and modify the CPU state.
The ICE provides as much debugging functionality as a debugger
within a monitor program, but does not take up any memory.
Drawback of In-circuit Emulation: The machine is specific to a
particular microprocessor, even down to the pinout.
If several microprocessors are used, maintaining a fleet of In-circuit
Emulators to match can be very expensive.
Debugging Techniques (H/W based) cont’d….
53. The Logic Analyzer is the other major piece of instrumentation in the
embedded system designer’s arsenal.
Think of a logic analyzer as an array of inexpensive oscilloscopes; the
analyzer can sample many different signals simultaneously (tens to
hundreds) but can display only 0, 1, or changing values for each.
All these logic analysis channels can be connected to the system to
record the activity on many signals simultaneously.
The logic analyzer records the values on the signals into an internal
memory and then displays the results on a display once the memory
is full or the run is aborted.
The logic analyzer can capture thousands or even millions of samples
of data on all of these channels, providing a much larger time
window into the operation of the machine than is possible with a
conventional oscilloscope.
54. A typical Logic Analyzer can acquire data in either of two modes that are
typically called State and Timing modes.
The measurement resolution on each signal is reduced in both voltage and
time dimensions.
The reduced voltage resolution is accomplished by measuring logic values (0,
1, x) rather than analog voltages.
The reduction in Timing resolution is accomplished by sampling the signal,
rather than capturing a continuous waveform as in an analog oscilloscope.
State and timing mode represent different ways of sampling the values.
Timing mode uses an Internal Clock that is fast enough to take several
samples per clock period in a typical system.
State mode uses the system’s own clock to control sampling, so it samples
each signal only once per clock cycle.
As a result, timing mode requires more memory to store a given number of
system clock cycles.
On the other hand, it provides greater resolution in the signal for detecting
glitches.
Timing mode is typically used for glitch-oriented debugging, while state mode
is used for sequentially oriented problems.
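The memory trade-off between the two modes can be made concrete. The sketch below assumes a hypothetical timing-mode rate of 4 samples per system clock period; the function names are illustrative.

```python
def samples_stored(system_cycles, mode, samples_per_cycle=4):
    """Samples the analyzer must store to cover a given number of system clock cycles."""
    if mode == "state":
        return system_cycles                      # one sample per system clock
    if mode == "timing":
        return system_cycles * samples_per_cycle  # several samples per system clock
    raise ValueError("mode must be 'state' or 'timing'")


def cycles_captured(memory_depth, mode, samples_per_cycle=4):
    """System clock cycles that fit in a sample memory of the given depth."""
    return memory_depth // samples_stored(1, mode, samples_per_cycle)
```

With these assumptions, the same sample memory covers four times as many system clock cycles in state mode as in timing mode.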
55. The Internal Architecture of a logic analyzer is shown in Figure below.
The system’s data signals are sampled at a latch within the logic analyzer; the latch is controlled by either the
system clock or the internal logic analyzer sampling clock, depending on whether the analyzer is being used
in state or timing mode.
Each sample is copied into a vector memory under the control of a state machine.
The latch, timing circuitry, sample memory, and controller must be designed to run at high speed since several
samples per system clock cycle may be required in timing mode.
After the sampling is complete, an embedded microprocessor takes over to control the display of the data
captured in the sample memory.
Logic analyzers typically provide a number of formats for viewing data. One format is a timing diagram format.
Many logic analyzers allow not only customized displays, such as giving names to signals, but also more advanced
display options.
For example, an inverse assembler can be used to turn vector values into microprocessor instructions.
The logic analyzer does not provide access to
the internal state of the components, but it
does give a very good view of the externally
visible signals.
That information can be used for both
Functional and timing debugging.
Architecture of a Logic Analyzer
56. Debugging Challenges
Logical errors in software can be hard to track down, but errors in real-time code can
create problems that are even harder to diagnose.
Real-time programs are required to finish their work within a certain amount of time;
if they run too long, they can create very unexpected behavior.
Example below demonstrates one of the problems that can arise.
A timing error in real-time code: To make it easier to compare input to output and
see the results of the bug, assume that the computation produces an output
equal to the input, but that a bug causes the computation to run 50% longer than
its given time interval.
A sample input to the program over several sample periods follows:
57. If the program ran fast enough to meet its deadline, the output would simply be a time
shifted copy of the input.
But when the program runs over its allotted time, the output will become very different.
Exactly what happens depends on the behavior of the A/D and D/A converters, so some
assumptions are needed.
First, the A/D converter holds its current sample in a register until the next sample period,
and the D/A converter changes its output whenever it receives a new sample.
Next, a reasonable assumption about interrupt systems is that, when an interrupt is not
satisfied and the device interrupts again, the device’s old value will disappear and be
replaced by the new value.
The basic situation that develops when the interrupt routine runs too long is something
like this:
1. The A/D converter is prompted by the timer to generate a new value, saves it in
the register, and requests an interrupt.
2. The interrupt handler, still working on the previous sample, runs too long.
3. The A/D converter gets another sample at the next period.
4. The interrupt handler finishes its first request and then immediately responds to
the second interrupt. It never sees the first sample and only gets the second one.
58. Thus, assuming that the interrupt handler takes 1.5 times longer than it should, the
sample input is processed as shown in the figure.
[Figure: input samples and the resulting output samples]
The output waveform is seriously distorted because the interrupt routine grabs the wrong
samples and puts the results out at the wrong times.
The exact results of missing real-time deadlines depend on the detailed characteristics of
the I/O devices and the nature of the timing violation.
This makes debugging real-time problems especially difficult and if a system exhibits truly
unusual behavior, missed deadlines should be suspected.
In-circuit emulators, logic analyzers, and even LEDs can be useful tools in checking the
execution time of real-time code to determine whether it in fact meets its deadline.
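The missed-deadline scenario can be reproduced with a small discrete-time simulation. This is a sketch under the assumptions stated earlier (the A/D register holds only the latest sample, and a repeated interrupt overwrites the old value); time is counted in half sample periods so that a handler 1.5 times too long takes 3 ticks against a period of 2. All names are illustrative.

```python
def simulate_overrun(samples, period=2, handler_time=3):
    """Return (finish_time, value) pairs for each sample the handler outputs.

    The A/D register holds only the most recent sample; a sample that arrives
    while one is still pending silently replaces it.
    """
    outputs = []
    register = None   # A/D converter's sample register
    pending = False   # an interrupt request is waiting
    current = None    # value the handler is working on
    busy_until = 0
    end_time = period * len(samples) + handler_time
    for t in range(end_time + 1):
        # The timer prompts the A/D converter once per period.
        if t % period == 0 and t // period < len(samples):
            register = samples[t // period]
            pending = True
        # The handler finishes and writes its result to the D/A converter.
        if current is not None and t == busy_until:
            outputs.append((t, current))
            current = None
        # A free handler immediately services a pending interrupt.
        if current is None and pending:
            current = register
            pending = False
            busy_until = t + handler_time
    return outputs
```

With seven input samples numbered 0 to 6, the overrunning handler outputs only samples 0, 1, 3, 4 and 6, and does so later than intended; a handler that fits within one period outputs all seven.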
59. SYSTEM-LEVEL PERFORMANCE ANALYSIS
SYSTEM-LEVEL PERFORMANCE involves much more than the CPU.
The focus is often on the CPU because it processes instructions, but any part of the system
can affect total system performance.
More precisely, the CPU provides an upper bound on performance, but any other part of the
system can slow down the CPU. Merely counting instruction execution times is not
enough.
Consider the simple system of Figure below. Data needs to be moved from memory to the CPU
to process it.
To get the data from memory to the CPU, the following must be done:
■ read from the memory;
■ transfer over the bus to the cache; and
■ transfer from the cache to the CPU.
System level Data Flows and Performance
60. The time required to transfer from the cache to the CPU is included in the
instruction execution time, but the other two times are not.
The most basic measure of performance is Bandwidth— the rate at which the
data can be moved.
The point of interest is real-time performance measured in seconds.
But often the simplest way to measure performance is in units of clock cycles.
However, different parts of the system will run at different clock rates.
So, it has to be ensured that the right clock rate is applied to each part of the
performance estimate while converting clock cycles to seconds.
For simplicity, consider the bandwidth provided by only one system
component, the bus.
Consider an image of 320 × 240 pixels, with each pixel composed of 3 bytes of
data. This gives a grand total of 230,400 bytes of data.
If these images are video frames, then it is to be checked if one frame can be
pushed through the system within the 1/30s that a frame has to be
processed before the next one arrives.
61. Let the bus clock period be P and the bus width be W.
W is expressed in units of bytes (other measures of width could be used as well).
The goal is to write formulas for the time required to transfer N bytes of data.
The basic formulas are written in units of bus cycles T; those bus cycle
counts are then converted to real time t using the bus clock period P:
t = T · P    (4.1)
As shown in Figure below, a basic bus transfer transfers a W-wide set of bytes.
The data transfer itself takes D clock cycles. (Ideally, D = 1, but a memory that
introduces wait states is one example of a transfer that could require D > 1
cycles.)
Times and data volumes in a basic bus transfer
62. Addresses, handshaking, and other activities constitute overhead that may occur
before (O1) or after (O2) the data.
For simplicity, let the overhead be summed into O = O1 + O2.
This gives a total transfer time in clock cycles of:
T_basic(N) = (D + O) · N / W    (4.2)
As shown in Figure below, a burst transaction performs B transfers of W bytes each.
Each of those transfers will require D clock cycles. The bus also introduces O cycles of
overhead per burst. This gives
T_burst(N) = (B · D + O) · N / (B · W)    (4.3)
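The two transfer-time formulas translate directly into code. The sketch below uses the 230,400-byte frame from earlier with hypothetical bus parameters (W = 2 bytes, D = 1, O = 3, B = 4) purely to compare the two modes; the function names are illustrative.

```python
def t_basic(n_bytes, width, d=1, o=0):
    """Bus cycles for N bytes in basic transfers: (D + O) * N / W."""
    return (d + o) * n_bytes / width


def t_burst(n_bytes, width, burst, d=1, o=0):
    """Bus cycles for N bytes in bursts of B transfers: (B*D + O) * N / (B*W)."""
    return (burst * d + o) * n_bytes / (burst * width)


def cycles_to_seconds(cycles, period):
    """Convert a bus-cycle count to real time: t = T * P."""
    return cycles * period


frame_bytes = 320 * 240 * 3
basic_cycles = t_basic(frame_bytes, width=2, d=1, o=3)
burst_cycles = t_burst(frame_bytes, width=2, burst=4, d=1, o=3)
```

With these assumed parameters the burst transfer needs 201,600 cycles against 460,800 for basic transfers, because the overhead O is paid once per burst rather than once per W-byte transfer.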
Transferring data into and out of components also raises questions of bandwidth. The
simplest illustration of this problem is memory.
Times and data volumes in a burst bus transfer
63. A single memory chip is not solely specified by the number of bits it can hold.
As shown in Figure below, memories of the same size can have different Aspect Ratios.
e.g. A 64-Mbit memory that is 1 bit wide will present 64 million addresses of 1-bit data. The same size
memory in a 4-bit-wide format will have 16 million distinct addresses, and an 8-bit-wide memory will have 8
million distinct addresses.
Memory chips do not come in extremely wide aspect ratios but wider memories can be built by using
several chips.
The memory system width may also be determined by the memory modules used. Rather than buy memory
chips individually, memory as SIMMs or DIMMs may be bought.
Which aspect ratio is preferable for the overall memory system depends also on the format of the data
that needs to be stored in the memory and the speed with which it must be accessed, giving rise to
bandwidth analysis.
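The relation between capacity, width and number of addresses is a simple division. The sketch below assumes "64 Mbit" means 64 × 2^20 bits; the names are illustrative.

```python
MBIT = 2**20  # assumption: 1 Mbit = 2**20 bits


def distinct_addresses(capacity_bits, width_bits):
    """Number of addressable locations for a memory of the given capacity and width."""
    if capacity_bits % width_bits:
        raise ValueError("capacity must be a multiple of the width")
    return capacity_bits // width_bits


aspect_ratios = {w: distinct_addresses(64 * MBIT, w) // MBIT for w in (1, 4, 8)}
```

Here `aspect_ratios` maps each width in bits to the number of addresses in units of 2^20: 64 for a 1-bit-wide part, 16 for 4 bits wide, and 8 for 8 bits wide.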
64. The situation is more complicated if the data types do not fit naturally into the width of the memory.
Suppose color video pixels need to be stored in the memory.
A standard pixel is three 8-bit color values (say red, green, blue).
A 24-bit-wide memory would allow an entire pixel value to be read or written in
one access.
An 8-bit-wide memory, in contrast, would require three accesses for the pixel.
With a 32-bit-wide memory there are 2 main choices:
1. One byte of each transfer could be wasted (or used to store unrelated data), or
2. The pixels can be packed.
In the 2nd case, the first read would get all of the first pixel and one byte of
the second pixel; the second transfer would get the last two bytes of the
second pixel and the first two bytes of the third pixel; and so forth.
The total number of accesses A required to read E data elements of w bits
each, packed into a memory of width W bits, is:
A = ⌈(E · w) / W⌉    (4.4)
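The access counts for the pixel example can be sketched as below. The packed count is taken here to be the total number of bits divided by the memory width, rounded up, which is an assumption chosen to reproduce the packing walkthrough above (four 24-bit pixels fit in three 32-bit words); the function names are illustrative.

```python
def packed_accesses(elements, element_bits, memory_bits):
    """Accesses needed when elements are packed back to back across words."""
    total_bits = elements * element_bits
    return -(-total_bits // memory_bits)  # ceiling division


def unpacked_accesses(elements, element_bits, memory_bits):
    """Accesses needed when every element starts in a fresh word (padding wasted)."""
    return elements * -(-element_bits // memory_bits)
```

For one 24-bit pixel this gives one access on a 24-bit-wide memory and three on an 8-bit-wide memory; for four pixels on a 32-bit-wide memory it gives three accesses packed and four unpacked.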
65. Performance bottlenecks in a bus-based system
Consider a simple bus-based system: data has to be transferred
between the CPU and the memory over the bus.
We need to be able to read a 320 X 240 video frame into the CPU at
the rate of 30 frames/s, for a total of 612,000 bytes/s.
Which will be the bottleneck and limit system performance: the bus or
the memory?
Let’s assume that the bus has a 1-MHz clock rate (period of 10^-6 s)
and is 2 bytes wide, with D = 1 and O = 3.
This gives a total transfer time of
T_basic = (1 + 3) · 612,000 / 2 = 1,224,000 cycles    (4.5)
66. t = T_basic · P = 1,224,000 · 1 × 10^-6 = 1.224 s    (4.6)
Since the total time to transfer one second’s worth of frames is more
than 1 s, the bus is not fast enough for our application.
The memory provides a burst mode with B = 4 but is only 4 bits wide,
giving W = 0.5.
For this memory, D = 1 and O = 4. The clock period for this memory is
10^-7 s. Then
T_mem = (4 · 1 + 4) · 612,000 / (4 × 0.5) = 2,448,000 cycles    (4.7)
t = T_mem · P = 2,448,000 · 1 × 10^-7 = 0.2448 s    (4.8)
The memory requires less than 1 s to transfer the 30 frames that must be
transmitted in 1 s, so it is fast enough.
The bus is therefore the bottleneck.
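The bottleneck comparison can be replayed numerically. The sketch below takes the 612,000 bytes/s figure from the text as given so that the results line up with equations (4.5) to (4.8); note that 320 × 240 pixels at 3 bytes per pixel and 30 frames/s would come to 6,912,000 bytes/s, so the data volume should be treated as the example's own assumption. The function name is illustrative.

```python
def transfer_seconds(n_bytes, width, period, d, o, burst=1):
    """Time to move n_bytes: cycles = (B*D + O) * N / (B*W), then t = cycles * P.

    With burst=1 this reduces to the basic-transfer formula (D + O) * N / W.
    """
    cycles = (burst * d + o) * n_bytes / (burst * width)
    return cycles * period


N = 612_000  # bytes per second of video, as stated in the text

bus_time = transfer_seconds(N, width=2, period=1e-6, d=1, o=3)
mem_time = transfer_seconds(N, width=0.5, period=1e-7, d=1, o=4, burst=4)

# A component keeps up only if it moves one second's worth of data in under 1 s.
bottleneck = "bus" if bus_time > mem_time else "memory"
```

Under these numbers the bus needs about 1.224 s and the memory about 0.2448 s, so the bus is the component that limits performance.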
67. Parallelism
When different components of the
system operate in parallel, more
work can be done in a given
amount of time.
Direct Memory Access is a prime
example of parallelism: DMA was
designed to off-load memory
transfers from the CPU.
The CPU can do other useful work
while the DMA transfer is running.
Figure below shows the paths of
data transfers without and with
DMA when transferring from
memory to a device.
Without DMA, the data must go
through the CPU; the CPU cannot
do useful work at that time.
DMA transfers and parallelism
68. The CPU is tied up for the amount of time required for the
bus transfer.
Since buses often operate at slower clock rates than the
CPU, that time can be considerable.
The system performance can be increased significantly by
overlapping operations on the different units of the
system.
The adjacent Figure shows timing
diagrams for two versions of a computation.
The top timing diagram shows activity in the system when
the CPU first performs some setup operations, then
waits for the bus transfer to complete, then resumes
its work.
In the bottom timing diagram, the program on the CPU has
been rewritten so that its main work is broken into
two sections.
In this case, once the first transfer is done, the CPU can
start working on that data.
Meanwhile, due to DMA, the second transfer happens on
the bus at the same time.
Once that data arrives and the first calculation is finished,
the CPU can go on to the second part of the
computation.
The result is that the entire computation finishes
considerably earlier than in the sequential case.
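The benefit of overlapping can be estimated with a simple schedule model. The sketch below assumes the transfers run back to back on the bus under DMA and that each computation section needs only its own transfer; the durations in the example are hypothetical time units, and the function names are illustrative.

```python
def sequential_finish(setup, transfers, computes):
    """CPU sets up, waits for all transfers, then does all of its computation."""
    return setup + sum(transfers) + sum(computes)


def overlapped_finish(setup, transfers, computes):
    """Section i of the computation starts once transfer i is done and the CPU is free."""
    bus_free = setup
    cpu_free = setup
    for xfer, comp in zip(transfers, computes):
        bus_free += xfer                 # DMA transfers run back to back on the bus
        start = max(cpu_free, bus_free)  # need both the data and a free CPU
        cpu_free = start + comp
    return cpu_free
```

With a setup of 2 units, two transfers of 5 units and two computation sections of 4 units, the sequential schedule finishes at time 20 and the overlapped one at time 16.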
Sequential and parallel schedules in a bus-based system
69. Design Example : ALARM CLOCK
Requirements: the adjacent Figure
illustrates the front panel design for
the alarm clock.
The time is shown as four digits in 12-h
format; a light has been used to
distinguish between AM and PM.
Several buttons are used to set the clock
time and alarm time.
When the hour and minute buttons are
pressed, the hour and minute are
advanced, respectively, by one.
When setting the time, the set time
button must be held down while the
hour and minute buttons are hit; the
set alarm button works in a similar
fashion.
With the alarm on and alarm off buttons,
the alarm is turned on and off.
When the alarm is activated, the alarm
ready light is on. A separate speaker
provides the audible alarm.
Front panel of the alarm clock
70. The Requirements Table:
Design Example : ALARM CLOCK cont’d….
71. The adjacent Figure 1 shows the basic classes for the
alarm clock.
The class that handles the basic clock operation is called
the Mechanism class (based on a term from
mechanical watches).
Three classes represent physical elements:
Lights* for all the digits and lights,
Buttons* for all the buttons, and
Speaker* for the sound output.
The Buttons* class can easily be used directly by
Mechanism.
The physical display must be scanned to generate the
digits output, so the Display class is introduced to
abstract the physical lights.
The details of the low-level user interface classes are
shown in Figure 2.
The Buzzer* class allows the buzzer to be turned off;
analog electronics will be used to generate the buzz
tone for the speaker.
The Buttons* class provides read-only access to the
current state of the buttons.
The Lights* class allows the lights to be driven.
To save pins on the display, Lights* provides signals
for only one digit, along with a set of signals to
indicate which digit is currently being addressed.
Figure 1: Class diagram for the alarm clock
Figure 2: Details of low-level classes for the alarm clock
Design Example : ALARM CLOCK cont’d….
72. Specification
The display is generated by scanning the digits periodically. This function is performed by the Display
class, which makes the display appear as an un-scanned, continuous display to the rest of the
system.
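The scanning idea can be sketched in C as follows. This is a minimal illustration, not the design from the text book: the variables lights_digit_select and lights_digit_value are hypothetical stand-ins for the Lights* signals, and display_set() and display_scan() are assumed names. The rest of the system writes all four digits once; the periodic scan then drives one digit per call.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DIGITS 4

/* Hypothetical stand-ins for the Lights* hardware signals. */
static uint8_t lights_digit_select;  /* which digit is currently addressed */
static uint8_t lights_digit_value;   /* value driven on the shared digit lines */

/* State held by the (assumed) Display abstraction. */
static uint8_t display_digits[NUM_DIGITS];
static uint8_t scan_position;

/* Called by the rest of the system, which sees a continuous display. */
void display_set(const uint8_t digits[NUM_DIGITS]) {
    for (int i = 0; i < NUM_DIGITS; i++) {
        assert(digits[i] <= 9);
        display_digits[i] = digits[i];
    }
}

/* Called periodically: drives one digit, then moves on to the next. */
void display_scan(void) {
    lights_digit_select = scan_position;
    lights_digit_value = display_digits[scan_position];
    scan_position = (uint8_t)((scan_position + 1) % NUM_DIGITS);
}
```

If display_scan() is called often enough, the digits appear to be lit continuously even though only one is driven at any instant.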
The Mechanism class is described in the figure below.
This class keeps track of the current time, the current alarm time, whether the alarm has been
turned on, and whether it is currently buzzing.
The clock shows the time only to the minute, but it keeps internal time to the second.
The time is kept as discrete digits rather than a single integer to simplify transferring the time to the
display.
The class provides two behaviors, both of which run continuously.
I. Scan-keyboard is responsible for looking at the inputs and updating the alarm and other
functions as requested by the user.
II. Update-time keeps the current time
accurate.
The Mechanism Class
73. The adjacent figure shows the state
diagram for update-time.
This behavior is straightforward,
but it must do several things.
It is activated once per second and
must update the seconds clock.
If it has counted 60 s, it must then
update the displayed time;
when it does so, it must roll
over between digits and keep
track of AM-to-PM and PM-to-
AM transitions.
It sends the updated time to the
display object.
It also compares the time with the
alarm setting and sets the
alarm buzzing under proper
conditions.
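The update-time behavior described above can be sketched in C. This is one possible reading of the description, not the state diagram itself: the types clock_time_t and mechanism_t, their field names, and the convention that update_time() returns true when the displayed time has changed (so the caller can forward it to the display) are all assumptions made for illustration. The time is kept as discrete digits, as the specification suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Displayed time as discrete digits in 12-h format (assumed layout). */
typedef struct {
    uint8_t hour_tens, hour_ones;  /* 01..12 */
    uint8_t min_tens, min_ones;    /* 00..59 */
    bool pm;
} clock_time_t;

/* Assumed state of the Mechanism class. */
typedef struct {
    clock_time_t now;
    clock_time_t alarm;
    uint8_t seconds;   /* internal seconds count, not displayed */
    bool alarm_on;
    bool buzzing;
} mechanism_t;

static bool same_time(const clock_time_t *a, const clock_time_t *b) {
    return a->hour_tens == b->hour_tens && a->hour_ones == b->hour_ones &&
           a->min_tens == b->min_tens && a->min_ones == b->min_ones &&
           a->pm == b->pm;
}

/* Advance by one minute, rolling over between digits. */
static void advance_minute(clock_time_t *t) {
    if (++t->min_ones < 10) return;
    t->min_ones = 0;
    if (++t->min_tens < 6) return;
    t->min_tens = 0;
    unsigned hour = t->hour_tens * 10u + t->hour_ones;
    hour = (hour == 12) ? 1 : hour + 1;
    if (hour == 12) t->pm = !t->pm;  /* 11:59 -> 12:00 flips AM/PM */
    t->hour_tens = (uint8_t)(hour / 10);
    t->hour_ones = (uint8_t)(hour % 10);
}

/* Called once per second; returns true when the displayed time changed. */
bool update_time(mechanism_t *m) {
    assert(m != NULL && m->seconds < 60);
    if (++m->seconds < 60) return false;
    m->seconds = 0;
    advance_minute(&m->now);
    if (m->alarm_on && same_time(&m->now, &m->alarm)) m->buzzing = true;
    return true;
}
```

Under these assumptions, a clock at 11:59 AM with 59 s counted rolls over to 12:00 PM on the next call, and the buzzer starts if the alarm is on and set to 12:00 PM.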
Specification cont’d….
State diagram for
update-time
74. The state diagram for scan-keyboard is shown in
the adjacent figure.
This function is called periodically, frequently enough
so that all the user’s button presses are caught by
the system.
Because the keyboard will be scanned several times
per second, the same button press must not
be registered several times.
For example, if the minutes count were advanced on
every keyboard scan while the set-time and minutes
buttons were pressed, the time would be
advanced much too fast.
To make the buttons respond more reasonably, the
function computes button activations; it
compares the current state of the button to the
button’s value on the last scan, and it considers
the button activated only when it is on for this
scan but was off for the last scan.
After computing the activation values for all the
buttons, it looks at the activation combinations
and takes the appropriate actions.
Before exiting, it saves the current button values for
computing activations the next time this behavior
is executed.
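The activation computation can be expressed compactly with bit masks. The sketch below is an assumption-laden illustration: the button bit assignments (BTN_*) and the function name compute_activations() are hypothetical and do not come from the state diagram. A button counts as activated only when it is on in this scan and was off in the last scan.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignment, one bit per front-panel button. */
enum {
    BTN_SET_TIME  = 1 << 0,
    BTN_SET_ALARM = 1 << 1,
    BTN_HOUR      = 1 << 2,
    BTN_MINUTE    = 1 << 3,
    BTN_ALARM_ON  = 1 << 4,
    BTN_ALARM_OFF = 1 << 5,
    BTN_ALL       = 0x3F
};

/* Button values saved at the end of the previous scan. */
static uint8_t previous_buttons;

/* Returns the buttons that are on now but were off on the last scan,
 * then saves the current values for the next call. */
uint8_t compute_activations(uint8_t current) {
    assert((current & ~BTN_ALL) == 0);
    uint8_t activated = (uint8_t)(current & ~previous_buttons);
    previous_buttons = current;
    return activated;
}
```

If the minute button is held across three consecutive scans, only the first call reports it as activated, so the minutes count advances once rather than three times.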
State diagram for scan-keyboard
Specification cont’d….
75. The system has both Periodic and Aperiodic components; the current time must
obviously be updated periodically, and the button commands occur
occasionally.
The following 2 major software components can be present in the Architecture:
■ An Interrupt-driven Routine can update the current time.
The current time will be kept in a variable in memory.
A timer can be used to interrupt periodically and update the time.
The display must be sent the new value when the minute value changes.
This routine can also maintain the PM indicator.
■ A Foreground Program can poll the buttons and execute their commands.
Since buttons are changed at a relatively slow rate, it makes no sense to
add the hardware required to connect the buttons to interrupts.
Instead, the foreground program reads the button values and then uses
simple conditional tests to implement the commands, including setting
the current time, setting the alarm, and turning off the alarm.
Another routine called by the foreground program will turn the buzzer on
and off based on the alarm time.
System Architecture
76. The Foreground Code will be implemented as a while loop:
while (TRUE) {
    read_buttons(button_values);     /* read inputs */
    process_command(button_values);  /* do commands */
    check_alarm();                   /* decide whether to turn on the alarm */
}
The loop first reads the buttons using read_buttons().
In addition to reading the current button values from the input device, this routine must preprocess the
button values so that the user interface code will respond properly.
As shown in the figure below, this can be done by performing a simple edge detection on the button input: the
button event value is 1 for one sample period when the button is depressed and then goes back to 0;
it does not return to 1 until the button has been released and then depressed again.
This can be accomplished by a simple
two-state machine.
System Architecture cont’d….
Preprocessing button inputs
77. The process_command() function is responsible for responding to
button events.
The check_alarm() function checks the current time against the alarm time and
decides when to turn on the buzzer.
This routine is kept separate from the Command
Processing Code since the alarm must go on when the proper time
is reached, independent of the button inputs.
From the software architecture it can be seen that a timer needs to be
connected to the CPU. Logic to connect the buttons to the
CPU bus will also be needed.
Finally, before starting to write code and build hardware, draw the
State Transition Graph for the clock’s commands.
That diagram will be used to guide the implementation of the software
components.
System Architecture cont’d….
78. Component Design and Testing
The 2 major software components, the Interrupt Handler and the Foreground
Code, can be implemented relatively straightforwardly.
As the functionality of the Interrupt Handler is in the interruption process
itself, that code is best tested on the Microprocessor Platform.
The Foreground Code can be more easily tested on the PC or workstation
used for code development.
A testbench can be created for this code which generates button depressions
to exercise the state machine.
The advancement of the system clock also needs to be simulated.
Rather than running the Interrupt Handler on the PC, a better testing strategy is
to add testing code that updates the clock, perhaps once per four iterations of
the foreground while loop.
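A host-side testbench along these lines might look like the sketch below. Everything in it is assumed for illustration: process_command_stub() merely counts non-zero button samples in place of the real command processing, sim_seconds stands in for the system clock, and run_testbench() replays a scripted sequence of button samples instead of reading hardware.

```c
#include <assert.h>
#include <stddef.h>

#define ITERATIONS_PER_TICK 4  /* update the clock once per four iterations */

static unsigned sim_seconds;    /* simulated system clock */
static unsigned commands_seen;  /* non-zero button samples processed */

/* Stub standing in for the real command processing code. */
static void process_command_stub(unsigned char buttons) {
    if (buttons != 0) commands_seen++;
}

/* Runs the foreground loop body once per scripted button sample and
 * advances the simulated clock in place of the timer interrupt. */
void run_testbench(const unsigned char *script, size_t length) {
    assert(script != NULL);
    for (size_t i = 0; i < length; i++) {
        process_command_stub(script[i]);
        if ((i + 1) % ITERATIONS_PER_TICK == 0) {
            sim_seconds++;
        }
    }
}
```

A script of eight samples therefore advances the simulated clock by two seconds, which lets the button handling and the clock-dependent logic be exercised together without any interrupt hardware.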
With the timer taken care of in this way, the focus can be on implementing the logic to
interface to the buttons, display, and buzzer.
The buttons will require debouncing logic.
The display will require a register to hold the current display value in order to
drive the display elements.
79. System Integration and Testing
Because this system has a small number of
components, system integration is relatively easy.
The software must be checked to ensure that
debugging code has been turned off.
Three types of Tests can be performed.
1. The clock’s accuracy can be checked against a
reference clock.
2. The commands can be exercised from the
buttons.
3. The buzzer’s functionality should be verified.