3. Reactive systems
Respond to external events.
Engine controller.
Seat belt monitor.
Requires real-time response.
System architecture.
Program implementation.
May require a chain reaction among
multiple processors.
4. Tasks and processes
A task is a functional
description of a
connected set of
operations.
(Task can also mean
a collection of
processes.)
A process is a unique
execution of a program.
Several copies of a
program may run
simultaneously or at
different times.
A process has its own
state:
registers;
memory.
The operating system
manages processes.
5. Why multiple processes?
Multiple tasks means multiple processes.
Processes help with timing complexity:
multiple rates
multimedia
automotive
asynchronous input
user interfaces
communication systems
6. Multi-rate systems
Implementing code that satisfies timing
requirement is even more complex when
multiple rates of computation must be handled.
Tasks may be synchronous or asynchronous.
Synchronous tasks may recur at different rates.
Processes run at different rates based on
computational needs of the tasks.
8. Typical rates in engine
controllers
Variable Full range time (ms) Update period (ms)
Engine spark timing 300 2
Throttle 40 2
Air flow 30 4
Battery voltage 80 4
Fuel flow 250 10
Recycled exhaust gas 500 25
Status switches 100 20
Air temperature Seconds 400
Barometric pressure Seconds 1000
Spark (dwell) 10 1
Fuel adjustment 80 8
Carburetor 500 25
Mode actuators 100 100
9. Real-time systems
Perform a computation to conform to external
timing constraints.
Deadline frequency:
Periodic.
Aperiodic.
Deadline type:
Hard: failure to meet deadline causes system failure.
Soft: failure to meet deadline causes degraded
response.
Firm: late response is useless but some late
responses can be tolerated.
12. Rate requirements on
processes
Period: interval
between process
activations.
Rate: reciprocal of
period.
Initiatino rate may be
higher than period---
several copies of
process run at once. time
P11
P12
P13
P14
CPU 1
CPU 2
CPU 3
CPU 4
13. Timing violations
What happens if a process doesn’t finish
by its deadline?
Hard deadline: system fails if missed.
Soft deadline: user may notice, but system
doesn’t necessarily fail.
14. Example: Space Shuttle
software error
Space Shuttle’s first launch was delayed
by a software timing error:
Primary control system PASS and backup
system BFS.
BFS failed to synchronize with PASS.
Change to one routine added delay that
threw off start time calculation.
1 in 67 chance of timing problem.
15. Task graphs
Tasks may have data
dependencies---must
execute in certain order.
Task graph shows
data/control
dependencies between
processes.
Task: connected set of
processes.
Task set: One or more
tasks.
P3
P1 P2
P4
P5
P6
task 1 task 2
task set
16. Communication between
tasks
Task graph assumes that
all processes in each task
run at the same rate,
tasks do not
communicate.
In reality, some amount
of inter-task
communication is
necessary.
It’s hard to require
immediate response for
multi-rate communication.
MPEG
system
layer
MPEG
audio
MPEG
video
17. Process execution
characteristics
Process execution time Ti.
Execution time in absence of preemption.
Possible time units: seconds, clock cycles.
Worst-case, best-case execution time may be useful
in some cases.
Sources of variation:
Data dependencies.
Memory system.
CPU pipeline.
18. Utilization
CPU utilization:
Fraction of the CPU that is doing useful work.
Often calculated assuming no scheduling
overhead.
Utilization:
U = (CPU time for useful work)/ (total available CPU time)
= [ S t1 ≤ t ≤ t2 T(t) ] / [t2 – t1]
= T/t
19. State of a process
A process can be in
one of three states:
executing on the CPU;
ready to run;
waiting for data.
executing
ready waiting
gets data
and CPU
needs
data
gets data
needs data
preempted
gets
CPU
20. The scheduling problem
Can we meet all deadlines?
Must be able to meet deadlines in all cases.
How much CPU horsepower do we need
to meet our deadlines?
22. Simple processor feasibility
Assume:
No resource conflicts.
Constant process
execution times.
Require:
T ≥ Si Ti
Can’t use more than
100% of the CPU.
T1 T2 T3
T
23. Hyperperiod
Hyperperiod: least common multiple
(LCM) of the task periods.
Must look at the hyperperiod schedule to
find all task interactions.
Hyperperiod can be very long if task
periods are not chosen carefully.
25. Simple processor feasibility
example
P1 period 1 ms, CPU
time 0.1 ms.
P2 period 1 ms, CPU
time 0.2 ms.
P3 period 5 ms, CPU
time 0.3 ms.
LCM 5.00E-03
peirod CPU time CPU time/LCM
P1 1.00E-03 1.00E-04 5.00E-04
P2 1.00E-03 2.00E-04 1.00E-03
P3 5.00E-03 3.00E-04 3.00E-04
total CPU/LCM 1.80E-03
utilization 3.60E-01
27. TDMA assumptions
Schedule based on
least common
multiple (LCM) of
the process
periods.
Trivial scheduler -
> very small
scheduling
overhead.
P1 P1 P1
P2 P2
PLCM
28. TDMA schedulability
Always same CPU utilization (assuming
constant process execution times).
Can’t handle unexpected loads.
Must schedule a time slot for aperiodic
events.
29. TDMA schedulability example
TDMA period = 10
ms.
P1 CPU time 1 ms.
P2 CPU time 3 ms.
P3 CPU time 2 ms.
P4 CPU time 2 ms.
TDMA period 1.00E-02
CPU time
P1 1.00E-03
P2 3.00E-03
P3 2.00E-03
P4 2.00E-03
total 8.00E-03
utilization 8.00E-01
30. Round-robin
Schedule process
only if ready.
Always test
processes in the
same order.
Variations:
Constant system
period.
Start round-robin
again after finishing
a round.
T1 T2 T3
P
T2 T3
P
31. Round-robin assumptions
Schedule based on least common multiple
(LCM) of the process periods.
Best done with equal time slots for
processes.
Simple scheduler -> low scheduling
overhead.
Can be implemented in hardware.
32. Round-robin schedulability
Can bound maximum CPU load.
May leave unused CPU cycles.
Can be adapted to handle unexpected
load.
Use time slots at end of period.
33. Schedulability and
overhead
The scheduling process consumes CPU
time.
Not all CPU time is available for processes.
Scheduling overhead must be taken into
account for exact schedule.
May be ignored if it is a small fraction of total
execution time.
36. Timed loop implementation
Encapuslate set of all
processes in a single
function that
implements the task
set,.
Use timer to control
execution of the task.
No control over timing
of individual
processes.
void pall(){
p1();
p2();
}
37. Multiple timers
implementation
Each task has its own
function.
Each task has its own
timer.
May not have enough
timers to implement
all the rates.
void pA(){ /* rate A */
p1();
p3();
}
void B(){ /* rate B */
p2();
p4();
p5();
}
38. Timer + counter
implementation
Use a software count
to divide the timer.
Only works for clean
multiples of the timer
period.
int p2count = 0;
void pall(){
p1();
if (p2count >= 2) {
p2();
p2count = 0;
}
else p2count++;
p3();
}
39. Implementing processes
All of these implementations are
inadequate.
Need better control over timing.
Need a better mechanism than
subroutines.
41. Operating systems
The operating system controls resources:
who gets the CPU;
when I/O takes place;
how much memory is allocated.
The most important resource is the CPU
itself.
CPU access controlled by the scheduler.
42. Process state
A process can be in
one of three states:
executing on the CPU;
ready to run;
waiting for data.
executing
ready waiting
gets data
and CPU
needs
data
gets data
needs data
preempted
gets
CPU
43. Operating system
structure
OS needs to keep track of:
process priorities;
scheduling state;
process activation record.
Processes may be created:
statically before system starts;
dynamically during execution.
44. Embedded vs. general-
purpose scheduling
Workstations try to avoid starving
processes of CPU access.
Fairness = access to CPU.
Embedded systems must meet deadlines.
Low-priority processes may not run for a
long time.
45. Priority-driven scheduling
Each process has a priority.
CPU goes to highest-priority process that
is ready.
Priorities determine scheduling policy:
fixed priority;
time-varying priorities.
46. Priority-driven scheduling
example
Rules:
each process has a fixed priority (1 highest);
highest-priority ready process gets CPU;
process continues until done.
Processes
P1: priority 1, execution time 10
P2: priority 2, execution time 30
P3: priority 3, execution time 20
48. The scheduling problem
Can we meet all deadlines?
Must be able to meet deadlines in all cases.
How much CPU horsepower do we need
to meet our deadlines?
49. Process initiation
disciplines
Periodic process: executes on (almost)
every period.
Aperiodic process: executes on demand.
Analyzing aperiodic process sets is harder-
--must consider worst-case combinations
of process activations.
50. Timing requirements on
processes
Period: interval between process
activations.
Initiation interval: reciprocal of period.
Initiation time: time at which process
becomes ready.
Deadline: time at which process must
finish.
51. Timing violations
What happens if a process doesn’t finish
by its deadline?
Hard deadline: system fails if missed.
Soft deadline: user may notice, but system
doesn’t necessarily fail.
52. Example: Space Shuttle
software error
Space Shuttle’s first launch was delayed
by a software timing error:
Primary control system PASS and backup
system BFS.
BFS failed to synchronize with PASS.
Change to one routine added delay that
threw off start time calculation.
1 in 67 chance of timing problem.
54. IPC styles
Shared memory:
processes have some memory in common;
must cooperate to avoid destroying/missing
messages.
Message passing:
processes send messages along a
communication channel---no common
address space.
56. Race condition in shared
memory
Problem when two CPUs try to write the
same location:
CPU 1 reads flag and sees 0.
CPU 2 reads flag and sees 0.
CPU 1 sets flag to one and writes location.
CPU 2 sets flag to one and overwrites
location.
57. Atomic test-and-set
Problem can be solved with an atomic
test-and-set:
single bus operation reads memory location,
tests it, writes it.
ARM test-and-set provided by SWP:
ADR r0,SEMAPHORE
LDR r1,#1
GETFLAG SWP r1,r1,[r0]
BNZ GETFLAG
58. Critical regions
Critical region: section of code that cannot
be interrupted by another process.
Examples:
writing shared memory;
accessing I/O device.
59. Semaphores
Semaphore: OS primitive for controlling
access to critical regions.
Protocol:
Get access to semaphore with P().
Perform critical region operations.
Release semaphore with V().
61. Process data
dependencies
One process may not
be able to start until
another finishes.
Data dependencies
defined in a task
graph.
All processes in one
task run at the same
rate.
P1 P2
P3
P4
64. Metrics
How do we evaluate a scheduling policy:
Ability to satisfy all deadlines.
CPU utilization---percentage of time devoted
to useful work.
Scheduling overhead---time required to make
scheduling decision.
65. Rate monotonic scheduling
RMS (Liu and Layland): widely-used,
analyzable scheduling policy.
Analysis is known as Rate Monotonic
Analysis (RMA).
66. RMA model
All process run on single CPU.
Zero context switch time.
No data dependencies between
processes.
Process execution time is constant.
Deadline is at end of period.
Highest-priority ready process runs.
67. Process parameters
Ti is computation time of process i; ti is
period of process i.
period ti
Pi
computation time Ti
68. Rate-monotonic analysis
Response time: time required to finish
process.
Critical instant: scheduling state that gives
worst response time.
Critical instant occurs when all higher-
priority processes are ready to execute.
72. RMS CPU utilization
Utilization for n processes is
S i Ti / ti
As number of tasks approaches infinity,
maximum utilization approaches 69%.
73. RMS CPU utilization,
cont’d.
RMS cannot use 100% of CPU, even with
zero context switch overhead.
Must keep idle cycles available to handle
worst-case scenario.
However, RMS guarantees all processes
will always meet their deadlines.
77. EDF implementation
On each timer interrupt:
compute time to deadline;
choose process closest to deadline.
Generally considered too expensive to use
in practice.
78. Fixing scheduling
problems
What if your set of processes is
unschedulable?
Change deadlines in requirements.
Reduce execution times of processes.
Get a faster CPU.
79. Priority inversion
Priority inversion: low-priority process
keeps high-priority process from running.
Improper use of system resources can
cause scheduling problems:
Low-priority process grabs I/O device.
High-priority device needs I/O device, but
can’t get it until low-priority process is done.
Can cause deadlock.
80. Solving priority inversion
Give priorities to system resources.
Have process inherit the priority of a
resource that it requests.
Low-priority process inherits priority of
device if higher.
81. Data dependencies
Data dependencies
allow us to improve
utilization.
Restrict combination
of processes that can
run simultaneously.
P1 and P2 can’t run
simultaneously.
P1
P2
82. Context-switching time
Non-zero context switch time can push
limits of a tight schedule.
Hard to calculate effects---depends on
order of context switches.
In practice, OS context switch overhead is
small (hundreds of clock cycles) relative to
many common task periods (ms – ms).
86. IPC styles
Shared memory:
processes have some memory in common;
must cooperate to avoid destroying/missing
messages.
Message passing:
processes send messages along a
communication channel---no common
address space.
88. Race condition in shared
memory
Problem when two CPUs try to write the
same location:
CPU 1 reads flag and sees 0.
CPU 2 reads flag and sees 0.
CPU 1 sets flag to one and writes location.
CPU 2 sets flag to one and overwrites
location.
89. Atomic test-and-set
Problem can be solved with an atomic
test-and-set:
single bus operation reads memory location,
tests it, writes it.
ARM test-and-set provided by SWP:
ADR r0,SEMAPHORE
LDR r1,#1
GETFLAG SWP r1,r1,[r0]
BNZ GETFLAG
90. Critical regions
Critical region: section of code that cannot
be interrupted by another process.
Examples:
writing shared memory;
accessing I/O device.
91. Semaphores
Semaphore: OS primitive for controlling
access to critical regions.
Protocol:
Get access to semaphore with P().
Perform critical region operations.
Release semaphore with V().
93. Process data
dependencies
One process may not
be able to start until
another finishes.
Data dependencies
defined in a task
graph.
All processes in one
task run at the same
rate.
P1 P2
P3
P4
94. Signals in UML
More general than Unix signal---may carry
arbitrary data:
<<signal>>
aSig
p : integer
someClass
sigbehavior()
<<send>>
96. Scheduling and context
switch overhead
Process Execution
time
deadline
P1 3 5
P2 3 10
With context switch overhead of
1, no feasible schedule.
2TP1 + TP2 = 2*(1+3)+(1_3)=11
97. Process execution time
Process execution time is not constant.
Extra CPU time can be good.
Extra CPU time can also be bad:
Next process runs earlier, causing new
preemption.
98. Processes and caches
Processes can cause additional caching
problems.
Even if individual processes are well-
behaved, processes may interfere with each
other.
Worst-case execution time with bad
behavior is usually much worse than
execution time with good cache behavior.
99. Effects of scheduling on
the cache
Process WCET Avg. CPU
time
P1 8 6
P2 4 3
P3 4 3
Schedule 1 (LRU cache):
Schedule 2 (half of cache
reserved for P1):
100. Power optimization
Power management: determining how
system resources are scheduled/used to
control power consumption.
OS can manage for power just as it
manages for time.
OS reduces power by shutting down units.
May have partial shutdown modes.
101. Power management and
performance
Power management and performance are
often at odds.
Entering power-down mode consumes
energy,
time.
Leaving power-down mode consumes
energy,
time.
102. Simple power management
policies
Request-driven: power up once request is
received. Adds delay to response.
Predictive shutdown: try to predict how
long you have before next request.
May start up in advance of request in
anticipation of a new request.
If you predict wrong, you will incur additional
delay while starting up.
103. Probabilistic shutdown
Assume service requests are probabilistic.
Optimize expected values:
power consumption;
response time.
Simple probabilistic: shut down after time
Ton, turn back on after waiting for Toff.
104. Advanced Configuration
and Power Interface
ACPI: open standard for power
management services.
Hardware platform
device
drivers
ACPI BIOS
OS kernel
applications
power
management
105. ACPI global power states
G3: mechanical off
G2: soft off
S1: low wake-up latency with no loss of context
S2: low latency with loss of CPU/cache state
S3: low latency with loss of all state except memory
S4: lowest-power state with all devices off
G1: sleeping state
G0: working state
107. Why distributed?
Higher performance at lower cost.
Physically distributed activities---time
constants may not allow transmission to
central site.
Improved debugging---use one CPU in
network to debug others.
May buy subsystems that have embedded
processors.
108. Network abstractions
International Standards Organization
(ISO) developed the Open Systems
Interconnection (OSI) model to describe
networks:
7-layer model.
Provides a standard way to classify
network components and operations.
109. OSI model
physical mechanical, electrical
data link reliable data transport
network end-to-end service
transport connections
presentation data format
session application dialog control
application end-use interface
110. OSI layers
Physical: connectors, bit formats, etc.
Data link: error detection and control
across a single link (single hop).
Network: end-to-end transmission
services, multi-hop data communication.
Transport: provides connections that
ensures that data are delivered in proper
manner.
111. OSI layers, cont’d.
Session: services for end-user
applications: data grouping,
checkpointing, etc.
Presentation: data exchange formats,
transformation services.
Application: interface between network
and end-user programs.
112. I2C bus
Well-known bus commonly used to link
microcontrollers into system
Designed for low-cost, medium data rate
(upto 100kbps to 400 kbps)
Characteristics:
serial;
multiple-master;
fixed-priority arbitration.
Several microcontrollers come with built-
in I2C controllers.
116. I2C signaling
Sender pulls down bus for 0.
Sender listens to bus---if it tried to send a
1 and heard a 0, someone else is
simultaneously transmitting.
Transmissions occur in 8-bit bytes.
117. I2C data link layer
Every device has an address (7 bits in
standard, 10 bits in extension).
Bit 8 of address signals read or write.
General call address allows broadcast.
118. I2C bus arbitration
Sender listens while sending address.
When sender hears a conflict, if its
address is higher, it stops signaling.
Low-priority senders relinquish control
early enough in clock cycle to allow bit to
be transmitted reliably.
126. Ethernet performance
Quality-of-service tends to non-linearly
decrease at high load levels.
Can’t guarantee real-time deadlines.
However, may provide very good service
at proper load levels.
127. Fieldbus
Used for industrial control and
instrumentation---factories, etc.
H1 standard based on 31.25 MB/s twisted
pair medium.
High Speed Ethernet (HSE) standard
based on 100 Mb/s Ethernet.
130. CAN bus was designed for automotive electronics
and was first used in production cars in 1991.
CAN bus is a serial communication bus
CAN bus is a broadcast type of bus
CAN bus uses Non-return To zero with bit stuffing
Operates at data rates of up to 1 Megabits per
second over a twisted pair connection of 40 meters.
It has excellent error detection and confinement
131. The CAN bus are similar to I2C bus
It was originally developed for use of in automation
Is now being used in many other industrial
application
High reliability bus
Its domain of application ranges from high speed
networks to low cost multiplex wiring
132. Prioritization of messages
Guarantee of latency times
Configuration flexibility
Multicast reception with time synchronization
System wide data consistency
Multimaster concept
CSMA/CA (Carrier Sense Multiple Access / Collision
Avoidance) and Message priority
Error detection and signalling
133. Can bus has its own electrical drivers and receivers
that connect the node to the bus in wired .
Can terminology a logical 1 on the bus is called
recessive and logical 0 is dominant
When all nodes are transmitting 1s the bus is said to
in the recessive state
When all nodes are transmits a 0s the bus is in the
dominant state
135. High speed CAN physical layer
Speed up to 1Mbit/s
Max. bus length 40m at 1Mbit/s
Two wire differential bus
Characteristic line impedance 120 Ohm
Low speed CAN physical layer
Speed up to 125 Kbit/s
Up to 32 nodes per network
138. A DATA FRAME is composed of seven different bit fields
SOF: 1 dominant bit (0) to indicate the beginning of a
new message
Arbitration: 11 bit identifier of the source of the message
(priority) + RTR (Remote transmission request)
Control: 6 bits to identify the length of the data
Data: Data to be transmitted (payload)
CRC: Cyclic Redundancy Check (bit errors)
139. To avoid data collisions
CAN performs a bitwise
Wired AND configuration:
0-level: dominant level
1-level: recessive level
Whenever the bus is free (recessive level), any
station can start to transmit data.
The lower the value of the identifier, the higher the
140. The remote frame is identical to the data frame
except
A remote frame is used to request data from
another node
The RTR bit set to recessive
data length contains the number of bytes that
are required from the data frame
141. Error frame consists of two different fields
The first field is ERROR FLAGS contributed from
different stations. The second field is the ERROR
DELIMITER.
There are two types of error flags
Active Error Flag transmitted by a node detecting
an error on the network that is in error state
Passive Error Flag transmitted by a node detecting
an active error frame on the network that is in error
142. TYPES OF ERROR
There are five types of error are there in CAN and are
listed below
1. BIT ERROR
2. STUFF ERROR
3. CRC ERROR
4. FORM ERROR
5. ACKNOWLEDGEMENT ERROR
143. Bit error:
During transmission the node transmits the bit
at transmit time region and receives the bit at
receive time and two bits are compared if they are
not equal then that is considered as bit error.
Stuff error:
A STUFF ERROR has to be detected at the bit time
of the 6th consecutive equal bit level in a message
field that should be coded by the method of bit
stuffing.
144. CRC error:
While receiving data or remote frame the CRC value
will be calculated and is compared with the received
CRC values. In case of proper frame transmission
both will be equal. Else that will be considered as
CRC error and error frame will be transmitted.
Form error:
A FORM ERROR has to be detected when a fixed-
form bit field contains one or more illegal bits.
145. ACKNOWLEDGMENT:
During the transmission of data or remote frame,
in the ACK flag field the transmitter will put
recessive bit and expect dominant bit from receive
pin. If dominant bit is observed then that is
considered as proper transmission. In other case it
is acknowledgement error. The error frame is
transmitted if acknowledgment error is
observed.ERROR
146. In CAN frames a bit of opposite polarity is inserted
after five consecutive bits of the same polarity
This practice is called bit stuffing, and is due to the
"Non Return to Zero" (NRZ) coding adopted
The "stuffed" data frames are destuffed by the
receiver
The bit stuffing is used, six consecutive bits of the
same type (111111 or 000000) are considered an
error
148. Factory automation
Industrial machine control
Lifts and escalators
Building automation
Non-industrial control
Non-industrial equipment
149. Lower cost at vehicle construction
Increased flexibility and reusability of design
Reduces time to market
Facilitates drive by wire which reduces cost
Further
Facilitates advanced features in vehicles
Enhances debug at point of service
151. Concept
Build accelerator for block motion
estimation, one step in video
compression.
Perform two-dimensional correlation:
Frame 1
f2
f2
f2
f2
f2
f2
f2
f2
f2
f2
152. Block motion estimation
MPEG divides frame into 16 x 16
macroblocks for motion estimation.
Search for best match within a search
range.
Measure similarity with sum-of-absolute-
differences (SAD):
S | M(i,j) - S(i-ox, j-oy) |
156. Computational
requirements
Let MBSIZE = 16, SEARCHSIZE = 8.
Search area is 8 + 8 + 1 in each
dimension.
Must perform:
nops = (16 x 16) x (17 x 17) = 73984 ops
CIF format has 352 x 288 pixels -> 22 x
18 macroblocks.
157. Accelerator requirements
name block motion estimator
purpose block motion est. in PC
inputs macroblocks, search areas
outputs motion vectors
functions compute motion vectors with
full search
performance as fast as possible
manufacturing cost hundreds of dollars
power from PC power supply
physical size/weight PCI card
158. Accelerator data types,
basic classes
Motion-vector
x, y : pos
Macroblock
pixels[] : pixelval
Search-area
pixels[] : pixelval
PC
memory[]
Motion-estimator
compute-mv()
160. Architectural
considerations
Requires large amount of memory:
macroblock has 256 pixels;
search area has 1,089 pixels.
May need external memory (especially if
buffering multiple macroblocks/search
areas).
162. Pixel schedules
PE 0 PE 1 PE 2
|M(0,0)-S(0,0)|
|M(0,1)-S(0,1)| |M(0,0)-S(0,1)|
|M(0,2)-S(0,2)| |M(0,1)-S(0,2)| |M(0,0)-S(0,2)|
M(0,0)
S(0,2)
163. System testing
Testing requires a large amount of data.
Use simple patterns with obvious answers
for initial tests.
Extract sample data from JPEG pictures
for more realistic tests.