I have collected the essential information about the various hardware blocks of the Nvidia Tegra K1 processor and put it together here. It should help anyone who is working, or is going to work, on this chip by presenting the details concisely.
3. Two versions: Logan and Denver
Logan: 32-bit quad-core 4-PLUS-1 ARM Cortex-A15 CPU; up to 2.3 GHz; 28 nm process
Logan: two part numbers available, CD575M and CD575MI
Denver: 64-bit dual-core, based on the ARMv8 architecture; up to 2.5 GHz
64 kB L1: 32 kB I-cache and 32 kB D-cache
2 MB L2 cache
OUR FOCUS → LOGAN
5. Vector graphics is the use of geometrical primitives such as points, lines, curves, and shapes or polygons, all of which are based on mathematical expressions, to represent images in computer graphics.
6. Rasterisation (or rasterization) is the task of taking an image described in a vector graphics format (shapes) and converting it into a raster image (pixels or dots) for output on a video display or printer, or for storage in a bitmap file format.
7. Reason: mobile devices are in standby for almost 80% of the time → power saving
4-PLUS-1 CPU → 4 high-performance, more power-intensive cores and 1 low-power, low-performance core
Switching between cores is done on the basis of the processing required, with intelligent software hysteresis
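Total power = leakage + dynamic: $P_{\text{total}} = P_{\text{leak}} + P_{\text{dyn}}$
Dynamic power: $P_{\text{dyn}} \propto f \cdot V^2$ (f = clock frequency, V = supply voltage)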
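A small sketch of what the dynamic-power relation implies when frequency and voltage are scaled together. The operating points used with it are made up for illustration, not Tegra K1 data. The switched capacitance is assumed equal at both points, so it cancels in the ratio.

```c
/* Ratio of dynamic power at operating point 2 to operating point 1, using
 * P_dyn proportional to f * V^2. The switched capacitance is ASSUMED to be
 * the same at both points, so it cancels; leakage is not included. */
double dynamic_power_ratio(double f1_hz, double v1, double f2_hz, double v2)
{
    return (f2_hz * v2 * v2) / (f1_hz * v1 * v1);
}
```

For example, going from a hypothetical 2.0 GHz at 1.2 V to 1.0 GHz at 0.9 V gives a ratio of about 0.28. Halving the frequency alone would only give 0.5, so most of the saving comes from the squared voltage term. Leakage, the other half of the total-power equation, is not modelled here.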
8. Fast process: optimized for high-frequency operation, but with higher leakage
Low-power process: operates at a lower frequency with lower leakage
High-to-low performance crossover at 600 MHz
The low-power core has a peak frequency of 1 GHz
Switching between the two core types is OS transparent
Not all 4 high-performance cores need to be active; they are enabled/disabled dynamically
Note: all 5 cores can never be active simultaneously
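The core-switch decision with software hysteresis might look roughly like the sketch below. Only the 600 MHz crossover comes from the slides. The 100 MHz hysteresis band, the names, and driving the decision from a requested frequency are all assumptions for illustration, not Nvidia's actual governor.

```c
/* HYPOTHETICAL governor thresholds. Only the 600 MHz crossover is from the
 * slides; the 100 MHz hysteresis band is an assumption. */
#define CROSSOVER_KHZ  600000u
#define HYSTERESIS_KHZ 100000u

typedef enum { CLUSTER_LP, CLUSTER_FAST } cluster_t;

/* Pick the cluster for a requested CPU frequency. Moving up to the fast
 * cluster requires exceeding crossover + band; moving back down requires
 * dropping below crossover - band. Requests inside the band keep the
 * current cluster, which prevents rapid back-and-forth switching. */
cluster_t select_cluster(cluster_t current, unsigned req_khz)
{
    if (current == CLUSTER_LP && req_khz > CROSSOVER_KHZ + HYSTERESIS_KHZ)
        return CLUSTER_FAST;
    if (current == CLUSTER_FAST && req_khz < CROSSOVER_KHZ - HYSTERESIS_KHZ)
        return CLUSTER_LP;
    return current;
}
```

The upper edge of the assumed band (700 MHz) stays below the 1 GHz peak of the low-power core, so the LP core can still serve every request inside the band.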
12. Motive: free the CPU
Handle varied workloads and use the GPU efficiently
Run complex, less structured tasks
Any kernel can launch another kernel and can create the necessary streams, events and dependencies needed to process additional work without host CPU interaction.
13. GPU core can be used by multiple CPUs
Enables multiple CPU cores to launch work on a
single GPU simultaneously
Increases GPU utilization and slashes CPU idle time
32 simultaneous, hardware-managed connections (?)
17. About the Snoop Control Unit (SCU)
The SCU connects one to four Cortex-A9
processors to the memory system through the
AXI interfaces.
The SCU functions are to:
maintain data cache coherency between the
Cortex-A9 processors
initiate L2 AXI memory accesses
arbitrate between Cortex-A9 processors
requesting L2 accesses
manage ACP accesses.
18. AGENDA
AVP
MPIO
Interrupt Controllers
Clock
Boot
Power States
PMC
Flow Controller
Power Architecture
Memory Controller
Peripherals
19. AVP- Audio Video Processor
Functions
- Manage initial boot stages
- Control and assist hardware audio decoding
blocks, BSEA and VCP2
- Control and assist hardware video decoder, VDE
256 kB local RAM (IRAM)
8kB cache
20. Multi-purpose I/O: MPIO
Each MPIO consists of:
Output driver with:
- Tri-state capability
- Drive strength controls
- Push-pull mode, open-drain mode, or both
Input receiver with Schmitt mode, CMOS mode, or both
Weak pull-up or pull-down
They stay in their POR state until changed by software (bootloader or OS)
Default pad drive impedance is 50 ohms
21. 5 types of MPIO pads:
ST (standard)
DD (dual driver): 3.3 V tolerant (with pull-up resistor) regardless of the input voltage; must be set to open-drain mode; special power-sequencing considerations apply
OD (open drain): 5 V tolerant; no push-pull driver
CZ (controlled Z): tightly controlled impedance
LV: 1.8 V tolerant
22. MPIO (contd.)
Each MPIO can have up to 5 functions: up to 4 SFIOs (special functions, used by peripherals) and 1 GPIO
The pinmux controller handles MPIO functionality and has one register per MPIO
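To make "one register per MPIO" concrete, the sketch below packs the per-pad settings into a single register value: SFIO select, pull-up/pull-down, tri-state, input enable, and open drain. The bit positions are hypothetical and chosen only for illustration. The real layout must be taken from the pinmux register description in the TRM.

```c
#include <stdint.h>

/* HYPOTHETICAL bit layout, for illustration only; check the TRM. */
#define PM_SHIFT      0u          /* bits [1:0]: select one of up to 4 SFIOs */
#define PM_MASK       0x3u
#define PUPD_SHIFT    2u          /* bits [3:2]: pull-up/pull-down select */
#define PUPD_MASK     0x3u
#define TRISTATE_BIT  (1u << 4)   /* output driver tri-stated */
#define E_INPUT_BIT   (1u << 5)   /* input receiver enabled */
#define E_OD_BIT      (1u << 6)   /* open-drain mode */

typedef enum { PULL_NONE = 0, PULL_DOWN = 1, PULL_UP = 2 } pull_t;

/* Compose a pinmux register value from the per-pad settings. */
uint32_t pinmux_value(unsigned sfio, pull_t pull, int tristate,
                      int input_enable, int open_drain)
{
    uint32_t v = 0;
    v |= (sfio & PM_MASK) << PM_SHIFT;              /* out-of-range SFIO is masked */
    v |= ((uint32_t)pull & PUPD_MASK) << PUPD_SHIFT;
    if (tristate)     v |= TRISTATE_BIT;
    if (input_enable) v |= E_INPUT_BIT;
    if (open_drain)   v |= E_OD_BIT;
    return v;
}
```

With this made-up layout, an unused pin as recommended on the power-saving slide (tri-stated, input buffer disabled) would be `pinmux_value(0, PULL_NONE, 1, 0, 0)`.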
24. GPIO Controller
GPIO controller is divided into 8 banks
Each bank handles upto 32 MPIOs
Within each bank, GPIOs are arranged as 4 ports
of 8 bits each
162 GPIOs in all
Each GPIO can be individually configured as an input, an output, or an interrupt source with edge/level triggering
An optional lock bit ensures the GPIO configuration is not modified at runtime; a system reset can clear this bit
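Assuming GPIOs are numbered contiguously as bank × 32 + port × 8 + bit (an assumption about the numbering scheme, not something stated on the slide), a GPIO number can be split into its bank, port and bit as follows.

```c
/* ASSUMED flat numbering: gpio = bank * 32 + port * 8 + bit.
 * 8 banks x 4 ports x 8 bits = 256 slots, of which 162 are GPIOs. */
typedef struct {
    unsigned bank;  /* 0..7: 32 GPIOs per bank */
    unsigned port;  /* 0..3: 8 GPIOs per port  */
    unsigned bit;   /* 0..7: bit within the port */
} gpio_loc_t;

gpio_loc_t gpio_locate(unsigned gpio)
{
    gpio_loc_t loc;
    loc.bank = gpio / 32u;
    loc.port = (gpio % 32u) / 8u;
    loc.bit  = gpio % 8u;
    return loc;
}
```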
25. Unused pins: power saving
Assert tri-state and disable the input buffer
If all pins in a pad control group are unused, set
the drive strengths and slew rates to a minimum
If all pins on a power rail are unused, assert
E_NO_IOPOWER for that rail in the PMC registers
26. Two interrupt controllers: vGIC (virtual Generic Interrupt Controller) and LIC (Legacy Interrupt Controller)
vGIC for the Cortex-A15 CPUs and LIC for the ARM7 AVP
160 hardware interrupts grouped into slices of 32, where each slice can be configured independently
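Assuming the 160 interrupts are numbered contiguously from 0, finding the slice (and the bit inside it) that controls a given interrupt is a divide and a remainder. `irq_locate` is a hypothetical helper name.

```c
#define NUM_HW_IRQS  160u
#define SLICE_WIDTH   32u   /* 160 / 32 = 5 slices, numbered 0..4 */

/* Split an interrupt number into (slice, bit). Returns -1 if out of range.
 * ASSUMES contiguous numbering 0..159. */
int irq_locate(unsigned irq, unsigned *slice, unsigned *bit)
{
    if (irq >= NUM_HW_IRQS)
        return -1;
    *slice = irq / SLICE_WIDTH;
    *bit   = irq % SLICE_WIDTH;
    return 0;
}
```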
27. There is one vGIC per CPU cluster; it runs at half the clock frequency of that cluster
vGIC supports 256 interrupts, each with a unique ID
Interrupt sources for vGIC
Software Generated Interrupts(SGI)
Private Peripheral Interrupts(PPI)
Shared Peripheral Interrupts(SPI)
28. SGIs (also called IPIs, i.e. inter-processor interrupts) are generated by writing to vGIC registers; there are at most 16, with IDs 0 to 15
PPIs are generated by a peripheral that is specific to a CPU; 7 PPIs per CPU. nFIQ and nIRQ are provided as pins (?)
SPIs are external hardware interrupts arriving via IRQ pins, and also from internal SoC units; level triggered
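The ID ranges above follow the ARM GIC architecture: SGIs are 0 to 15, PPIs 16 to 31, and SPIs start at 32. Below is a sketch that classifies an ID, capped at the 256 IDs the vGIC supports. The names are illustrative.

```c
typedef enum { IRQ_SGI, IRQ_PPI, IRQ_SPI, IRQ_INVALID } gic_irq_type_t;

#define VGIC_NUM_IDS 256u  /* the vGIC supports 256 interrupt IDs */

/* Classify an interrupt ID using the ARM GIC ID map. */
gic_irq_type_t gic_classify(unsigned id)
{
    if (id < 16u)          return IRQ_SGI;   /* software generated */
    if (id < 32u)          return IRQ_PPI;   /* private to one CPU */
    if (id < VGIC_NUM_IDS) return IRQ_SPI;   /* shared peripherals */
    return IRQ_INVALID;
}
```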
29. Two external clocks: 32.768 kHz (for PMC and RTC) and 12 MHz
16 PLLs
For saving power by clock gating, refer to page 78 of the TRM
Each peripheral has its own CLOCK_SOURCE register: 2 bits select one of 4 clock sources, and 8 bits set the clock divider (7 integer bits and 1 fractional bit)
CL-DVFS (Closed Loop Dynamic Voltage and Frequency Scaling) registers help control the clock and power supply to the FCPU (fast CPU) complex
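The 7.1 divider can be worked through in code. The sketch assumes the usual Tegra convention that an 8-bit divisor field N gives out = parent × 2 / (N + 2), i.e. a divide ratio of N/2 + 1 in half steps. This formula is an assumption to check against the TRM, and the function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED 7.1 divider semantics: out = parent * 2 / (N + 2).
 * N = 0 divides by 1, N = 1 by 1.5, N = 2 by 2, and so on. */
uint64_t clk_div_out_hz(uint64_t parent_hz, uint8_t n)
{
    return (parent_hz * 2u) / ((uint64_t)n + 2u);
}

/* Smallest N whose output does not exceed target_hz (divider rounded up so
 * the peripheral is never overclocked). Clamps to the 8-bit field, in which
 * case the output may still be above the target. */
uint8_t clk_div_for_target(uint64_t parent_hz, uint64_t target_hz)
{
    if (target_hz == 0u || target_hz >= parent_hz)
        return 0u;
    /* parent*2/(n+2) <= target  =>  n + 2 >= ceil(parent*2/target) */
    uint64_t n = (parent_hz * 2u + target_hz - 1u) / target_hz - 2u;
    return n > 255u ? (uint8_t)255u : (uint8_t)n;
}
```

Consider a 408 MHz parent (an example value) and a 100 MHz target. The helper picks N = 7, giving about 90.7 MHz, while N = 6 would give 102 MHz and overshoot. Rounding the divider up trades some clock rate for never exceeding the target.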
30. RTC
Maintains seconds and milliseconds counters
5 alarm registers
Always-on power domain
Can issue interrupts in LP states
Hardware compensates for clock drift caused by ppm variations of the oscillator
All registers (except BUSY) use the 32 kHz clock domain
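A quick calculation shows why the hardware drift adjustment matters. The 20 ppm figure used below is a hypothetical crystal tolerance, not a Tegra K1 specification.

```c
/* Drift accumulated per day by an oscillator that is off by `ppm` parts per
 * million of its nominal frequency (1 ppm = 1e-6). */
double drift_seconds_per_day(double ppm)
{
    const double seconds_per_day = 86400.0;
    return ppm * 1e-6 * seconds_per_day;
}
```

At 20 ppm the uncorrected error is about 1.7 s per day, or roughly 10 minutes per year.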
31. TIMERS
RTC
Nvidia generic timers (10)
WDT: 5 in total, 1 per FCPU and 1 for the COP (AVP) [does the LP CPU lack a WDT?]
GIT: ARM CPU generic timers (4 timers per CPU: secure and non-secure physical timers, a hypervisor timer, and a virtual timer)
TSC: generic time system counter, the reference for GIT; it is part of the PMC
Note: any timer can be used as a WDT
32. Power-on reset (POR): deasserted externally (SYS_RESET_N pin)
Reset by the thermal sensor
Watchdog timer, two types:
Deadman timer (legacy WDT): interrupt issued on the 1st expiry; on the 2nd expiry, reset of only some subunits
WDT2: interrupt on the 1st expiry, FIQ on the 2nd, CPU reset on the 3rd, full system reset on the 4th
Software reset: config bit in the PMC; resets the whole chip
LP0 wakeup reset: controlled by PMC logic
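The WDT2 escalation ladder can be modelled as a small counter. This is a behavioural sketch, not the hardware. It assumes that reloading ("petting") the watchdog from software resets the escalation count, and all names are hypothetical.

```c
/* Action taken on each consecutive WDT2 expiry. */
typedef enum {
    WDT_ACT_INTERRUPT = 1,  /* 1st expiry */
    WDT_ACT_FIQ,            /* 2nd expiry */
    WDT_ACT_CPU_RESET,      /* 3rd expiry */
    WDT_ACT_SYSTEM_RESET    /* 4th expiry */
} wdt_action_t;

typedef struct {
    unsigned expiries;  /* consecutive expiries since the last reload */
} wdt2_t;

/* ASSUMPTION: a software reload clears the escalation count. */
void wdt2_pet(wdt2_t *w) { w->expiries = 0u; }

/* Called on each timer expiry; returns the action for this step. */
wdt_action_t wdt2_expire(wdt2_t *w)
{
    if (w->expiries < 4u)
        w->expiries++;
    switch (w->expiries) {
    case 1u: return WDT_ACT_INTERRUPT;
    case 2u: return WDT_ACT_FIQ;
    case 3u: return WDT_ACT_CPU_RESET;
    default: return WDT_ACT_SYSTEM_RESET;
    }
}
```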
33. During POR or system reset, the reset controller deasserts reset to the boot blocks first, and to the CPU and COP only 511 oscillator clock periods later, so that the CPU/COP cannot talk to a boot device that is itself still in reset
Non-boot devices are brought out of reset by software
At POR, bits of registers RST_DEVICES_L/H/U/V/W/X and CLK_OUT_ENB_L/H/U/V/W/X are set by hardware (page 90 of the TRM)
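As a quick worked example of the 511-period delay, assume the 12 MHz oscillator listed on the clock slide is the reference. A different oscillator frequency would change the result.

$$
t_{\text{delay}} = \frac{511}{f_{\text{osc}}} = \frac{511}{12 \times 10^{6}\ \text{Hz}} \approx 42.6\ \mu\text{s}
$$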
34. Blocks necessary for the boot are:
AVP with its L1
All system buses (AHB, APB, etc.)
Timer
RTC
NOR flash controller
eFUSE
GPIO
CoreSight: debug controller, one per cluster
38. Power States..contd.
LP2
Cluster switch (a variant of LP2)
- Cluster 1 to 0 switch
- Cluster 0 to 1 switch: CPU3, i.e. the last CPU of cluster 0, initiates this switch
39. Power States..contd.
LP3(per CPU)
If a CPU is idle for a short time, its clock is gated, i.e. the CPU is halted (it is not power gated; only its clock is stopped)
Only the clock to the small wake-up logic stays enabled; all other clocks are gated
LP3 is exited on detection of an IRQ or FIQ
The flow controller is not needed; clock gating/ungating is internal to the FCPUs and the LP CPU
40. AVP Low Power States
No specific instruction to halt the AVP
However, the flow controller can put its memory bus into a WAIT state (HALT state)
IRQ/FIQ and other wake events can bring the AVP out of the halt state
During halt, the AVP clock is automatically gated by hardware
AVP is NOT power gated
42. PMC....contd.
Provides interface to external PMIC
Controls voltage switching/transitions as the processor changes power states (e.g. LP0, LP1)
Processes power/clock requests from various peripherals (acts as a slave)
To speed up operation, the PMC register file operates in the local peripheral interface bus (APB) domain rather than in the 32 kHz clock domain used for PMC processing
43. Flow Controller-
IMPORTANT*
Provides sequencing of hardware controlled CPU
power states
Handles switching between CPU clusters 0 & 1
and also switching them OFF
Receives CPU power-state requests from the CPUs and sends power ON/OFF requests to the PMC, which power gates/ungates the corresponding CPUs
Monitors per-CPU interrupts and events to determine CPU wake events
Initiates CPU wake
The WFI (wait for interrupt) instruction is used to trigger low-power states
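The request/grant flow described above, reduced to a single CPU, can be sketched as a state machine. States and events are simplified and hypothetical. For instance, a wake event that arrives while the power-off request is still pending is simply ignored here. The real flow controller state machines are likely more detailed.

```c
/* HYPOTHETICAL per-CPU power sequence handled by the flow controller. */
typedef enum {
    CPU_RUNNING,        /* executing code */
    CPU_OFF_REQUESTED,  /* WFI seen, power-off request sent to the PMC */
    CPU_GATED,          /* PMC has power gated the CPU */
    CPU_ON_REQUESTED    /* wake seen, power-on request sent to the PMC */
} cpu_pstate_t;

typedef enum {
    EV_WFI,      /* CPU executed WFI to request a low-power state */
    EV_PMC_ACK,  /* PMC finished the pending power ON/OFF request */
    EV_WAKE      /* IRQ/FIQ or other wake event for this CPU */
} fc_event_t;

cpu_pstate_t fc_step(cpu_pstate_t s, fc_event_t ev)
{
    switch (s) {
    case CPU_RUNNING:       return ev == EV_WFI     ? CPU_OFF_REQUESTED : s;
    case CPU_OFF_REQUESTED: return ev == EV_PMC_ACK ? CPU_GATED         : s;
    case CPU_GATED:         return ev == EV_WAKE    ? CPU_ON_REQUESTED  : s;
    case CPU_ON_REQUESTED:  return ev == EV_PMC_ACK ? CPU_RUNNING       : s;
    }
    return s;  /* unreachable for valid states */
}
```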
45. Flow Controller....contd.
Note:
The flow controller has 3 different state machines:
* Main CPU flow controller state machine, shown in the figure above
* CPU rail power-up state machine
* State machine for the COP
The flow controller uses the CPU ID (in the MPIDR register) to identify the cores
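On ARMv7 cores such as the Cortex-A15, the CPU ID lives in the affinity fields of MPIDR, read with `mrc p15, 0, <Rt>, c0, c0, 5`. Aff0 is the CPU within a cluster and Aff1 is the cluster. A minimal decode sketch follows. The slides do not say how Tegra K1 numbers its clusters, so the example value is illustrative.

```c
#include <stdint.h>

typedef struct { unsigned cpu; unsigned cluster; } cpu_affinity_t;

/* Decode the ARMv7 MPIDR affinity fields. */
cpu_affinity_t mpidr_decode(uint32_t mpidr)
{
    cpu_affinity_t a;
    a.cpu     = mpidr & 0xFFu;          /* Aff0, bits [7:0]: CPU in cluster */
    a.cluster = (mpidr >> 8) & 0xFFu;   /* Aff1, bits [15:8]: cluster */
    return a;
}
```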
46. Power Architecture
There are sense pins for the various system voltage domains, which monitor them continuously
47. Power Gating and Ungating
For CCPLEX (CPU complex) power-gating (PG) partitions, sequencing is ensured by hardware when power gating is done via the flow controller
For SoC (non-CCPLEX) PG partitions, sequencing is done by software
There are two power-gating controllers:
1. SoC PG controller
2. GPU PG controller
48. SoC PG Controller
Controls 8 zones and uses a fixed power ON/OFF
sequence using a fixed set of delays
The power-OFF sequence is the reverse of the power-ON sequence
The same programming register is used for all zones
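The fixed-order, fixed-delay scheme, with power-OFF as the reverse of power-ON, can be sketched as a table walked in either direction. The step names and delays are hypothetical placeholders, not TRM values.

```c
#include <stddef.h>

/* HYPOTHETICAL step table: names and delays are placeholders. Power-ON
 * applies the items in order; power-OFF reverts them in reverse order. */
typedef struct {
    const char *item;      /* what the step changes */
    unsigned    delay_us;  /* fixed settle delay after the step */
} pg_step_t;

static const pg_step_t pg_seq[] = {
    { "partition power switch", 10u },
    { "clamp release",           5u },
    { "reset release",           1u },
};

#define PG_NUM_STEPS (sizeof pg_seq / sizeof pg_seq[0])

/* Return the i-th step to perform, or NULL when the sequence is done. */
const pg_step_t *pg_step(int power_on, size_t i)
{
    if (i >= PG_NUM_STEPS)
        return NULL;
    return power_on ? &pg_seq[i] : &pg_seq[PG_NUM_STEPS - 1u - i];
}
```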
49. GPU PG Controller
GPU PG is controlled by the GPMU unit inside the Kepler GPU
Independent of SoC/CPU PG
If the CPU and GPU share the same voltage rail (for cost reduction), software must ensure that the CPU and GPU are never power gated simultaneously, to avoid di/dt issues
50. Fast CPU PG Controller
Used to power gate fast CPU partitions
Functions similarly to the SoC PG controller
51. Power Gating
The flow controller uses a separate state machine to power gate each CPU
PG is done based on the CPU ID
Only one request is handled at a time, to avoid power-noise issues
The flow controller-PMC interface carries the core ID, not the cluster ID
As shown in the figure, CPU and non-CPU components can be power gated separately
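Serialising power-gating requests so that only one reaches the PMC at a time can be sketched as a small queue keyed by core ID. The queue length, names and completion callback are assumptions for illustration.

```c
#include <stdbool.h>

#define PG_QUEUE_LEN 8u  /* ASSUMED queue depth */

typedef struct { unsigned core_id; bool power_on; } pg_req_t;

typedef struct {
    pg_req_t q[PG_QUEUE_LEN];
    unsigned head, count;
    bool     busy;    /* a request is in flight to the PMC */
    pg_req_t active;  /* the request currently being handled */
} pg_arbiter_t;

/* Queue a request; returns false if the queue is full. */
bool pg_submit(pg_arbiter_t *a, unsigned core_id, bool power_on)
{
    if (a->count == PG_QUEUE_LEN)
        return false;
    unsigned tail = (a->head + a->count) % PG_QUEUE_LEN;
    a->q[tail].core_id  = core_id;
    a->q[tail].power_on = power_on;
    a->count++;
    return true;
}

/* Issue the next request only if none is in flight. */
bool pg_issue_next(pg_arbiter_t *a)
{
    if (a->busy || a->count == 0u)
        return false;
    a->active = a->q[a->head];
    a->head = (a->head + 1u) % PG_QUEUE_LEN;
    a->count--;
    a->busy = true;
    return true;
}

/* Called when the PMC reports the active request as complete. */
void pg_complete(pg_arbiter_t *a) { a->busy = false; }
```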
53. Power Gating....contd.
At boot, the CPU rail is OFF by default. The AVP can enable it by writing to PMC registers
The CPU rail can also be switched ON through the PMIC (I2C write)
The COP can switch OFF the FCPUs
CPU and non-CPU blocks cannot be switched simultaneously
60. Peripherals- MIPI CSI 2.0
2 CSI interfaces, each supporting up to 4 lanes
2 image sensors can be used simultaneously (e.g. stereo applications)
CSI B can support one additional single-lane input