Intel Processor Trace (Intel PT) is a processor extension for the Intel 64 and IA-32 architectures. The extension captures how a program was executed at the machine-instruction level. All dynamic events, such as branches, calls, and interrupts, are recorded. This allows a trace analyzer to reconstruct the earlier execution exactly.
These slides summarize the data generated by this extension.
1. Processor Trace
WHAT IS RECORDED?
Pipat Methavanitpong
+PipatMethavanitpong
@fulcronz27
2. Foreword
This work was done solely by myself, without support from Intel
Information in this document is derived from
Intel 64 and IA-32 Architectures Software Developer's Manual, System Programming Guide – Chapter 36
Some of it comes from my own understanding
Mistakes or wrong information may appear
I am willing to correct errors and update this document
Please contact me via Google Hangouts
I am not responsible for any damage caused by using this document
3. Objective
Give a summary of the data generated by Intel PT
Partially cover the relationships between data types
Does not cover its mechanism or how to control it
4. PT Overview
Machine instruction-level tracing
Uses dedicated hardware to trace
Conventional approaches use software to trace software
Bird's-eye view observation
Can fully reconstruct execution at analysis time
Records events that cannot be inferred from the binary alone
Usage
Low-level debugging
Fine tuning performance
State recovery
5. Background
At the lowest level, programs are chunks of machine instructions
The processor executes machine instructions sequentially
The processor departs from sequential execution when
Executing a redirecting machine instruction (branch, call, return)
Handling an interrupt or an exception (asynchronous event)
…
Execution context may be changed
Changing execution mode
Page switching
…
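The background above can be illustrated with a toy model. Everything in this sketch (the instruction kinds, the `PROGRAM` table, and the `reconstruct` helper) is hypothetical and invented for illustration; none of it is an Intel PT structure. It only shows why fall-through flow and direct jumps need no trace data, while each conditional branch needs one decision bit and each indirect jump needs a recorded target address.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy "binary": address -> (kind, operand).
#   "op"   falls through to the next address
#   "jcc"  conditional branch to operand (needs one recorded decision)
#   "jmp"  direct jump to operand (target is known from the binary)
#   "ijmp" indirect jump (needs a recorded target address)
#   "halt" ends execution
PROGRAM: Dict[int, Tuple[str, int]] = {
    0: ("op", 0),
    1: ("jcc", 4),
    2: ("op", 0),
    3: ("jmp", 5),
    4: ("op", 0),
    5: ("ijmp", 0),
    6: ("op", 0),
    7: ("halt", 0),
}


def reconstruct(program: Dict[int, Tuple[str, int]], start: int,
                decisions: Iterable[bool], targets: Iterable[int]) -> List[int]:
    """Replay the executed addresses from the binary plus the recorded events."""
    tnt = iter(decisions)   # taken / not-taken bits, oldest first
    tip = iter(targets)     # indirect-jump targets, oldest first
    path: List[int] = []
    ip = start
    while True:
        path.append(ip)
        kind, operand = program[ip]
        if kind == "halt":
            return path
        if kind == "jcc":
            ip = operand if next(tnt) else ip + 1
        elif kind == "jmp":
            ip = operand
        elif kind == "ijmp":
            ip = next(tip)
        else:
            ip += 1


print(reconstruct(PROGRAM, 0, [True], [7]))
```

In this toy run, one decision bit and one target address are enough to recover the whole path; the rest comes from the program table.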
6. Pros and Cons
Pros
Finest granularity in software tracing
Machine instruction level
Cons
Design overhead
Additional hardware
Hand-picked dynamic events
May miss some categories
Hard to change
Hardware implementation
*My own opinion
7. Packet Types
1. Packet Stream Boundary – Periodic heartbeat, sync point for the analyzer
2. Taken Not-Taken – Conditional branch decision
3. Target IP – Target address within program binary
4. Flow Update Packets – Source address of a control-flow change that cannot be determined from the program binary (async events)
5. Paging Information Packet – Change to CR3 (the task's page-table base address)
6. Time-Stamp Counter – Wall-clock time data
7. MODE – Execution mode
8. Core Bus Ratio – Core:bus clock ratio
9. Overflow – Internal buffer overflow
10. PAD – Padding for alignment
8. Packet Summary
Processor Trace Packets
Environment / Execution: PIP, MODE, CBR
Environment / Trace: PSB, OVF
Redirection / Inside traced program: TNT, TIP
Redirection / Outside traced program: FUP
Misc / Alignment: PAD
Misc / Time: TSC
*Grouping does not imply packet combination
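The grouping on this slide can also be written down as a small lookup table, which is handy when an analyzer wants to classify packets. The group names are my reading of the slide's figure, and `PACKET_GROUPS` and `group_of` are hypothetical names chosen for this sketch.

```python
from typing import Dict, List, Optional

# Packet mnemonics grouped as on the summary slide (my reading of the figure).
PACKET_GROUPS: Dict[str, List[str]] = {
    "Environment / Execution": ["PIP", "MODE", "CBR"],
    "Environment / Trace": ["PSB", "OVF"],
    "Redirection / Inside traced program": ["TNT", "TIP"],
    "Redirection / Outside traced program": ["FUP"],
    "Misc / Alignment": ["PAD"],
    "Misc / Time": ["TSC"],
}


def group_of(mnemonic: str) -> Optional[str]:
    """Return the group a packet mnemonic belongs to, or None if unknown."""
    for group, members in PACKET_GROUPS.items():
        if mnemonic in members:
            return group
    return None


print(group_of("FUP"))
```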
9. Taken Not-Taken (TNT)
A group of binary decisions
2 types of event
Conditional branch
Taken(1) / Not taken(0)
Return to an unmodified return address
Taken(1)
2 sizes
Short TNT – 8-bit packet contains 6 decision bits
Long TNT – 64-bit packet contains 47 decision bits
Not all bits need to be filled
A partial TNT packet is emitted when another packet has to be generated in between
Decision
Taken (1) Not Taken (0)
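To make the short TNT packet concrete, here is a minimal decoding sketch. The bit layout is not given on the slide; it is an assumption based on my reading of the Intel manual: bit 0 is 0, the highest set bit is a stop bit, and the bits between them are the decisions, oldest first. `decode_short_tnt` is a hypothetical helper name, and bytes carrying no decision bits are simply rejected here.

```python
from typing import List


def decode_short_tnt(byte: int) -> List[bool]:
    """Decode a 1-byte short TNT packet into decisions, oldest first.

    Assumed layout (not stated on the slide): bit 0 is 0, the highest
    set bit is the stop bit, and the bits in between are the decisions.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    if byte & 1:
        raise ValueError("bit 0 must be 0 in a short TNT packet")
    payload = byte >> 1
    if payload < 2:
        # Payload 0 or 1 leaves no room for a decision bit below the stop bit.
        raise ValueError("no decision bits present")
    count = payload.bit_length() - 1  # number of bits below the stop bit
    # Read from just below the stop bit (oldest) down to bit 0 (youngest).
    return [bool((payload >> i) & 1) for i in range(count - 1, -1, -1)]


# Stop bit, then taken, not taken, taken, then the trailing 0.
print(decode_short_tnt(0b00011010))
```

A partially filled packet is handled naturally: the stop bit sits lower in the byte, so fewer decisions are returned.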
10. Target IP (TIP)
A destination address within the traced program
Used for
Indirect jump / call – uses an address from a register or memory
Modified return address – the return address on the call stack has been modified
Has a different packet signature from FUP
Has 2 extra variants
TIP.PGE – Packet Generation Enable
TIP.PGD – Packet Generation Disable
11. Flow Update Packet (FUP)
The source address (the IP at which execution was redirected) for a control-flow change caused from outside the traced program
Generated when an asynchronous event happens
External interrupts
Exceptions and faults
TSX instructions (XBEGIN, XEND, XABORT)
#SMI
WRMSR that clears TraceEn (one of the flags that control the tracing operation)
Generated in combination with other packets (not covered here)
Has a different packet signature from TIP
12. Paging Information Packet (PIP)
Keeps track of paging information
Identifies the current linear address space
The CR3 register contains the task's page-table base address
Generated when CR3 is modified
Has exceptional cases
13. MODE packet
Record of processor modes that affect
Execution behavior
Trace analysis
2 kinds of mode information are recorded
Execution modes
16- / 32- / 64-bit
TSX transaction operations
Begin / commit / abort
Either HLE or RTM
14. Core Bus Ratio (CBR) Packet
Tells the current core:bus ratio
Cannot tell at which IP a CBR change takes effect
Generated when
CBR changes
As a part of PSB+
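As a worked example of what the ratio means, the core clock is the bus clock multiplied by the core:bus ratio. The 100 MHz bus clock used as the default below is an assumption for illustration only; the real value depends on the platform. `core_frequency_mhz` is a hypothetical helper name.

```python
def core_frequency_mhz(core_bus_ratio: int, bus_clock_mhz: float = 100.0) -> float:
    """Core clock in MHz, given the core:bus ratio reported by a CBR packet.

    The 100 MHz default bus clock is an assumed value, not one taken
    from the slides; pass the platform's actual bus clock instead.
    """
    if core_bus_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return core_bus_ratio * bus_clock_mhz


# A ratio of 24 on an assumed 100 MHz bus gives a 2400 MHz core clock.
print(core_frequency_mhz(24))
```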
15. Packet Stream Boundary (PSB)
Generated periodically (about every 4 KB of trace data)
Like a heartbeat for the trace operation
The analyzer searches for this packet first to start decoding
PSB itself does not contain any information
Just a pure binary signature
Generated in combination with other packets
The whole group is called PSB+
PSB+ tells the current execution environment
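The "search for PSB first" step can be sketched as a plain byte-pattern search. The pattern is not given on the slide; this sketch assumes the 16-byte signature described in the Intel manual (0x02 0x82 repeated eight times). `PSB_PATTERN` and `find_sync_points` are hypothetical names for this example.

```python
from typing import List

# Assumed PSB signature: the byte pair 0x02 0x82 repeated eight times.
PSB_PATTERN = bytes([0x02, 0x82]) * 8


def find_sync_points(trace: bytes) -> List[int]:
    """Return the offsets of all non-overlapping PSB signatures in a raw trace."""
    offsets: List[int] = []
    pos = trace.find(PSB_PATTERN)
    while pos != -1:
        offsets.append(pos)
        # Continue searching after the signature just found.
        pos = trace.find(PSB_PATTERN, pos + len(PSB_PATTERN))
    return offsets


# Two padding bytes, a PSB, one other byte, then a second PSB.
sample = b"\x00\x00" + PSB_PATTERN + b"\xfe" + PSB_PATTERN
print(find_sync_points(sample))
```

An analyzer would start decoding at the first offset returned and can use the later ones to resynchronize.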
16. Overflow (OVF) Packet
Generated when
The PT hardware overflows its internal buffer
The analyzer skips to the next FUP or TIP.PGE
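The recovery rule above can be sketched on a list of already-decoded packets. Representing packets as plain mnemonic strings is a simplification made up for this example, and `resync_after_overflow` is a hypothetical helper name.

```python
from typing import List, Optional


def resync_after_overflow(packets: List[str], ovf_index: int) -> Optional[int]:
    """Return the index of the first FUP or TIP.PGE after an OVF packet.

    Packets are given as mnemonic strings (a simplification for this
    sketch). Returns None if no resume point follows the overflow.
    """
    if packets[ovf_index] != "OVF":
        raise ValueError("ovf_index does not point at an OVF packet")
    for i in range(ovf_index + 1, len(packets)):
        if packets[i] in ("FUP", "TIP.PGE"):
            return i
    return None


stream = ["TNT", "OVF", "TNT", "PAD", "FUP", "TIP"]
print(resync_after_overflow(stream, 1))
```

Packets between the OVF and the resume point are skipped because the analyzer cannot know which events were lost.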
17. PAD
Simply padding
No information contained
Improves packet alignment
Or serves other implementation reasons