Here are a few things you could try to address the increased executable size and performance impact on the CPU cache:
2. Recompile the executables to use 64-bit pointers only where needed, and 32-bit pointers (or indices) elsewhere to reduce overall size.
2. Optimize the compiler to better pack instructions and data to improve cache utilization.
3. Consider using position independent code (PIC) to allow sharing of common code segments between processes to reduce duplicated code.
4. Profile the applications to identify hot spots and optimize those sections first, such as improving data locality.
5. Consider using link-time optimizations (LTO) to better optimize across compilation units.
6. Upgrade the CPU hardware, for example to a part with a larger cache, if software-level changes alone are insufficient.
Introduction to Processor Design and ARM Processor (Darling Jemima)
The document discusses computer architecture and the MU0 processor. It provides details on MU0's instruction set, which uses 1-address instructions and has a small set of instructions including LDA, STA, ADD, SUB, JMP, JGE, JNE, and STP. The document also explains MU0's design, which has a program counter, accumulator, instruction register, and arithmetic logic unit. It describes how MU0 executes sample instructions in a step-by-step fashion.
This document provides an outline for the course CS-214 Computer Organization and Assembly Language. The course covers various topics related to computer systems including information representation, machine-level representation of programs, processor architecture, and the Y86 instruction set. Assessment is based on 4 credits with prerequisites of Digital Logic and Design. Required textbooks include Computer Systems: A Programmer's Perspective, MIPS Assembly Language Programming, and Computer System Architecture. The course is taught by Syed Muhammad Rafi from the Department of Software Engineering at Ziauddin University.
This document provides an overview of computer organization and assembly language concepts including the CPU, registers, memory, and system bus. It summarizes that the CPU contains an execution unit and bus interface unit, uses various registers like general purpose registers and segment registers to store and access data and memory addresses. It describes different types of memory like RAM, ROM, and cache, and how memory is organized into segments and addressed using segment:offset notation. It concludes with an explanation of the system bus that connects the CPU, memory, and I/O devices, and the types of data transfers that occur over the bus.
This document discusses microprocessor architecture and memory interfacing. It provides an overview of the basic parts and operations of a microprocessor-based system including the CPU, memory, and I/O devices. It describes the different types of buses (address bus, data bus, control bus) and how they transfer data and signals. It also explains memory interfacing and addressing, different types of memory (RAM, ROM), and how microprocessors access memory using an address bus.
The document discusses computer architecture and the MIPS instruction set architecture. It begins by defining computer architecture as the instruction set architecture (ISA), which is the boundary between hardware and software and defines what a machine can do. It also discusses machine organization, which is how the hardware works to implement the ISA. The document then covers various aspects of the MIPS ISA, including its register-based load/store architecture, instruction formats, common operations like data movement and arithmetic, and addressing modes.
The document provides an overview of x86 assembly language architecture, including:
1) It describes the basic components of an x86 microcomputer including the CPU, memory, input/output ports, and motherboard.
2) It explains x86 processor architecture concepts such as modes of operation, registers, addressing modes, and the evolution of Intel processors.
3) It covers x86 memory management in real mode and protected mode as well as paging and segmentation.
The CPU (Central Processing Unit) is the device in a computer that performs calculations. It contains a clock that controls timing and can perform multiple calculations per clock cycle. The CPU communicates with other devices via buses and can perform arithmetic, move data between memory locations, and make decisions based on instructions. Modern CPUs have multiple cores and billions of transistors, allowing them to execute billions of instructions per second.
This document discusses the differences between RISC and CISC instruction set architectures. RISC uses simple, fixed-length instructions that can execute in one cycle, while CISC uses more complex, variable-length instructions that may take multiple cycles. Key differences include RISC having fewer instructions, registers, and addressing modes compared to CISC, which aims to support high-level languages with a wider range of instructions. Branching, condition codes, and instruction formats are also covered.
Chp3 designing bus system, memory & io copy (mkazree)
The document discusses various concepts related to designing bus systems and interfacing memory and I/O devices with the Motorola 68000 microprocessor. It covers the address and data buses of the 68000, addressing modes, designing memory decoders, generating acknowledge signals, direct memory access, and memory-mapped I/O using devices like the 6821 PIA and 6850 ACIA.
The document provides an overview of computer hardware and software components. It describes how a computer system consists of hardware, software, and data. The hardware includes components like the central processing unit, memory, storage devices, input/output ports, and peripheral devices. Software includes operating systems and programs. Data is the raw information input and output of the computer. Key components like CPUs, memory types, storage media, ports, expansion boards, and input/output devices are explained.
Social services and human rights to know.ppt (BharathR164555)
This equipment represents Herman Hollerith's tabulating system from the late 19th century. Hollerith's system was first used to compile the 1890 U.S. Census. His patents were later acquired by IBM, forming the basis for IBM's punched card systems. The tabulator counted entries by closing electrical circuits through punched holes on cards, actuating electromagnets that advanced counting devices. An operator would place cards in a hand-operated reader to be counted one at a time.
This document provides an introduction to protected mode memory management in x86 microprocessors. It discusses key concepts like memory segmentation, privilege levels, and descriptor tables which the processor uses to implement protection between processes and enable virtual memory. Segmentation allows logically separating code, data, and stack segments to prevent processes from interfering with each other's memory.
The microprocessor is a programmable device that processes binary numbers according to instructions stored in memory. It contains arithmetic, logic, and control circuits on a single silicon chip. Early processors used discrete components but were large and slow. The invention of the microchip led to much smaller and faster processors by integrating all components onto a single silicon slice. Modern microprocessors manipulate 32-bit or 64-bit words and have instruction sets that define their capabilities. The 8085 was an 8-bit microprocessor that used multiplexed address/data lines, requiring external latching to separate addresses from data.
The document discusses instruction sets and their characteristics. It covers topics like instruction formats, types of instructions, addressing modes, operand types, and byte ordering. Instruction sets have operation codes, operands, and addressing that reference locations in memory or registers. Design decisions for instruction sets include the operation repertoire, number of registers, and addressing modes.
x86-64 is a superset of the x86 instruction set architecture. x86-64 processors can run existing 32-bit or 16-bit x86 programs at full speed, but also support new programs written with a 64-bit address space and other additional capabilities.
The document provides information about an upcoming prelim exam for CS 3410 at Cornell University. It states that the prelim will take place on the same day as the summary at 7:30pm sharp in various locations based on student ID. It covers material from chapters 1-4, homework assignments 1 and 2, and labs 0-2. Students should arrive early as the exam will start promptly.
The presentation was given at an MSBTE-sponsored content updating program on 'PC Maintenance and Troubleshooting' for Diploma Engineering teachers of Maharashtra.
Venue: Government Polytechnic, Nashik
Date: 17/01/2011
Session-3: Internal Components of PC
This document provides an overview of different processor architectures including RISC, accumulator, stack, and register-based architectures. It discusses the MIPS RISC architecture and why it is considered RISC. It then describes different processor examples like the 80x86 IA-32 architecture, the Pentium Pro, II, III, and IV, and the Java Virtual Machine stack-based architecture. It provides details on the complex IA-32 instruction set and addressing modes as well as performance enhancements in the Pentium series like out-of-order execution, deeper pipelining, caches, and hyperthreading.
This document discusses various types and implementations of parallel architectures. It covers parallelism concepts like data, thread, and instruction level parallelism. It also describes Flynn's taxonomy of parallel systems and different parallel machine designs like SIMD, vector, VLIW, and MIMD architectures. Specific examples of parallel supercomputers are provided like Cray, Connection Machine, and SGI Origin. Challenges in parallel programming and portability are also summarized.
This document provides an overview of MIPS machine language instructions. It describes the key components of MIPS instructions, including register operands, immediate operands, and different instruction formats. It explains basic arithmetic, load/store, branch, and jump instructions. It also discusses MIPS register organization, memory organization including byte ordering, the fetch-execute cycle, and MIPS addressing modes including register, word-relative, and PC-relative addressing. The document is serving as a lecture on MIPS instruction set architecture for a computer organization and architecture course.
The document summarizes the Cell processor architecture, which was developed as a collaboration between IBM, Sony, and Toshiba to address limitations in processor performance. The Cell consists of 9 cores - 1 PowerPC core called the PPE and 8 synergistic processor elements (SPEs) optimized for SIMD operations. It has a peak performance of over 200 GFLOPS and was used in the PlayStation 3 game console to enable graphics-intensive applications. The document outlines the Cell architecture and how it aims to overcome performance walls related to power, memory, and frequency limitations.
The document describes the architecture of the PowerPC microprocessor. It discusses the memory structure, registers, data formats, instruction formats, addressing modes, instruction set, and input/output. The PowerPC uses 32 general purpose 64-bit registers, has single and double precision floating point formats, and supports 7 basic 32-bit instruction formats. Memory addressing modes include register and immediate modes. I/O is performed through memory-mapped I/O using load and store instructions.
This presentation is about the design and function of a microprocessor, how to program it, and how to interface it with other electronic machines and devices.
Chapter 1
Syllabus
Catalog Description: Computer structure, machine representation of data,
addressing and indexing, computation and control instructions, assembly
language and assemblers; procedures (subroutines) and data segments,
linkages and subroutine calling conventions, loaders; practical use of an
assembly language for computer implementation of illustrative examples.
Course Goals
0 Knowledge of the basic structure of microcomputers - registers, memory, addressing, I/O devices, etc.
1 Knowledge of most non-privileged hardware instructions for the architecture being studied.
2 Ability to write small programs in assembly language
3 Knowledge of computer representations of data, and how to do simple
arithmetic in binary & hexadecimal, including conversions
4 Ability to implement a moderately complicated algorithm in assembler, with emphasis on efficiency.
5 Knowledge of procedure calling conventions and interfacing with high-level languages.
Optional Text: Kip Irvine, Assembly Language for the IBM PC, Prentice
Hall, 4th or 5th edition
Additional References: Intel and DOS API documentation as presented in Intel publications and online at www.x86.org; lecture notes (to be supplied as we go).
Prerequisites by Topic. Working knowledge of some programming language (102/103: C/C++); minimal programming experience.
Major Topics Covered in the Course:
1 Low-level and high-level languages; why learn assembler?
2 How does one study a new computer: the CPU, memory, addressing
modes, operation modes.
3 History of the Intel family of microprocessors.
4-5 Registers; simple arithmetic instructions; byte order; Arithmetic and
logical operations.
6 Implementing longer integer type support; carry and overflow.
7 Shifts, multiplication and division.
8 Memory layout.
9 Direct video memory access; discussion of the first project.
10 Assembler syntax; how to use the tools.
11-13 Conditional & unconditional jumps; loops; emulating high-level language constructions; stack; call and return; procedures.
14-15 String instructions: efficient memory-to-memory operations.
16 Interrupts overview: interrupt table; how do interrupts work; classification.
17 Summary of the most important interrupts.
18-20 DOS interrupt; File I/O functions; file-copy program; discussion of
the second project
21 Interrupt handlers; keyboard drivers; timer-driven processes; viruses
and virus-protection software.
22 Debug interrupts; how do debuggers and profilers work.
23-24 (Optional) Interfacing with high-level languages; protected mode fundamentals.
Grading The grading is based on two projects: the midterm project is worth 49% and the final 51%. Please note that the projects are individual; submitting projects that are similar to others' submissions and/or are essentially downloads from the Web will result in a failing grade.
Office Hours My hours this term for CSc 210 will be 3:45 to 4:45 on
Mondays.
Zoom links:
11am https://ccny.zoom.us/j/8 ...
Java on arm theory, applications, and workloads [dev5048] (Aleksei Voitylov)
This document discusses optimizing Java performance on Arm processors. It describes adding intrinsics and stubs to the HotSpot JVM compiler to generate optimized Arm assembly for key methods like String processing and math functions. Benchmark results show up to 78x speedups for microbenchmarks and improved performance on SPECjbb2015 from these changes. The goal is to improve the performance of typical enterprise Java workloads on Arm servers.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
This document discusses the differences between RISC and CISC instruction set architectures. RISC uses simple, fixed-length instructions that can execute in one cycle, while CISC uses more complex, variable-length instructions that may take multiple cycles. Key differences include RISC having fewer instructions, registers, and addressing modes compared to CISC, which aims to support high-level languages with a wider range of instructions. Branching, condition codes, and instruction formats are also covered.
Chp3 designing bus system, memory & io copymkazree
The document discusses various concepts related to designing bus systems and interfacing memory and I/O devices with the Motorola 68000 microprocessor. It covers the address and data buses of the 68000, addressing modes, designing memory decoders, generating acknowledge signals, direct memory access, and memory-mapped I/O using devices like the 6821 PIA and 6850 ACIA.
The document provides an overview of computer hardware and software components. It describes how a computer system consists of hardware, software, and data. The hardware includes components like the central processing unit, memory, storage devices, input/output ports, and peripheral devices. Software includes operating systems and programs. Data is the raw information input and output of the computer. Key components like CPUs, memory types, storage media, ports, expansion boards, and input/output devices are explained.
Social services and human rights to know.pptBharathR164555
This equipment represents Herman Hollerith's tabulating system from the late 19th century. Hollerith's system was first used to compile the 1890 U.S. Census. His patents were later acquired by IBM, forming the basis for IBM's punched card systems. The tabulator counted entries by closing electrical circuits through punched holes on cards, actuating electromagnets that advanced counting devices. An operator would place cards in a hand-operated reader to be counted one at a time.
This document provides an introduction to protected mode memory management in x86 microprocessors. It discusses key concepts like memory segmentation, privilege levels, and descriptor tables which the processor uses to implement protection between processes and enable virtual memory. Segmentation allows logically separating code, data, and stack segments to prevent processes from interfering with each other's memory.
The microprocessor is a programmable device that processes binary numbers according to instructions stored in memory. It contains arithmetic, logic, and control circuits on a single silicon chip. Early processors used discrete components but were large and slow. The invention of the microchip led to much smaller and faster processors by integrating all components onto a single silicon slice. Modern microprocessors manipulate 32-bit or 64-bit words and have instruction sets that define their capabilities. The 8085 was an 8-bit microprocessor that used multiplexed address/data lines, requiring external latching to separate addresses from data.
The document discusses instruction sets and their characteristics. It covers topics like instruction formats, types of instructions, addressing modes, operand types, and byte ordering. Instruction sets have operation codes, operands, and addressing that reference locations in memory or registers. Design decisions for instruction sets include the operation repertoire, number of registers, and addressing modes.
x86-64 is a superset of the x86 instruction set architecture. x86-64 processors can run existing 32-bit or 16-bit x86 programs at full speed, but also support new programs written with a 64-bit address space and other additional capabilities.
The document provides information about an upcoming prelim exam for CS 3410 at Cornell University. It states that the prelim will take place on the same day as the summary at 7:30pm sharp in various locations based on student ID. It covers material from chapters 1-4, homework assignments 1 and 2, and labs 0-2. Students should arrive early as the exam will start promptly.
The presentation given at MSBTE sponsored content updating program on 'PC Maintenance and Troubleshooting' for Diploma Engineering teachers of Maharashtra.
Venue: Government Polytechnic, Nashik
Date: 17/01/2011
Session-3: Internal Components of PC
This document provides an overview of different processor architectures including RISC, accumulator, stack, and register-based architectures. It discusses the MIPS RISC architecture and why it is considered RISC. It then describes different processor examples like the 80x86 IA-32 architecture, the Pentium Pro, II, III, and IV, and the Java Virtual Machine stack-based architecture. It provides details on the complex IA-32 instruction set and addressing modes as well as performance enhancements in the Pentium series like out-of-order execution, deeper pipelining, caches, and hyperthreading.
This document discusses various types and implementations of parallel architectures. It covers parallelism concepts like data, thread, and instruction level parallelism. It also describes Flynn's taxonomy of parallel systems and different parallel machine designs like SIMD, vector, VLIW, and MIMD architectures. Specific examples of parallel supercomputers are provided like Cray, Connection Machine, and SGI Origin. Challenges in parallel programming and portability are also summarized.
This document provides an overview of MIPS machine language instructions. It describes the key components of MIPS instructions, including register operands, immediate operands, and different instruction formats. It explains basic arithmetic, load/store, branch, and jump instructions. It also discusses MIPS register organization, memory organization including byte ordering, the fetch-execute cycle, and MIPS addressing modes including register, word-relative, and PC-relative addressing. The document is serving as a lecture on MIPS instruction set architecture for a computer organization and architecture course.
The document summarizes the Cell processor architecture, which was developed as a collaboration between IBM, Sony, and Toshiba to address limitations in processor performance. The Cell consists of 9 cores - 1 PowerPC core called the PPE and 8 synergistic processor elements (SPEs) optimized for SIMD operations. It has a peak performance of over 200 GFLOPS and was used in the PlayStation 3 game console to enable graphics-intensive applications. The document outlines the Cell architecture and how it aims to overcome performance walls related to power, memory, and frequency limitations.
The document describes the architecture of the PowerPC microprocessor. It discusses the memory structure, registers, data formats, instruction formats, addressing modes, instruction set, and input/output. The PowerPC uses 32 general purpose 64-bit registers, has single and double precision floating point formats, and supports 7 basic 32-bit instruction formats. Memory addressing modes include register and immediate modes. I/O is performed through memory-mapped I/O using load and store instructions.
This presentation is about the design and function of a microprocessor, how to program and how to interface it with other electronics machines and devices
Chapter 1
Syllabus
Catalog Description: Computer structure, machine representation of data,
addressing and indexing, computation and control instructions, assembly
language and assemblers; procedures (subroutines) and data segments,
linkages and subroutine calling conventions, loaders; practical use of an
assembly language for computer implementation of illustrative examples.
Course Goals
0 Knowledge of the basic structure of microcomputers - registers, mem-
ory, addressing I/O devices, etc.
1 Knowledge of most non-privileged hardware instructions for the Ar-
chitecture being studied.
2 Ability to write small programs in assembly language
3 Knowledge of computer representations of data, and how to do simple
arithmetic in binary & hexadecimal, including conversions
4 Being able to implementing a moderately complicated algorithm in
assembler, with emphasis on efficiency.
5 Knowledge of procedure calling conventions and interfacing with high-
level languages.
Optional Text: Kip Irvine, Assembly Language for the IBM PC, Prentice
Hall, 4th or 5th edition
1
Additional References: Intel and DOS API documentation as presented
in Intel publications and online at www.x86.org; lecture notes (to be sup-
plied as we go).
Prerequisites by Topic. Working knowledge of some programming lan-
guage (102/103: C/C++); Minimal programming experience
Major Topics Covered in the Course:
1 Low-level and high-level languages; why learn assembler?
2 How does one study a new computer: the CPU, memory, addressing
modes, operation modes.
3 History of the Intel family of microprocessors.
4-5 Registers; simple arithmetic instructions; byte order; Arithmetic and
logical operations.
6 Implementing longer integer type support; carry and overflow.
7 Shifts, multiplication and division.
8 Memory layout.
9 Direct video memory access; discussion of the first project.
10 Assembler syntax; how to use the tools.
11-13 Conditional & unconditional jumps; loops; emulating high-level lan-
guage constructions; Stack; call and return; procedures
14-15 String instructions: efficient memory-to-memory operations.
16 Interrupts overview: interrupt table; how do interrupts work;
classification.
17 Summary of the most important interrupts.
18-20 DOS interrupts; file I/O functions; a file-copy program; discussion
of the second project.
21 Interrupt handlers; keyboard drivers; timer-driven processes; viruses
and virus-protection software.
22 Debug interrupts; how do debuggers and profilers work.
23-24 (Optional) Interfacing with high-level languages; protected mode
fundamentals.
Grading: The grade is based on two projects; the midterm project counts
for 49% and the final project for 51%. Please note that the projects are
individual; submitting projects that are similar to other students'
submissions and/or are essentially downloads from the Web will result in
a failing grade.
Office Hours: My hours this term for CSc 210 will be 3:45-4:45 on
Mondays.
Zoom links:
11am https://ccny.zoom.us/j/8 ...
2. A Class in Eight Sections
Introduction, history, computers and CPUs
Memory
Operating systems and process basics
Responder training (Kent – 3 sessions)
Approach to forensic analysis
Case study – stepping through real malware
3. History
Hacking has always followed invention
1876 - Bell demonstrates the telephone
1878 - teenagers try to take it apart
~1971 - phone phreaking starts, hacking follows
1974 - unknown 15-year-old acquires
privileged access to CSUS computers
- To glimpse the future, you must
understand the path that led to it.
4. Some Numbers
2015 - $3.3B was invested in 229 startups
2017 – 780K jobs with 350K openings
2021 – 3.5 million job openings (estimated)
Roughly 250,000 unique pieces of Windows
malware appear every day
Cyber security will be a growth industry
because there is too much money in it for all
involved
5. Two Possible Futures
1. All the "bad guys" decide it's
just too much trouble and
give up
2. They just keep coming and
getting more sophisticated
7. Numerology
8-bits think 256 (or 0x100)
16-bits think 64K (or 0x10000)
20-bits think 1M (or 0x100000)
32-bits think 4G (or 0x100000000)
All numbering systems start at 0
Only difference between signed and
unsigned values is semantics
1M is 1048576 not 1,000,000
Know hex like you have 16 fingers
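These numbers can be sanity-checked in a few lines of Python. The helper names below (`address_space`, `as_signed8`) are illustrative, not part of any course material:

```python
# "Numerology" checks: address-space sizes are exact powers of two,
# and "1M" in memory-addressing terms is 2**20, not 10**6.

def address_space(bits):
    """Number of distinct addresses a given bit width can express."""
    return 1 << bits

print(address_space(8))    # 256
print(address_space(16))   # 65536 (64K)
print(address_space(20))   # 1048576 (1M)
print(address_space(32))   # 4294967296 (4G)

print(1 << 20)             # 1048576 -- 1M is 1048576, not 1,000,000

# Signed vs unsigned is only semantics: the byte 0xFF is 255 when read
# as unsigned, -1 when read as two's-complement signed.
def as_signed8(byte):
    return byte - 256 if byte >= 128 else byte

print(as_signed8(0xFF))    # -1
print(as_signed8(0x7F))    # 127
```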
9. The First Days (sorta)
CPU dealt with 8-bits at a time
Address was 16-bits, so <= 64Kbytes
Bus supported a 16-bit address and 8-bit data
I/O was completely separate operation
I/O address was 8-bits
4MHz bus clock
Some manufacturers attached those signals
to a connector called a bus
S-100, Apple, STD, SS-50, etc.
11. The Second Days
CPU dealt with 8-bits at a time
8088, 7-byte prefetch, really an 8-bit processor
Address = 20-bits so 1M maximum
Bus supported 20-bit address, 8-bit data
I/O was 16-bit address and 16-bit data
First bus masters appeared
6MHz bus clock
12. x86 Not Orthogonal
Orthogonal means that any register can
be used for any operation
Not orthogonal means that registers
have specific tasks that the other
registers cannot perform
13. 16-bit Registers
AX – fastest, used in most opcodes
BX – pointer, used in some opcodes
CX – counter, used in some opcodes
DX – sometimes extension of AX
32-bit number was placed in DX:AX with DX
being the most significant 16-bits and AX
being the least significant 16-bits
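The DX:AX pairing can be sketched as a pair of helper functions; the names `split_dx_ax` and `join_dx_ax` are hypothetical, purely for illustration:

```python
# A 32-bit value split across two 16-bit registers: DX holds the most
# significant half, AX the least significant half.

def split_dx_ax(value32):
    dx = (value32 >> 16) & 0xFFFF   # high 16 bits -> DX
    ax = value32 & 0xFFFF           # low 16 bits  -> AX
    return dx, ax

def join_dx_ax(dx, ax):
    return (dx << 16) | ax

dx, ax = split_dx_ax(0x12345678)
print(hex(dx), hex(ax))            # 0x1234 0x5678
print(hex(join_dx_ax(dx, ax)))     # 0x12345678
```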
14. More 16-bit Registers
DI – general purpose & destination pointer
SI – general purpose & source pointer
SP – stack pointer
BP – general purpose, pointer & used for
stack frame
F – flags, directly used with stack or AH
The difference:
AX, BX, CX, DX have one-byte subregisters
AH/AL, BH/BL, CH/CL, DH/DL
15. The Opcode
The opcode is a set of numbers that
tells the CPU what to do
0x41 means add 1 to register CX
0x6B 0xC9 0x05 means CX = CX * 5
Think of the opcode as a verb (action)
Think of memory and registers as nouns
The opcode operates on nouns
16. Opcode Structure
All assembly language follows:
<opcode> <v1> [,<v2> [,<v3> […]]]
or
verb noun1, noun2, …
Opcodes have a target, explicit/implied
Opcodes can have 0 to many sources
17. Opcode Targets
Implicit: something in the CPU
SAHF, CLI, HLT
Explicit: register, memory
mov ax, 3
mov [memory_variable], dx
18. Opcode Sources
Implicit: something in the CPU
LAHF – load AH with the flags
PUSHF – store the flags on the stack
Explicit: register, memory, value
mov ax, bx
mov cx, [some_memory_variable]
mov dx, 45
19. x86 Op Codes
x86 currently has 981 unique opcodes
Compilers use ~25 opcodes 99.9% of
the time
Assembly language is like any other, just
think in smaller steps
Ones you should know:
mov, push, pop, jmp(s), call, cmp, add, sub
or, and, xor, inc, dec, test, shl, shr, ror, rol
and the ones that look like them
20. A Quick Opcode Eye Chart
mov : copies data
push/pop : stack in and out
jmp/call : goto or a function call
cmp : compares two values
add/sub/mul/div : math operators
and/or/xor/not : logical operators
inc/dec : ++ and - -
shl/shr/ror/rol : bit shifting/rotating
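Most of these opcodes map directly onto high-level operators (add is +, and is &, shl is <<, and so on), but the rotates do not; a 16-bit rol/ror can be modeled with shifts and a mask. A sketch, with illustrative helper names:

```python
# Modeling ROL/ROR on 16-bit values: bits shifted out one end
# reappear at the other, and a mask keeps the result in 16 bits.

MASK16 = 0xFFFF

def rol16(value, count):
    count %= 16
    return ((value << count) | (value >> (16 - count))) & MASK16

def ror16(value, count):
    count %= 16
    return ((value >> count) | (value << (16 - count))) & MASK16

print(hex(rol16(0x8001, 1)))   # 0x3    -- the top bit wraps to bit 1
print(hex(ror16(0x8001, 1)))   # 0xc000 -- the bottom bit wraps to bit 14
```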
21. Addressing Modes
CPU has to access memory
Addressing modes you should know
Immediate: mov ax, A_VALUE
Direct: mov ax, memory_location
Indirect: mov ax, [bx]
Indirect+offset: mov ax, [bx + A_VALUE]
Indirect scaled: mov ax, [bx*4]
Combined: mov ax, [bx*4 + A_VALUE]
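These modes can be modeled against a flat list standing in for memory, with a variable playing the BX role. This is a conceptual sketch only; note that scaled indexing like [bx*4] actually belongs to 32-bit addressing and is included here purely for illustration:

```python
# Fake memory where cell i holds the value i, so results are easy to check.
memory = list(range(100))
A_VALUE = 3
bx = 5

immediate       = A_VALUE                  # mov ax, A_VALUE
indirect        = memory[bx]               # mov ax, [bx]
indirect_offset = memory[bx + A_VALUE]     # mov ax, [bx + A_VALUE]
scaled          = memory[bx * 4]           # mov ax, [bx*4]
combined        = memory[bx * 4 + A_VALUE] # mov ax, [bx*4 + A_VALUE]

print(immediate, indirect, indirect_offset, scaled, combined)  # 3 5 8 20 23
```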
22. Segment Registers
Used to reference a 16-byte-aligned base address in
memory (e.g. segment 2 is address 32)
CS – code segment (ip)
DS – data segment (bx, si, di)
SS – stack segment (sp, bp)
ES – extra segment (di for string
opcodes)
23. How are Segments Used?
0
1
2
3
4
5
FFFB
FFFC
FFFD
FFFE
FFFF
0x00000
0x00060
0xFFFB0
0xFFFF0
DS == 0x0002
…
…
DS:0x0037 is address 0x00057
0x20 from DS being 2
+ 0x0037
= 0x00057
So, (segment number * 16) + offset
is the physical address.
Memory
Segments
1 megabyte of memory is divided
into 64K segments of 16-bytes each
Addresses
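The segment arithmetic above reduces to a single line; a minimal sketch (the function name is illustrative):

```python
# Real-mode address translation: the 16-bit segment value is shifted
# left four bits (multiplied by 16) and the offset is added.

def physical_address(segment, offset):
    return (segment << 4) + offset

print(hex(physical_address(0x0002, 0x0037)))  # 0x57, matching the slide
print(hex(physical_address(0x1000, 0x0000)))  # 0x10000
```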
24. Segment Overrides
Normally pointer registers use certain
segments
DS – data segment (bx, si, di)
An override can be used to have a
pointer use another segment instead
es:[bx] means use ES not DS
26. How Do CPUs Store Data?
Storing the 32-bit value 0x12345678, byte by byte:
Little Endian (Intel, Arm): +0 = 0x78, +1 = 0x56, +2 = 0x34, +3 = 0x12
Big Endian (Motorola, PowerPC, Arm): +0 = 0x12, +1 = 0x34, +2 = 0x56, +3 = 0x78
Most modern embedded CPUs allow you to choose the endianness
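Byte order is easy to demonstrate with Python's struct module, packing 0x12345678 with explicit little-endian and big-endian format codes and looking at which byte lands at offset 0:

```python
import struct

# "<I" packs a 32-bit unsigned int little-endian, ">I" big-endian.
little = struct.pack("<I", 0x12345678)
big    = struct.pack(">I", 0x12345678)

print([hex(b) for b in little])  # ['0x78', '0x56', '0x34', '0x12']
print([hex(b) for b in big])     # ['0x12', '0x34', '0x56', '0x78']
```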
28. 32-bit Land
CPU dealt with 16 or 32-bits at a time
Address was 32-bits
I/O was 16-bit address and 16-bit data
Registers became more orthogonal
Real, protected and V86 modes
real mode:16-bit, protected mode:32-bit
i386 had a cache controller but no on-chip cache
I never saw a single system with one installed
29. Register Name Changes
AX -> EAX
BX -> EBX
CX -> ECX
DX -> EDX Well that’s
DI -> EDI exciting!
SI -> ESI
SP -> ESP
BP -> EBP
30. New Segment Registers
CS – code segment (eip)
DS – data segment
(eax,ebx,ecx,edx,esi,edi)
SS – stack segment (esp, ebp)
ES – extra segment (edi for strings)
FS - ??? eff segment?
GS - ??? gee segment?
32. Answer
They are no longer used for 16-byte segments
They have new properties that define
where in physical memory they start
They provide the first taste of virtual memory
33. 32-bit Segment Register Usage
M
e
m
o
r
y
DS describes address and size of data area
CS describes address and size of code area
34. 32-bit Segment Register Usage
M
e
m
o
r
y
CS
DS VMEM data location 0 is here
VMEM code location 0 is here
PHYSMEM VMEM
36. New Term: Superscaling
Superscaling allows a CPU to process
two opcodes in a single cycle
If a CPU can process two opcodes in a
cycle, it needs to fetch opcodes
twice as fast
The opcodes can’t be dependent upon
each other
Leads to interesting opcode placement
by compilers
37. Why Wasn’t DRAM Good Enough?
CPU Byte Address DRAM
CPU DRAM
Get a byte
Some time later
38. Superscaling Led to Caching
In order to make simultaneous opcode
execution viable, a larger prefetch was
required (e.g. caching)
First showed up in the i486 for certain
pairs of opcodes
39. Caches
Very fast, expensive static RAM built into
the CPU
Must operate at or near the speed of the CPU
Different layers: L1, L2, maybe even L3
Each layer is faster than the one below it
L1 faster than L2 faster than L3, etc
41. Caching Led to Page Mode DRAM
Full cache lines pulled in from RAM
rather than single words
Addressing by cache lines reduced the
number of pins required for DDR
43. Faster Systems -> Faster Bus
PCI – 32-bit open specification
Microchannel – 32-bit IBM proprietary
Both attempted to become the true
standard. PCI was free and
Microchannel cost $1000’s to license
44. PCI Bus
32-bit physical addressing
32-bit data
Designed to support multiple masters
I/O mapped addressing -> memory mapped
33MHz bus clock (133Mbyte throughput)
45. Bus Masters
Virtually all PCI devices are bus masters
Effectively a separate computer
No access to the CPU’s cache
47. PCI Led to Memory Structure
Bus masters operate on RAM directly
CPU and PCI accessing same thing is bad
news
Bus master buffers are cache line aligned
Bus master structures are aligned as well
PCI has 32-bit addressing limit so < 4GByte
PCI only deals with physical addressing so
there is no security
48. Memory Contention
Diagram: CPU Core -> L1 Cache -> L2 Cache -> DDR over the
internal bus, with a PCI Device accessing DDR directly
Drivers understand this problem and structure
themselves accordingly.
49. PCI Issues
Parallel interfaces require many pins
Speed of light becomes a factor when
multiple high speed signals need to
reach their goal at the same time
At high speed, a trace becomes a
memory device
50. PCIe
High speed serial interface
Far fewer pins
Full 64-bit address range
Version 1, 2.5GHz per lane
Version 2, 5GHz per lane
Version 3, 8GHz per lane
etc
51. Legacy
64-bit addressing, but structures still stay
below 4G
Still deals with physical memory addresses
Has no security
52. 64-bit
rax, rbx, rcx, rdx, rdi, rsi, rbp, rsp
Plus r8 – r15
Virtual address range from 256TB to 16PB
Physical address range from 1TB (40 bits) to
256 TB (48 bits)
For the remainder of this series, I’ll refer to the
32-bit registers, but all can be 64-bit extended
54. Protection Rings
Intel has four security rings: 0 – 3
Ring 0 has full access to all opcodes
Ring 3 has limited access to opcodes
and certain memory
Drivers and OS run in ring 0/1
User software runs in ring 3
55. Problem for You to Think About
In a 16-bit, x86 computer, a segment
register is multiplied by 16 to form a
base address. So, ES = 0x1000 would be
based at the memory location 0x10000.
In a system with 1Mbyte of RAM (max
address location 0x100000), what would
happen if you load ES with 0xFFFF and
BX with 0x400 and then execute the
instruction: mov ax, es:[bx]?
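A sketch of the address arithmetic involved, showing the raw sum and what a bus with only 20 address lines would actually see (whether real hardware wraps depends on the A20 gate, which this deliberately leaves aside):

```python
# Real-mode address computation with an explicit address-bus width.
# The function name and parameters are illustrative only.

def real_mode_address(segment, offset, address_bits=20):
    raw = (segment << 4) + offset
    wrapped = raw & ((1 << address_bits) - 1)  # keep only the bus's bits
    return raw, wrapped

raw, wrapped = real_mode_address(0xFFFF, 0x400)
print(hex(raw))      # 0x1003f0 -- 21 bits, past the 1MB mark
print(hex(wrapped))  # 0x3f0    -- where a 20-bit bus would actually look
```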
56. Real World Problem
You created a 64-bit operating system.
You found that your executables almost
doubled in size, and that this caused
programs to run slower because the
increased size was a burden on the
CPU cache.
What would you do to fix that?