The document provides an overview of Chapter 2 from the textbook "Assembly Language for Intel-Based Computers, 5th Edition". It discusses the IA-32 processor architecture, including the basic microcomputer design, instruction execution cycle, memory management in real-address and protected modes, and the history of Intel microprocessors. It also previews the topics that will be covered in the rest of the chapter, such as memory management, computer components, and input/output systems.
New Ways to Find Latency in Linux Using TracingScyllaDB
Ftrace is the official tracer of the Linux kernel. It originated from the real-time patch (now known as PREEMPT_RT), as developing an operating system for real-time use requires deep insight and transparency of the happenings of the kernel. Not only was tracing useful for debugging, but it was critical for finding areas in the kernel that was causing unbounded latency. It's no wonder why the ftrace infrastructure has a lot of tooling for seeking out latency. Ftrace was introduced into mainline Linux in 2008, and several talks have been done on how to utilize its tracing features. But a lot has happened in the past few years that makes the tooling for finding latency much simpler. Other talks at P99 will discuss the new ftrace tracers "osnoise" and "timerlat", but this talk will focus more on the new flexible and dynamic aspects of ftrace that facilitates finding latency issues which are more specific to your needs. Some of this work may still be in a proof of concept stage, but this talk will give you the advantage of knowing what tools will be available to you in the coming year.
This document discusses the CPU structure and function based on William Stallings' Computer Organization and Architecture textbook. It covers the following key points in 3 or fewer sentences:
The CPU must fetch instructions, interpret instructions, fetch data, process data, and write data. The CPU contains registers for temporary storage and processing, including general purpose, data, address, and condition code registers. The document discusses CPU internal structures like control and status registers, the program status word, and register organizations for CPUs like Intel processors and PowerPC.
HKG15-505: Power Management interactions with OP-TEE and Trusted FirmwareLinaro
The document discusses power management in ARMv8-A and the integration of OP-TEE with the ARM Trusted Firmware. It provides an overview of the software stack and PSCI requirements. It then describes OP-TEE's system view and how it integrates with ARM Trusted Firmware as a runtime service. Finally, it discusses the programmer's view of PSCI and provides examples of how CPU_ON, CPU_OFF, and CPU_SUSPEND operations are handled between Linux, ARM Trusted Firmware, and OP-TEE.
This document presents information on assembly level programming. It discusses how assembly level programming uses mnemonics that are assembled into object code, making it less error-prone than machine level programming. It also outlines the programming tools used for assembly level programming, including editors, assemblers, linkers, loaders, and debuggers. Finally, it notes that assembly level programming is commonly used in embedded systems with limited resources.
The document provides an overview of the initialization process in the Linux kernel from start_kernel to rest_init. It lists the functions called during this process organized by category including functions for initialization of multiprocessor support (SMP), memory management (MM), scheduling, timers, interrupts, and architecture specific setup. The setup_arch section focuses on x86 architecture specific initialization functions such as reserving memory regions, parsing boot parameters, initializing memory mapping and MTRRs.
The IBM POWER10 processor represents the 10th generation of the POWER family of enterprise computing engines. Its performance is a result of both powerful processing cores and high-bandwidth intra- and inter-chip interconnect. POWER10 systems can be configured with up to 16 processor chips and 1920 simultaneous threads of execution. Cross-system memory sharing, through the new Memory Inception technology, and 2 Petabytes of addressing space support an expansive memory system. The POWER10 processing core has been significantly enhanced over its POWER9 predecessor, including a doubling of vector units and the addition of an all-new matrix math engine. Throughput gains from POWER9 to POWER10 average 30% at the core level and three-fold at the socket level. Those gains can reach ten- or twenty-fold at the socket level for matrix-intensive computations.
Haodong Tang from Intel gave this talk at the 2018 Open Fabrics Workshop.
"Efficient network messenger is critical for today’s scale-out storage systems. Ceph is one of the most popular distributed storage system providing a scalable and reliable object, block and file storage services. As the explosive growth of Big Data continues, there're strong demands leveraging Ceph build high performance & ultra-low latency storage solution in the cloud and bigdata environment. The traditional TCP/IP cannot satisfy this requirement, but Remote Direct Memory Access (RDMA) can.
"In this session, we'll present the challenges in today's distributed storage system posed by network messenger with the profiling results of Ceph All Flash Array system showing the networking already become the bottleneck and introduce how we achieved 8% performance benefit with Ethernet RDMA protocol iWARP. We'll first present the design of integrating iWARP to Ceph networking module together with performance characterization results with iWARP enabled IO intensive workload. The send part, we will explore the proof-of-concept solution of Ceph on NVMe over iWARP to build high-performance and high-density storage solution. Finally, we will showcase how these solutions can improve OSD scalability, and what’s the next optimization opportunities based on current analysis."
Watch the video: https://wp.me/p3RLHQ-ikV
Learn more: http://intel.com
and
https://insidehpc.com/2018/04/amazon-libfabric-case-study-flexible-hpc-infrastructure/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The document discusses the Intel Core i7 processor. It is a quad-core desktop processor that uses Intel's Nehalem microarchitecture. A quad-core processor has four processing cores that allow more tasks to be performed simultaneously. Core i7 processors are 64-bit and have features like Turbo Boost technology, Hyper-Threading, and integrated graphics. They have multiple decoder units that can decode instructions in parallel to improve performance.
New Ways to Find Latency in Linux Using TracingScyllaDB
Ftrace is the official tracer of the Linux kernel. It originated from the real-time patch (now known as PREEMPT_RT), as developing an operating system for real-time use requires deep insight and transparency of the happenings of the kernel. Not only was tracing useful for debugging, but it was critical for finding areas in the kernel that was causing unbounded latency. It's no wonder why the ftrace infrastructure has a lot of tooling for seeking out latency. Ftrace was introduced into mainline Linux in 2008, and several talks have been done on how to utilize its tracing features. But a lot has happened in the past few years that makes the tooling for finding latency much simpler. Other talks at P99 will discuss the new ftrace tracers "osnoise" and "timerlat", but this talk will focus more on the new flexible and dynamic aspects of ftrace that facilitates finding latency issues which are more specific to your needs. Some of this work may still be in a proof of concept stage, but this talk will give you the advantage of knowing what tools will be available to you in the coming year.
This document discusses the CPU structure and function based on William Stallings' Computer Organization and Architecture textbook. It covers the following key points in 3 or fewer sentences:
The CPU must fetch instructions, interpret instructions, fetch data, process data, and write data. The CPU contains registers for temporary storage and processing, including general purpose, data, address, and condition code registers. The document discusses CPU internal structures like control and status registers, the program status word, and register organizations for CPUs like Intel processors and PowerPC.
HKG15-505: Power Management interactions with OP-TEE and Trusted FirmwareLinaro
The document discusses power management in ARMv8-A and the integration of OP-TEE with the ARM Trusted Firmware. It provides an overview of the software stack and PSCI requirements. It then describes OP-TEE's system view and how it integrates with ARM Trusted Firmware as a runtime service. Finally, it discusses the programmer's view of PSCI and provides examples of how CPU_ON, CPU_OFF, and CPU_SUSPEND operations are handled between Linux, ARM Trusted Firmware, and OP-TEE.
This document presents information on assembly level programming. It discusses how assembly level programming uses mnemonics that are assembled into object code, making it less error-prone than machine level programming. It also outlines the programming tools used for assembly level programming, including editors, assemblers, linkers, loaders, and debuggers. Finally, it notes that assembly level programming is commonly used in embedded systems with limited resources.
The document provides an overview of the initialization process in the Linux kernel from start_kernel to rest_init. It lists the functions called during this process organized by category including functions for initialization of multiprocessor support (SMP), memory management (MM), scheduling, timers, interrupts, and architecture specific setup. The setup_arch section focuses on x86 architecture specific initialization functions such as reserving memory regions, parsing boot parameters, initializing memory mapping and MTRRs.
The IBM POWER10 processor represents the 10th generation of the POWER family of enterprise computing engines. Its performance is a result of both powerful processing cores and high-bandwidth intra- and inter-chip interconnect. POWER10 systems can be configured with up to 16 processor chips and 1920 simultaneous threads of execution. Cross-system memory sharing, through the new Memory Inception technology, and 2 Petabytes of addressing space support an expansive memory system. The POWER10 processing core has been significantly enhanced over its POWER9 predecessor, including a doubling of vector units and the addition of an all-new matrix math engine. Throughput gains from POWER9 to POWER10 average 30% at the core level and three-fold at the socket level. Those gains can reach ten- or twenty-fold at the socket level for matrix-intensive computations.
Haodong Tang from Intel gave this talk at the 2018 Open Fabrics Workshop.
"Efficient network messenger is critical for today’s scale-out storage systems. Ceph is one of the most popular distributed storage system providing a scalable and reliable object, block and file storage services. As the explosive growth of Big Data continues, there're strong demands leveraging Ceph build high performance & ultra-low latency storage solution in the cloud and bigdata environment. The traditional TCP/IP cannot satisfy this requirement, but Remote Direct Memory Access (RDMA) can.
"In this session, we'll present the challenges in today's distributed storage system posed by network messenger with the profiling results of Ceph All Flash Array system showing the networking already become the bottleneck and introduce how we achieved 8% performance benefit with Ethernet RDMA protocol iWARP. We'll first present the design of integrating iWARP to Ceph networking module together with performance characterization results with iWARP enabled IO intensive workload. The send part, we will explore the proof-of-concept solution of Ceph on NVMe over iWARP to build high-performance and high-density storage solution. Finally, we will showcase how these solutions can improve OSD scalability, and what’s the next optimization opportunities based on current analysis."
Watch the video: https://wp.me/p3RLHQ-ikV
Learn more: http://intel.com
and
https://insidehpc.com/2018/04/amazon-libfabric-case-study-flexible-hpc-infrastructure/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The document discusses the Intel Core i7 processor. It is a quad-core desktop processor that uses Intel's Nehalem microarchitecture. A quad-core processor has four processing cores that allow more tasks to be performed simultaneously. Core i7 processors are 64-bit and have features like Turbo Boost technology, Hyper-Threading, and integrated graphics. They have multiple decoder units that can decode instructions in parallel to improve performance.
Linux has become integral part of Embedded systems. This three part presentation gives deeper perspective of Linux from system programming perspective. Stating with basics of Linux it goes on till advanced aspects like thread and IPC programming.
An unique module combining various previous modules you have learnt by combing Linux administration, Hardware knowledge, Linux as OS, C/Computer programming areas. This is a complete module on Embedded OS, as of now no books are written on this with such practical aspects. Here is a consolidated material to get real hands-on perspective about building custom Embedded Linux distribution in ARM.
This document discusses SR-IOV (Single Root I/O Virtualization) in ACRN. It begins with an introduction to SR-IOV, describing how it allows PCIe devices to be isolated and have near bare-metal performance through the use of Physical Functions (PFs) and Virtual Functions (VFs). It then outlines the SR-IOV architecture in ACRN, including how it detects and initializes SR-IOV devices, assigns VFs to VMs, and manages the lifecycle of VFs. Finally, it provides an agenda for an SR-IOV demo using an Intel 82576 NIC and concludes with a Q&A section.
This document discusses memory ordering and synchronization in multithreaded programs. It begins with background on mutexes, semaphores, and their differences. It then discusses problems that can occur with locking-based synchronization methods like mutexes, such as deadlocks, priority inversion, and performance issues. Alternative lock-free programming techniques using atomic operations are presented as a way to synchronize access without locks. Finally, memory ordering, consistency models, barriers, and their implementations in compilers, Linux kernels, and ARM architectures are covered in detail.
Evaluating UCIe based multi-die SoC to meet timing and power Deepak Shankar
This document discusses evaluating a UCIe-based multi-die system-on-chip (SoC) using system modeling to meet timing and power constraints. It provides an overview of UCIe and how it can be used to connect multiple dies. It then describes assembling a system model in VisualSim Architect using UCIe components to analyze configurations and optimize latency, bandwidth, and power. Examples of multi-media and automotive applications using UCIe-based chiplet designs are also presented.
Multithreading allows exploiting thread-level parallelism (TLP) to improve processor utilization. There are several categories of multithreading:
- Superscalar simultaneous multithreading interleaves instructions from multiple threads within a single out-of-order processor core to reduce idle resources.
- Coarse-grained multithreading switches between threads on long-latency events like cache misses to hide latency.
- Fine-grained multithreading interleaves threads at a finer instruction granularity in in-order cores.
- Multiprocessing physically separates threads onto multiple processor cores.
This document introduces OpenCL, a framework for parallel programming across heterogeneous systems. OpenCL allows developers to write programs that access GPU and multi-core processors. It provides portability so the same code can run on different processor architectures. The document outlines OpenCL programming basics like kernels, memory objects, and host code that manages kernels. It also provides a simple "Hello World" example of vector addition in OpenCL and recommends additional resources for learning OpenCL.
This document discusses storage management and disk structure. It covers mass storage structure including magnetic disks, disk platters, tracks, cylinders, sectors, and read/write heads. It then discusses disk structure in operating systems and concepts like surfaces, tracks, cylinders, and read/write heads. Finally, it covers scheduling and management techniques like long term, short term, and medium term schedulers as well as RAID structures and their benefits like increased reliability and performance.
Linux Kernel Booting Process (1) - For NLKBshimosawa
Describes the bootstrapping part in Linux and some related technologies.
This is the part one of the slides, and the succeeding slides will contain the errata for this slide.
This document discusses debugging the ACPI subsystem in the Linux kernel. It provides an overview of the ACPI subsystem and components like ACPICA and the namespace. It describes how to enable ACPI debug logging via the acpi.debug_layer and acpi.debug_level kernel parameters. It also covers overriding ACPI definition blocks tables and tracing ACPI temperature as a debugging case study.
The document discusses Linux support for the ARM 64-bit (AArch64) architecture. It covers key aspects of the AArch64 instruction set like 64-bit registers and memory accesses. It describes the exception model with multiple privilege levels and modes for virtualization. It also summarizes the Linux kernel port to AArch64 including boot process, memory management support, and compatibility for 32-bit applications. Future work is outlined to improve platform support and add new features to the AArch64 version of Linux.
Yocto - Embedded Linux Distribution MakerSherif Mousa
Yocto is an Embedded Linux distribution maker.
This presentation is a quick start guide for Yocto buildsystem to get familiar with the tool and how to start building your own custom Linux system for a specific hardware target.
Virtual File System in Linux Kernel
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Decompressed vmlinux: linux kernel initialization from page table configurati...Adrian Huang
Talk about how Linux kernel initializes the page table.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
This document discusses Linux memory management. It outlines the buddy system, zone allocation, and slab allocator used by Linux to manage physical memory. It describes how pages are allocated and initialized at boot using the memory map. The slab allocator is used to optimize allocation of kernel objects and is implemented as caches of fixed-size slabs and objects. Per-CPU allocation improves performance by reducing locking and cache invalidations.
This document discusses several hardware memory models including Total Store Order (TSO), Processor Consistency (PC), and Weak Ordering. TSO allows loads to bypass earlier stores to different addresses but maintains order of loads and stores. PC is similar to TSO but does not guarantee write atomicity. Weak Ordering relaxes all instruction ordering and uses synchronization operations like locks and barriers to enforce ordering. The document also describes memory barrier instructions in PowerPC that can be used to enforce ordering between memory accesses.
The document is the IA-32 Intel Architecture Software Developer's Manual Volume 1: Basic Architecture. It consists of three volumes that cover the basic architecture, instruction set, and system programming. This volume provides an overview of IA-32 processors, execution environment, memory organization, registers, addressing modes, data types, and instruction set architecture. It is intended for software developers evaluating the IA-32 architecture.
Linux has become integral part of Embedded systems. This three part presentation gives deeper perspective of Linux from system programming perspective. Stating with basics of Linux it goes on till advanced aspects like thread and IPC programming.
An unique module combining various previous modules you have learnt by combing Linux administration, Hardware knowledge, Linux as OS, C/Computer programming areas. This is a complete module on Embedded OS, as of now no books are written on this with such practical aspects. Here is a consolidated material to get real hands-on perspective about building custom Embedded Linux distribution in ARM.
This document discusses SR-IOV (Single Root I/O Virtualization) in ACRN. It begins with an introduction to SR-IOV, describing how it allows PCIe devices to be isolated and have near bare-metal performance through the use of Physical Functions (PFs) and Virtual Functions (VFs). It then outlines the SR-IOV architecture in ACRN, including how it detects and initializes SR-IOV devices, assigns VFs to VMs, and manages the lifecycle of VFs. Finally, it provides an agenda for an SR-IOV demo using an Intel 82576 NIC and concludes with a Q&A section.
This document discusses memory ordering and synchronization in multithreaded programs. It begins with background on mutexes, semaphores, and their differences. It then discusses problems that can occur with locking-based synchronization methods like mutexes, such as deadlocks, priority inversion, and performance issues. Alternative lock-free programming techniques using atomic operations are presented as a way to synchronize access without locks. Finally, memory ordering, consistency models, barriers, and their implementations in compilers, Linux kernels, and ARM architectures are covered in detail.
Evaluating UCIe based multi-die SoC to meet timing and power Deepak Shankar
This document discusses evaluating a UCIe-based multi-die system-on-chip (SoC) using system modeling to meet timing and power constraints. It provides an overview of UCIe and how it can be used to connect multiple dies. It then describes assembling a system model in VisualSim Architect using UCIe components to analyze configurations and optimize latency, bandwidth, and power. Examples of multi-media and automotive applications using UCIe-based chiplet designs are also presented.
Multithreading allows exploiting thread-level parallelism (TLP) to improve processor utilization. There are several categories of multithreading:
- Superscalar simultaneous multithreading interleaves instructions from multiple threads within a single out-of-order processor core to reduce idle resources.
- Coarse-grained multithreading switches between threads on long-latency events like cache misses to hide latency.
- Fine-grained multithreading interleaves threads at a finer instruction granularity in in-order cores.
- Multiprocessing physically separates threads onto multiple processor cores.
This document introduces OpenCL, a framework for parallel programming across heterogeneous systems. OpenCL allows developers to write programs that access GPU and multi-core processors. It provides portability so the same code can run on different processor architectures. The document outlines OpenCL programming basics like kernels, memory objects, and host code that manages kernels. It also provides a simple "Hello World" example of vector addition in OpenCL and recommends additional resources for learning OpenCL.
This document discusses storage management and disk structure. It covers mass storage structure including magnetic disks, disk platters, tracks, cylinders, sectors, and read/write heads. It then discusses disk structure in operating systems and concepts like surfaces, tracks, cylinders, and read/write heads. Finally, it covers scheduling and management techniques like long term, short term, and medium term schedulers as well as RAID structures and their benefits like increased reliability and performance.
Linux Kernel Booting Process (1) - For NLKBshimosawa
Describes the bootstrapping part in Linux and some related technologies.
This is the part one of the slides, and the succeeding slides will contain the errata for this slide.
This document discusses debugging the ACPI subsystem in the Linux kernel. It provides an overview of the ACPI subsystem and components like ACPICA and the namespace. It describes how to enable ACPI debug logging via the acpi.debug_layer and acpi.debug_level kernel parameters. It also covers overriding ACPI definition blocks tables and tracing ACPI temperature as a debugging case study.
The document discusses Linux support for the ARM 64-bit (AArch64) architecture. It covers key aspects of the AArch64 instruction set like 64-bit registers and memory accesses. It describes the exception model with multiple privilege levels and modes for virtualization. It also summarizes the Linux kernel port to AArch64 including boot process, memory management support, and compatibility for 32-bit applications. Future work is outlined to improve platform support and add new features to the AArch64 version of Linux.
Yocto - Embedded Linux Distribution MakerSherif Mousa
Yocto is an Embedded Linux distribution maker.
This presentation is a quick start guide for Yocto buildsystem to get familiar with the tool and how to start building your own custom Linux system for a specific hardware target.
Virtual File System in Linux Kernel
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Decompressed vmlinux: linux kernel initialization from page table configurati...Adrian Huang
Talk about how Linux kernel initializes the page table.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
This document discusses Linux memory management. It outlines the buddy system, zone allocation, and slab allocator used by Linux to manage physical memory. It describes how pages are allocated and initialized at boot using the memory map. The slab allocator is used to optimize allocation of kernel objects and is implemented as caches of fixed-size slabs and objects. Per-CPU allocation improves performance by reducing locking and cache invalidations.
This document discusses several hardware memory models including Total Store Order (TSO), Processor Consistency (PC), and Weak Ordering. TSO allows loads to bypass earlier stores to different addresses but maintains order of loads and stores. PC is similar to TSO but does not guarantee write atomicity. Weak Ordering relaxes all instruction ordering and uses synchronization operations like locks and barriers to enforce ordering. The document also describes memory barrier instructions in PowerPC that can be used to enforce ordering between memory accesses.
The document is the IA-32 Intel Architecture Software Developer's Manual Volume 1: Basic Architecture. It consists of three volumes that cover the basic architecture, instruction set, and system programming. This volume provides an overview of IA-32 processors, execution environment, memory organization, registers, addressing modes, data types, and instruction set architecture. It is intended for software developers evaluating the IA-32 architecture.
The document provides an overview of the evolution of Intel microprocessors from 8-bit to 64-bit architectures over several decades. It describes key processors such as the 8086, 80286, 80386 which introduced 32-bit architecture and protected mode. Subsequent processors increased performance through features like pipelining, superscalar, and integration of memory controllers and caches on-die. Modern multi-core processors support 64-bit architecture and virtualization.
There are two main types of network addresses: physical (MAC) and logical (IP). The physical MAC address is unique to each network interface card and is burned into the hardware. The logical IP address can be either static, requiring manual configuration, or dynamic, assigned automatically by a DHCP server. Both address types are needed for devices to communicate over a network.
The document discusses the IA-64 architecture, which was jointly developed by Intel and HP as a 64-bit architecture intended for implementation on EPIC processors like Itanium. Some key points:
- IA-64 uses explicit instruction-level parallelism specified by the compiler rather than relying on superscalar execution. It supports long instruction bundles containing multiple operations.
- The architecture has large register sets, multiple execution units, predicated execution, and support for software pipelining to further exploit instruction level parallelism.
- The Itanium processor implemented IA-64 using a superscalar design with hardware support for the EPIC features like predication and speculation.
The document provides an overview of x86 assembly language architecture, including:
1) It describes the basic components of an x86 microcomputer including the CPU, memory, input/output ports, and motherboard.
2) It explains x86 processor architecture concepts such as modes of operation, registers, addressing modes, and the evolution of Intel processors.
3) It covers x86 memory management in real mode and protected mode as well as paging and segmentation.
Physical and logical data independence are described. Logical data independence refers to the ability to change the logical schema without impacting external schemas or application programs. Physical data independence is the ability to change the physical schema without affecting the logical schema, such as changing storage structures or devices. Examples of each type of independence are provided. The key elements of an entity-relationship model - entities, attributes, and relationships - are defined. An example E-R diagram showing students, courses, and their relationship is depicted to illustrate these elements.
ARP (Address Resolution Protocol) maps logical IP addresses to physical MAC addresses. It works by broadcasting an ARP request packet containing the logical IP address, and the physical host with that IP will respond with its MAC address in an ARP reply packet. ARP packets are encapsulated within Ethernet frames to be transmitted at the data link layer, and ARP is used to resolve addresses both for hosts on the same local network and for traffic destined for a default router on another network.
This document provides an overview of the Intel x86 architecture, including its registers, instructions, memory management, interrupts and exceptions, task management, and input/output capabilities. It describes the basic execution environment including memory management registers and control registers. It explains the operation modes of protected mode and real mode, and the memory models. It also summarizes the general purpose instructions, system instructions, privilege levels, basic program execution registers, and memory addressing in the x86 architecture.
The 7 Cs of business writing are:
1. Completeness - Answer all questions fully using the 5Ws and 1H.
2. Conciseness - Be focused and avoid unnecessary words.
3. Consideration - Focus on the reader's needs and use a positive tone.
4. Clarity - Use simple, familiar language and effective structure.
5. Concreteness - Provide specific details, facts, and vivid descriptions.
6. Courtesy - Be sincere, tactful and avoid language that could offend.
7. Correctness - Ensure accurate information and proper writing mechanics.
The document summarizes key aspects of x86 processor architecture. It describes the basic components of a microcomputer including the CPU, memory, and I/O devices. It then discusses the core components of the CPU like the ALU, registers, buses, and control unit. It explains the fetch-decode-execute instruction cycle and memory access cycles. It also covers advanced CPU features like cache memory, multitasking, addressing modes, and register types.
This document provides an overview of Chapter 1 from the book "Assembly Language for Intel-Based Computers" by Kip Irvine. The chapter introduces assembly language concepts such as virtual machine levels, data representation using binary and hexadecimal numbers, and boolean logic operations. It discusses translating between number systems, arithmetic, and character storage. The overview concludes by noting that subsequent sections will cover numeric data representation and boolean operations in more detail.
Assembly Language for x86 Processors 7th Edition Chapter 2 : x86 Processor Ar...ssuser65bfce
This document provides an overview of Chapter 2 from the textbook "Assembly Language for x86 Processors". It discusses the basic concepts of computer hardware components including the CPU, memory, and input/output systems. It then describes the architecture of x86 processors, including their modes of operation, basic execution environment involving registers and memory, and instruction execution cycle. Examples of simple assembly language programs are provided.
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
Après la petite intro sur le stockage distribué et la description de Ceph, Jian Zhang réalise dans cette présentation quelques benchmarks intéressants : tests séquentiels, tests random et surtout comparaison des résultats avant et après optimisations. Les paramètres de configuration touchés et optimisations (Large page numbers, Omap data sur un disque séparé, ...) apportent au minimum 2x de perf en plus.
This document summarizes a presentation about Ceph, an open-source distributed storage system. It discusses Ceph's introduction and components, benchmarks Ceph's block and object storage performance on Intel architecture, and describes optimizations like cache tiering and erasure coding. It also outlines Intel's product portfolio in supporting Ceph through optimized CPUs, flash storage, networking, server boards, software libraries, and contributions to the open source Ceph community.
This document provides an overview of Chapter 1 from the textbook "Assembly Language for Intel-Based Computers, 5th Edition" by Kip Irvine. The chapter introduces basic concepts of assembly language including the objectives of learning assembly language, the virtual machine concept with different machine levels, data representation using binary and hexadecimal numbers, and boolean operations. It describes applications of assembly language and compares it to high-level languages. An outline of topics to be covered in the chapter is also provided.
Are you ready to work in the Parallel Universe? Rise to the challenge at SC13Intel IT Center
The document discusses optimization on Intel Xeon Phi coprocessors. It begins by comparing the peak performance and architecture of Xeon Phi coprocessors to Xeon processors, noting Xeon Phi has more cores, threads, and vector processing capabilities. It then outlines flexible execution models for running code on Xeon Phi, including offload and native modes. An example is shown of performance improvements from optimizing code for Xeon Phi. Upcoming "Knights Landing" Xeon Phi processors are discussed, which will integrate memory and run code natively.
The document discusses Intel tools for optimizing high performance computing (HPC) systems. It provides an overview of Intel Parallel Studio XE and Cluster Studio XE 2013 and what's new in the 2015 beta version. Specifically, it highlights updated support for latest Intel processors and coprocessors in the compilers, libraries, and tools. It also summarizes new features in the Intel C++ and Fortran Compiler, Intel Math Kernel Library, and redesigned optimization reports.
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Community
Jack Zhang is a Senior Enterprise Architect at Intel Corp. This document discusses Ceph storage configurations using Intel SSDs and discusses benchmark results. Tuning Ceph for all-flash storage can significantly improve performance, with up to 16x better random read performance and 7.6x better random write performance achieved. Using SSDs instead of HDDs provides much higher performance, needing 58x fewer drives for the same write performance and 175x fewer for the same read performance. The document also outlines several suggested Ceph storage node configurations using different ratios of SSDs and HDDs.
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...inwin stack
Kenny Chang (張任伯) (Storage Solution Architect, Intel)
With the trend that Solid State Drive (SSD) becomes more affordable, more and more cloud providers are trying to provide high performance, highly reliable storage for their customers with SSDs. Ceph is becoming one of most open source scale-out storage solutions in worldwide market. More and more customers have strong demands that using SSD in Ceph to build high performance storage solutions for their Openstack clouds.
The disrupted Intel® Optane SSDs based on 3D Xpoint technology fills the performance gap between DRAM and NAND based SSD while the Intel® 3D NAND TLC is reducing cost gap between SSD and traditional spindle hard drive and makes it possible for all flash storage. In this session, we will
1) Discuss OpenStack storage Ceph reference design on the first Intel Optane (3D Xpoint) and P4500 TLC NAND based all-flash Ceph cluster, it delivers multi-million IOPS with extremely low latency as well as increase storage density with competitive dollar-per-gigabyte costs
2) Share Ceph bluestore tunings and optimizations, latency analysis, TCO model, IOPS/TB, IOPS/$ based on the reference architecture to demonstrate this high performance, cost effective solution.
1) The document discusses Intel Core i5 processors, focusing on their architecture, specifications, and new features.
2) It describes the Core i5 lineup, including dual-core and quad-core models ranging from 2.9-3.2 GHz with 4-8 MB of L3 cache.
3) The key advantages of Core i5 over competing chips are discussed, such as larger cache sizes, integrated graphics, and Turbo Boost technology.
EScala - Design Platform that generates optimized, re-programmable HDL IP cores from Esencia Technologies, Inc.
Esencia is a leading Design Services Company, based in Santa Clara, CA
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster Ceph Community
An all-NVMe Ceph cluster was configured with 5 storage nodes, each containing 20 Intel P3700 SSDs providing a total of 80 object storage daemons (OSDs). Benchmarking showed over 1.4 million 4K random read IOPS at an average latency of 1ms and 220K 4K random write IOPS at 5ms latency. For a 70/30% read/write mix, over 560K random IOPS were achieved at 3ms latency. Sysbench MySQL testing on the cluster showed linear scaling from 2 to 8 queries per second with an average I/O size of 16KB.
OpenPOWER Acceleration of HPCC SystemsHPCC Systems
JT Kellington, IBM and Allan Cantle, Nallatech present at the 2015 HPCC Systems Engineering Summit Community Day about porting HPCC Systems to the POWER8-based ppc64el architecture.
0 foundation update__final - Mendy FurmanekYutaka Kawai
This slide was presented by Mendy Furmanek at OpenPOWER summit EU 2019. The original one is uploaded at:
https://static.sched.com/hosted_files/opeu19/9c/Final%20-%20Mendy%20F..pdf
This document summarizes Intel's Cluster Studio XE and Parallel Studio XE development tools. It provides an overview of the tools' capabilities for developing applications that can efficiently scale from few cores to many cores. The tools include compilers, performance profiling and analysis tools, and libraries that support parallel programming models and provide optimized math functions and algorithms.
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Community
This document discusses an all-NVMe Ceph cluster configuration for MySQL hosting. It describes a 5-node Ceph cluster with Intel Xeon processors, 128GB of RAM, and 20 Intel SSDs providing 80 object storage devices (OSDs) for a total effective capacity of 19TB. Benchmark results show the cluster achieving over 1.4 million IOPS for 4K random reads with an average latency of 1ms, and over 220K IOPS for 4K random writes with 5ms latency. Sysbench tests of MySQL databases on the cluster using 16KB IOs showed response times under 10ms for query depths from 2 to 8.
Similar to Chapt 02 ia-32 processer architecture (20)
Hands-on with Apache Druid: Installation & Data Ingestion StepsservicesNitor
Supercharge your analytics workflow with https://bityl.co/Qcuk Apache Druid's real-time capabilities and seamless Kafka integration. Learn about it in just 14 steps.
Secure-by-Design Using Hardware and Software Protection for FDA ComplianceICS
This webinar explores the “secure-by-design” approach to medical device software development. During this important session, we will outline which security measures should be considered for compliance, identify technical solutions available on various hardware platforms, summarize hardware protection methods you should consider when building in security and review security software such as Trusted Execution Environments for secure storage of keys and data, and Intrusion Detection Protection Systems to monitor for threats.
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdfBaha Majid
IBM watsonx Code Assistant for Z, our latest Generative AI-assisted mainframe application modernization solution. Mainframe (IBM Z) application modernization is a topic that every mainframe client is addressing to various degrees today, driven largely from digital transformation. With generative AI comes the opportunity to reimagine the mainframe application modernization experience. Infusing generative AI will enable speed and trust, help de-risk, and lower total costs associated with heavy-lifting application modernization initiatives. This document provides an overview of the IBM watsonx Code Assistant for Z which uses the power of generative AI to make it easier for developers to selectively modernize COBOL business services while maintaining mainframe qualities of service.
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...kalichargn70th171
Visual testing plays a vital role in ensuring that software products meet the aesthetic requirements specified by clients in functional and non-functional specifications. In today's highly competitive digital landscape, users expect a seamless and visually appealing online experience. Visual testing, also known as automated UI testing or visual regression testing, verifies the accuracy of the visual elements that users interact with.
Boost Your Savings with These Money Management AppsJhone kinadey
A money management app can transform your financial life by tracking expenses, creating budgets, and setting financial goals. These apps offer features like real-time expense tracking, bill reminders, and personalized insights to help you save and manage money effectively. With a user-friendly interface, they simplify financial planning, making it easier to stay on top of your finances and achieve long-term financial stability.
Photoshop Tutorial for Beginners (2024 Edition)alowpalsadig
Photoshop Tutorial for Beginners (2024 Edition)
Explore the evolution of programming and software development and design in 2024. Discover emerging trends shaping the future of coding in our insightful analysis."
Here's an overview:Introduction: The Evolution of Programming and Software DevelopmentThe Rise of Artificial Intelligence and Machine Learning in CodingAdopting Low-Code and No-Code PlatformsQuantum Computing: Entering the Software Development MainstreamIntegration of DevOps with Machine Learning: MLOpsAdvancements in Cybersecurity PracticesThe Growth of Edge ComputingEmerging Programming Languages and FrameworksSoftware Development Ethics and AI RegulationSustainability in Software EngineeringThe Future Workforce: Remote and Distributed TeamsConclusion: Adapting to the Changing Software Development LandscapeIntroduction: The Evolution of Programming and Software Development
Photoshop Tutorial for Beginners (2024 Edition)Explore the evolution of programming and software development and design in 2024. Discover emerging trends shaping the future of coding in our insightful analysis."Here's an overview:Introduction: The Evolution of Programming and Software DevelopmentThe Rise of Artificial Intelligence and Machine Learning in CodingAdopting Low-Code and No-Code PlatformsQuantum Computing: Entering the Software Development MainstreamIntegration of DevOps with Machine Learning: MLOpsAdvancements in Cybersecurity PracticesThe Growth of Edge ComputingEmerging Programming Languages and FrameworksSoftware Development Ethics and AI RegulationSustainability in Software EngineeringThe Future Workforce: Remote and Distributed TeamsConclusion: Adapting to the Changing Software Development LandscapeIntroduction: The Evolution of Programming and Software Development
The importance of developing and designing programming in 2024
Programming design and development represents a vital step in keeping pace with technological advancements and meeting ever-changing market needs. This course is intended for anyone who wants to understand the fundamental importance of software development and design, whether you are a beginner or a professional seeking to update your knowledge.
Course objectives:
1. **Learn about the basics of software development:
- Understanding software development processes and tools.
- Identify the role of programmers and designers in software projects.
2. Understanding the software design process:
- Learn about the principles of good software design.
- Discussing common design patterns such as Object-Oriented Design.
3. The importance of user experience (UX) in modern software:
- Explore how user experience can improve software acceptance and usability.
- Tools and techniques to analyze and improve user experience.
4. Increase efficiency and productivity through modern development tools:
- Access to the latest programming tools and languages used in the industry.
- Study live examples of applications
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSISTier1 app
Are you ready to unlock the secrets hidden within Java thread dumps? Join us for a hands-on session where we'll delve into effective troubleshooting patterns to swiftly identify the root causes of production problems. Discover the right tools, techniques, and best practices while exploring *real-world case studies of major outages* in Fortune 500 enterprises. Engage in interactive lab exercises where you'll have the opportunity to troubleshoot thread dumps and uncover performance issues firsthand. Join us and become a master of Java thread dump analysis!
The Rising Future of CPaaS in the Middle East 2024Yara Milbes
Explore "The Rising Future of CPaaS in the Middle East in 2024" with this comprehensive PPT presentation. Discover how Communication Platforms as a Service (CPaaS) is transforming communication across various sectors in the Middle East.
Hyperledger Besu 빨리 따라하기 (Private Networks)wonyong hwang
Hyperledger Besu의 Private Networks에서 진행하는 실습입니다. 주요 내용은 공식 문서인https://besu.hyperledger.org/private-networks/tutorials 의 내용에서 발췌하였으며, Privacy Enabled Network와 Permissioned Network까지 다루고 있습니다.
This is a training session at Hyperledger Besu's Private Networks, with the main content excerpts from the official document besu.hyperledger.org/private-networks/tutorials and even covers the Private Enabled and Permitted Networks.
Enhanced Screen Flows UI/UX using SLDS with Tom KittPeter Caitens
Join us for an engaging session led by Flow Champion, Tom Kitt. This session will dive into a technique of enhancing the user interfaces and user experiences within Screen Flows using the Salesforce Lightning Design System (SLDS). This technique uses Native functionality, with No Apex Code, No Custom Components and No Managed Packages required.
Orca: Nocode Graphical Editor for Container OrchestrationPedro J. Molina
Tool demo on CEDI/SISTEDES/JISBD2024 at A Coruña, Spain. 2024.06.18
"Orca: Nocode Graphical Editor for Container Orchestration"
by Pedro J. Molina PhD. from Metadev
Strengthening Web Development with CommandBox 6: Seamless Transition and Scal...Ortus Solutions, Corp
Join us for a session exploring CommandBox 6’s smooth website transition and efficient deployment. CommandBox revolutionizes web development, simplifying tasks across Linux, Windows, and Mac platforms. Gain insights and practical tips to enhance your development workflow.
Come join us for an enlightening session where we delve into the smooth transition of current websites and the efficient deployment of new ones using CommandBox 6. CommandBox has revolutionized web development, consistently introducing user-friendly enhancements that catalyze progress in the field. During this presentation, we’ll explore CommandBox’s rich history and showcase its unmatched capabilities within the realm of ColdFusion, covering both major variations.
The journey of CommandBox has been one of continuous innovation, constantly pushing boundaries to simplify and optimize development processes. Regardless of whether you’re working on Linux, Windows, or Mac platforms, CommandBox empowers developers to streamline tasks with unparalleled ease.
In our session, we’ll illustrate the simple process of transitioning existing websites to CommandBox 6, highlighting its intuitive features and seamless integration. Moreover, we’ll unveil the potential for effortlessly deploying multiple websites, demonstrating CommandBox’s versatility and adaptability.
Join us on this journey through the evolution of web development, guided by the transformative power of CommandBox 6. Gain invaluable insights, practical tips, and firsthand experiences that will enhance your development workflow and embolden your projects.
What is Continuous Testing in DevOps - A Definitive Guide.pdfkalichargn70th171
Once an overlooked aspect, continuous testing has become indispensable for enterprises striving to accelerate application delivery and reduce business impacts. According to a Statista report, 31.3% of global enterprises have embraced continuous integration and deployment within their DevOps, signaling a pervasive trend toward hastening release cycles.
The Comprehensive Guide to Validating Audio-Visual Performances.pdfkalichargn70th171
Ensuring the optimal performance of your audio-visual (AV) equipment is crucial for delivering exceptional experiences. AV performance validation is a critical process that verifies the quality and functionality of your AV setup. Whether you're a content creator, a business conducting webinars, or a homeowner creating a home theater, validating your AV performance is essential.
1. Assembly Language for Intel-BasedAssembly Language for Intel-Based
Computers, 5Computers, 5thth
EditionEdition
Chapter 2: IA-32 Processor
Architecture
(c) Pearson Education, 2006-2007. All rights reserved. You may modify and copy this slide show for your personal use,
or for use in the classroom, as long as this copyright statement, the author's name, and the title are not changed.
Slides prepared by the author
Revision date: June 4, 2006
Kip Irvine
2. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 2
Chapter OverviewChapter Overview
• General Concepts
• IA-32 Processor Architecture
• IA-32 Memory Management
• Components of an IA-32 Microcomputer
• Input-Output System
3. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 3
General ConceptsGeneral Concepts
• Basic microcomputer design
• Instruction execution cycle
• Reading from memory
• How programs run
4. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 4
Basic Microcomputer DesignBasic Microcomputer Design
• clock synchronizes CPU operations
• control unit (CU) coordinates sequence of execution steps
• ALU performs arithmetic and bitwise processing
5. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 5
ClockClock
• synchronizes all CPU and BUS operations
• machine (clock) cycle measures time of a single
operation
• clock is used to trigger events
one cycle
1
0
6. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 6
Instruction Execution CycleInstruction Execution Cycle
• Fetch
• Decode
• Fetch operands
• Execute
• Store output
7. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 7
Multi-Stage PipelineMulti-Stage Pipeline
• Pipelining makes it possible for processor to execute
instructions in parallel
• Instruction execution divided into discrete stages
Example of a non-
pipelined processor.
Many wasted cycles.
8. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 8
Pipelined ExecutionPipelined Execution
• More efficient use of cycles, greater throughput of instructions:
For k states and n
instructions, the
number of required
cycles is:
k + (n – 1)
9. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 9
Wasted Cycles (pipelined)Wasted Cycles (pipelined)
• When one of the stages requires two or more clock cycles, clock
cycles are again wasted.
For k states and n
instructions, the
number of required
cycles is:
k + (2n – 1)
10. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 10
SuperscalarSuperscalar
A superscalar processor has multiple execution pipelines. In the
following, note that Stage S4 has left and right pipelines (u and v).
For k states and n
instructions, the
number of required
cycles is:
k + n
11. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 11
Reading from MemoryReading from Memory
• Multiple machine cycles are required when reading from memory,
because it responds much more slowly than the CPU. The steps are:
• address placed on address bus
• Read Line (RD) set low
• CPU waits one cycle for memory to respond
• Read Line (RD) goes to 1, indicating that the data is on the data
bus
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Data
Address
CLK
ADDR
RD
DATA
12. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 12
Cache MemoryCache Memory
• High-speed expensive static RAM both inside and
outside the CPU.
• Level-1 cache: inside the CPU
• Level-2 cache: outside the CPU
• Cache hit: when data to be read is already in cache
memory
• Cache miss: when data to be read is not in cache
memory.
13. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 13
How a Program RunsHow a Program Runs
14. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 14
MultitaskingMultitasking
• OS can run multiple programs at the same time.
• Multiple threads of execution within the same
program.
• Scheduler utility assigns a given amount of CPU time
to each running program.
• Rapid switching of tasks
• gives illusion that all programs are running at once
• the processor must support task switching.
15. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 15
What's NextWhat's Next
• General Concepts
• IA-32 Processor Architecture
• IA-32 Memory Management
• Components of an IA-32 Microcomputer
• Input-Output System
16. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 16
IA-32 Processor ArchitectureIA-32 Processor Architecture
• Modes of operation
• Basic execution environment
• Floating-point unit
• Intel Microprocessor history
17. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 17
Modes of OperationModes of Operation
• Protected mode
• native mode (Windows, Linux)
• Real-address mode
• native MS-DOS
• System management mode
• power management, system security, diagnostics
• Virtual-8086 mode
• hybrid of Protected
• each program has its own 8086 computer
18. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 18
Basic Execution EnvironmentBasic Execution Environment
• Addressable memory
• General-purpose registers
• Index and base registers
• Specialized register uses
• Status flags
• Floating-point, MMX, XMM registers
19. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 19
Addressable MemoryAddressable Memory
• Protected mode
• 4 GB
• 32-bit address
• Real-address and Virtual-8086 modes
• 1 MB space
• 20-bit address
20. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 20
General-Purpose RegistersGeneral-Purpose Registers
CS
SS
DS
ES
EIP
EFLAGS
16-bit Segment Registers
EAX
EBX
ECX
EDX
32-bit General-Purpose Registers
FS
GS
EBP
ESP
ESI
EDI
Named storage locations inside the CPU, optimized for
speed.
21. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 21
Accessing Parts of RegistersAccessing Parts of Registers
• Use 8-bit name, 16-bit name, or 32-bit name
• Applies to EAX, EBX, ECX, and EDX
22. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 22
Index and Base RegistersIndex and Base Registers
• Some registers have only a 16-bit name for their
lower half:
23. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 23
Some Specialized Register UsesSome Specialized Register Uses (1 of 2)(1 of 2)
• General-Purpose
• EAX – accumulator
• ECX – loop counter
• ESP – stack pointer
• ESI, EDI – index registers
• EBP – extended frame pointer (stack)
• Segment
• CS – code segment
• DS – data segment
• SS – stack segment
• ES, FS, GS - additional segments
24. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 24
Some Specialized Register UsesSome Specialized Register Uses (2 of 2)(2 of 2)
• EIP – instruction pointer
• EFLAGS
• status and control flags
• each flag is a single binary bit
25. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 25
Status FlagsStatus Flags
• Carry
• unsigned arithmetic out of range
• Overflow
• signed arithmetic out of range
• Sign
• result is negative
• Zero
• result is zero
• Auxiliary Carry
• carry from bit 3 to bit 4
• Parity
• sum of 1 bits is an even number
26. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 26
Floating-Point, MMX, XMM RegistersFloating-Point, MMX, XMM Registers
• Eight 80-bit floating-point data registers
• ST(0), ST(1), . . . , ST(7)
• arranged in a stack
• used for all floating-point
arithmetic
• Eight 64-bit MMX registers
• Eight 128-bit XMM registers for single-
instruction multiple-data (SIMD) operations
27. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 27
Intel Microprocessor HistoryIntel Microprocessor History
• Intel 8086, 80286
• IA-32 processor family
• P6 processor family
• CISC and RISC
28. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 28
Early Intel MicroprocessorsEarly Intel Microprocessors
• Intel 8080
• 64K addressable RAM
• 8-bit registers
• CP/M operating system
• S-100 BUS architecture
• 8-inch floppy disks!
• Intel 8086/8088
• IBM-PC Used 8088
• 1 MB addressable RAM
• 16-bit registers
• 16-bit data bus (8-bit for 8088)
• separate floating-point unit (8087)
29. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 29
The IBM-ATThe IBM-AT
• Intel 80286
• 16 MB addressable RAM
• Protected memory
• several times faster than 8086
• introduced IDE bus architecture
• 80287 floating point unit
30. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 30
Intel IA-32 FamilyIntel IA-32 Family
• Intel386
• 4 GB addressable RAM, 32-bit
registers, paging (virtual memory)
• Intel486
• instruction pipelining
• Pentium
• superscalar, 32-bit address bus, 64-bit
internal data path
31. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 31
Intel P6 FamilyIntel P6 Family
• Pentium Pro
• advanced optimization techniques in microcode
• Pentium II
• MMX (multimedia) instruction set
• Pentium III
• SIMD (streaming extensions) instructions
• Pentium 4 and Xeon
• Intel NetBurst micro-architecture, tuned for
multimedia
32. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 32
CISC and RISCCISC and RISC
• CISC – complex instruction set
• large instruction set
• high-level operations
• requires microcode interpreter
• examples: Intel 80x86 family
• RISC – reduced instruction set
• simple, atomic instructions
• small instruction set
• directly executed by hardware
• examples:
• ARM (Advanced RISC Machines)
• DEC Alpha (now Compaq)
33. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 33
What's NextWhat's Next
• General Concepts
• IA-32 Processor Architecture
• IA-32 Memory Management
• Components of an IA-32 Microcomputer
• Input-Output System
34. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 34
IA-32 Memory ManagementIA-32 Memory Management
• Real-address mode
• Calculating linear addresses
• Protected mode
• Multi-segment model
• Paging
35. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 35
Real-Address modeReal-Address mode
• 1 MB RAM maximum addressable
• Application programs can access any area
of memory
• Single tasking
• Supported by MS-DOS operating system
36. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 36
Segmented MemorySegmented Memory
Segmented memory addressing: absolute (linear) address is a
combination of a 16-bit segment value added to a 16-bit offsetlinearaddresses
one segment
37. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 37
Calculating Linear AddressesCalculating Linear Addresses
• Given a segment address, multiply it by 16 (add a
hexadecimal zero), and add it to the offset
• Example: convert 08F1:0100 to a linear address
Adjusted Segment value: 0 8 F 1 0
Add the offset: 0 1 0 0
Linear address: 0 9 0 1 0
38. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 38
Your turn . . .Your turn . . .
What linear address corresponds to the segment/offset
address 028F:0030?
028F0 + 0030 = 02920
Always use hexadecimal notation for addresses.
39. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 39
Your turn . . .Your turn . . .
What segment addresses correspond to the linear address
28F30h?
Many different segment-offset addresses can produce the
linear address 28F30h. For example:
28F0:0030, 28F3:0000, 28B0:0430, . . .
40. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 40
Protected ModeProtected Mode (1 of 2)(1 of 2)
• 4 GB addressable RAM
• (00000000 to FFFFFFFFh)
• Each program assigned a memory partition which
is protected from other programs
• Designed for multitasking
• Supported by Linux & MS-Windows
41. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 41
Protected modeProtected mode (2 of 2)(2 of 2)
• Segment descriptor tables
• Program structure
• code, data, and stack areas
• CS, DS, SS segment descriptors
• global descriptor table (GDT)
• MASM Programs use the Microsoft flat memory
model
42. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 42
Flat Segment ModelFlat Segment Model
• Single global descriptor table (GDT).
• All segments mapped to entire 32-bit address space
43. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 43
Multi-Segment ModelMulti-Segment Model
• Each program has a local descriptor table (LDT)
• holds descriptor for each segment used by the program
44. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 44
PagingPaging
• Supported directly by the CPU
• Divides each segment into 4096-byte blocks called
pages
• Sum of all programs can be larger than physical
memory
• Part of running program is in memory, part is on disk
• Virtual memory manager (VMM) – OS utility that
manages the loading and unloading of pages
• Page fault – issued by CPU when a page must be
loaded from disk
45. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 45
What's NextWhat's Next
• General Concepts
• IA-32 Processor Architecture
• IA-32 Memory Management
• Components of an IA-32 Microcomputer
• Input-Output System
46. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 46
Components of an IA-32 MicrocomputerComponents of an IA-32 Microcomputer
• Motherboard
• Video output
• Memory
• Input-output ports
47. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 47
MotherboardMotherboard
• CPU socket
• External cache memory slots
• Main memory slots
• BIOS chips
• Sound synthesizer chip (optional)
• Video controller chip (optional)
• IDE, parallel, serial, USB, video, keyboard, joystick,
network, and mouse connectors
• PCI bus connectors (expansion cards)
48. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 48
Intel D850MD MotherboardIntel D850MD Motherboard
dynamic RAM
Pentium 4 socket
Speaker
IDE drive connectors
mouse, keyboard,
parallel, serial, and USB
connectors
AGP slot
Battery
Video
Power connector
memory controller hub
Diskette connector
PCI slots
I/O Controller
Firmware hub
Audio chip
Source: Intel® Desktop Board D850MD/D850MV Technical Product
Specification
49. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 49
Video OutputVideo Output
• Video controller
• on motherboard, or on expansion card
• AGP (accelerated graphics port technology)*
• Video memory (VRAM)
• Video CRT Display
• uses raster scanning
• horizontal retrace
• vertical retrace
• Direct digital LCD monitors
• no raster scanning required
* This link may change over time.
50. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 50
Sample Video Controller (ATI Corp.)Sample Video Controller (ATI Corp.)
• 128-bit 3D graphics
performance powered by
RAGE™ 128 PRO
• 3D graphics performance
• Intelligent TV-Tuner with
Digital VCR
• TV-ON-DEMAND™
• Interactive Program Guide
• Still image and MPEG-2 motion
video capture
• Video editing
• Hardware DVD video playback
• Video output to TV or VCR
51. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 51
MemoryMemory
• ROM
• read-only memory
• EPROM
• erasable programmable read-only memory
• Dynamic RAM (DRAM)
• inexpensive; must be refreshed constantly
• Static RAM (SRAM)
• expensive; used for cache memory; no refresh required
• Video RAM (VRAM)
• dual ported; optimized for constant video refresh
• CMOS RAM
• complimentary metal-oxide semiconductor
• system setup information
• See: Intel platform memory (Intel technology brief: link address may
change)
52. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 52
Input-Output PortsInput-Output Ports
• USB (universal serial bus)
• intelligent high-speed connection to devices
• up to 12 megabits/second
• USB hub connects multiple devices
• enumeration: computer queries devices
• supports hot connections
• Parallel
• short cable, high speed
• common for printers
• bidirectional, parallel data transfer
• Intel 8255 controller chip
53. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 53
Input-Output PortsInput-Output Ports (cont)(cont)
• Serial
• RS-232 serial port
• one bit at a time
• uses long cables and modems
• 16550 UART (universal asynchronous receiver
transmitter)
• programmable in assembly language
54. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 54
What's NextWhat's Next
• General Concepts
• IA-32 Processor Architecture
• IA-32 Memory Management
• Components of an IA-32 Microcomputer
• Input-Output System
55. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 55
Levels of Input-OutputLevels of Input-Output
• Level 3: Call a library function (C++, Java)
• easy to do; abstracted from hardware; details hidden
• slowest performance
• Level 2: Call an operating system function
• specific to one OS; device-independent
• medium performance
• Level 1: Call a BIOS (basic input-output system) function
• may produce different results on different systems
• knowledge of hardware required
• usually good performance
• Level 0: Communicate directly with the hardware
• May not be allowed by some operating systems
56. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 56
Displaying a String of CharactersDisplaying a String of Characters
When a HLL program
displays a string of
characters, the
following steps take
place:
57. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 57
ASM Programming levelsASM Programming levels
ASM programs can perform input-output at
each of the following levels:
58. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 58
SummarySummary
• Central Processing Unit (CPU)
• Arithmetic Logic Unit (ALU)
• Instruction execution cycle
• Multitasking
• Floating Point Unit (FPU)
• Complex Instruction Set
• Real mode and Protected mode
• Motherboard components
• Memory types
• Input/Output and access levels
59. Web site ExamplesIrvine, Kip R. Assembly Language for Intel-Based Computers 5/e, 2007. 59
42 69 6E 61 72 7942 69 6E 61 72 79